Description

Statistical agencies publish data which contains items that need to be
altered to protect confidentiality. Controlled Tabular Adjustments (CTA)
is a recent method to limit disclosure and can be elegantly expressed
as a Mixed Integer Programming problem. The programming framework then
allows easy expression of other data relationships like multi-dimensional
adding up conditions. The following model uses a 3-dimensional table from
from Cox, Kelly and Patil (2005) to illustrate this method.

The data is stored in an Excel Spreadsheet.

Large Model of Type : MIP

Category : GAMS Model library

Main file : cta.gms   includes :  cox3.xlsx

\$onText
Statistical agencies publish data which contains items that need to be
altered to protect confidentiality. Controlled Tabular Adjustments (CTA)
is a recent method to limit disclosure and can be elegantly expressed
as a Mixed Integer Programming problem. The programming framework then
allows easy expression of other data relationships like multi-dimensional
adding up conditions. The following model uses a 3-dimensional table from
from Cox, Kelly and Patil (2005) to illustrate this method.

The data is stored in an Excel Spreadsheet.

Lawrence H Cox, James P Kelly and Rahul J Patil, Computational Aspects
of Controlled Tabular Adjustments: Algorithms and Analysis, in The Next
Wave in Computing, Optimization, and Decision Technologies, Eds Bruce L Golden,
S Raghavan and Edward A Wasil, Springer, 2005, pp 45-59.

Keywords: mixed integer linear programming, statistical disclosure limitations
\$offText

* extract data from Excel
\$if %system.filesys% == UNIX \$terminate
\$libInclude win32 -c msappavail Excel
\$ifE errorLevel<>0 \$goto noExcel
\$goto Excel
\$label noExcel
\$log Microsoft Excel is not available!
\$exit

\$label Excel
\$onEcho > temp.txt
*extract domain information
dset = i  rng = sheet1!b2 rdim = 1
dset = j  rng = sheet1!c1 cdim = 1
dset = k  rng = sheet1!a2 rdim = 1
*extract data
par = dat rng = sheet1!a1 rdim = 2 cdim = 1
par = pro rng = sheet2!a1 rdim = 2 cdim = 1
\$offEcho
\$call =gdxxrw cox3.xlsx trace=0 @temp.txt
\$ifE errorLevel<>0 \$abort data extraction failed

Set
i 'rows'
j 'columns'
k 'planes';

Parameter
dat(k,i,j) 'unprotected data table'
pro(k,i,j) 'information sensitive cells';

\$gdxIn cox3
\$load  i j k dat pro
\$gdxIn

* do some basic data checks
abort\$sum((i,k), round(sum(j, dat(k,i,j)) - 2*dat(k,i,'total'))) 'row totals are incorrect', dat;
abort\$sum((j,k), round(sum(i, dat(k,i,j)) - 2*dat(k,'total',j))) 'column totals are incorrect', dat;
abort\$sum((i,j), round(sum(k, dat(k,i,j)) - 2*dat('total',i,j))) 'plane totals are incorrect', dat;

Variable
obj;

Binary   Variable b(i,j,k);

Equation
pmin(i,j,k)   'small value for sensitive cells'
pmax(i,j,k)   'big value for sensitive cells'
defobj;

Set
v(i,j,k) 'non zero cells'
s(i,j,k) 'sensitive cells';

Parameter BigM 'the famous big M - make it as small as possible';

pmin(s(i,j,k))..   adjN(s) =g= pro(k,i,j)*(1 - b(s));

Equation pminx, pmaxx;

pminx(s(i,j,k))..  adjN(s) =l= BigM*pro(k,i,j)*(1 - b(s));

Model cox3 / all /;

v(i,j,k) = dat(k,i,j);
s(i,j,k) = pro(k,i,j);

option limCol = 0, limRow = 0, solPrint = off, optCr = 0, optCa = 0.99, resLim = 10;

BigM = 2;

solve cox3 min obj using mip;

Parameter
rep(k,i,j)      'summary report'

rep(k,i,j)          =  t.l(i,j,k);

* now we find the next best 5 solutions
Set
l     'solution labels' / solution1*solution5 /
ll(l) 'dynamic version of l';

Parameter
binrep(*,*,*,l) 'binary for protected variables'
best            'best objective value';

option binrep:0:3:1;

Equation
cutone(l) 'cuts to exclude previous solutions'
cuttwo(l) 'cuts to exclude previous solutions';

* there is always a complementary solution by just changing all the signs
* cut(ll)..  sum(s, abs(b(s) - binrep(s,ll)) =g= 1;

cutone(ll).. sum(s\$binrep(s,ll), 1 - b(s)) + sum(s\$(not binrep(s,ll)), b(s)) =g= 1;

cuttwo(ll).. sum(s\$(not binrep(s,ll)), 1 - b(s)) + sum(s\$binrep(s,ll), b(s)) =g= 1;

Model cox3c 'includes cuts' / all /;

* find the card(l) best solutions that are within 1% of the global
best         = round(obj.l);
cox3c.resUsd = cox3.resUsd;
cox3c.nodUsd = cox3.nodUsd;

loop(l\$((obj.l - best)/best <= 0.01),
ll(l) = yes;
binrep(s,l)             = round(b.l(s));
binrep('','','Obj',l)   = obj.l;
binrep('','','mSec',l)  = cox3c.resUsd*1000;
binrep('','','nodes',l) = cox3c.nodUsd;