Wednesday, April 10, 2013 - 12:10pm - 1:00pm
Pardee 217
Jeffrey Woo
Abstract: Increasing volumes of data are archived by government agencies, health networks, search engines, social networking websites, and other organizations. The potential for scientific discovery and the social benefits of analyzing these databases are significant. At the same time, releasing information from such repositories can severely damage the privacy of the individuals or organizations whose information is stored there. The challenge, in particular for statistical agencies, is how to provide high-quality data products without compromising the privacy of the individuals whose data they contain. The field of Statistical Disclosure Control (SDC) aims to develop methodology that balances two objectives: providing data for valid statistical inference and safeguarding confidential information. In the first part of the talk, I will give a general overview of the data privacy problem and some of the SDC methodologies. In the second part of the talk, I will present my work on the Post Randomization Method (PRAM). PRAM is a disclosure control method in which values of categorical variables are perturbed via a known probability mechanism and only the perturbed data are released, which raises issues of disclosure risk and data utility. To address these issues, and in particular that of data utility, I propose an EM-type algorithm to obtain unbiased estimates of generalized linear models after accounting for the effect of PRAM.
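As a rough illustration of the perturbation step described in the abstract, the following Python sketch applies a known transition matrix to a categorical variable and shows how the expected released category shares follow from that matrix. It is not the speaker's method or the EM-type algorithm from the talk; the function names (`pram`, `expected_released_shares`), the three categories, and the transition probabilities are all hypothetical choices made for this example.

```python
import random


def pram(values, categories, transition, rng):
    """Perturb each categorical value using a known transition matrix.

    transition[i][j] is the (assumed) probability that a record whose true
    category is categories[i] is released as categories[j].
    """
    index = {c: i for i, c in enumerate(categories)}
    released = []
    for v in values:
        row = transition[index[v]]
        # Draw the released category from the row for the true category.
        released.append(rng.choices(categories, weights=row, k=1)[0])
    return released


def expected_released_shares(true_shares, transition):
    """Expected share of each released category: sum_i true_i * P[i][j]."""
    k = len(true_shares)
    return [
        sum(true_shares[i] * transition[i][j] for i in range(k))
        for j in range(k)
    ]


# Hypothetical example: three categories, each kept with probability 0.8.
categories = ["A", "B", "C"]
transition = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_values = ["A"] * 500 + ["B"] * 300 + ["C"] * 200
released = pram(true_values, categories, transition, rng)

true_shares = [0.5, 0.3, 0.2]
expected = expected_released_shares(true_shares, transition)
```

Because the transition matrix is known to the analyst, the distortion it introduces into the released data can in principle be modeled and corrected for, which is the general idea behind accounting for the effect of PRAM in later analysis.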
Sponsored by: 
Mathematics Department, MAAD

Contact information

c. jayne trent