Journal of Official Statistics

The Journal of Statistics Sweden

4 Issues per year


IMPACT FACTOR 2016: 0.411
5-year IMPACT FACTOR: 0.776

CiteScore 2016: 0.63

SCImago Journal Rank (SJR) 2016: 0.710
Source Normalized Impact per Paper (SNIP) 2016: 0.975

Open Access
Online ISSN: 2001-7367

A Simple Method for Limiting Disclosure in Continuous Microdata Based on Principal Component Analysis

Aida Calviño
  • Department of Computer Science and Mathematics, Universitat Rovira i Virgili, 43007 Tarragona, Spain
  • Department of Statistics and Operations Research III, Complutense University of Madrid, 28040 Madrid, Spain
Published Online: 2017-02-21 | DOI: https://doi.org/10.1515/jos-2017-0002

Abstract

In this article we propose a simple and versatile method for limiting disclosure in continuous microdata based on Principal Component Analysis (PCA). Instead of perturbing the original variables, we propose to alter the principal components, as they contain the same information but are uncorrelated, which permits working on each component separately, reducing processing times. The number and weight of the perturbed components determine the level of protection and distortion of the masked data. The method provides preservation of the mean vector and the variance-covariance matrix. Furthermore, depending on the technique chosen to perturb the principal components, the proposed method can provide masked, hybrid or fully synthetic data sets. Some examples of application and comparison with other methods previously proposed in the literature (in terms of disclosure risk and data utility) are also included.

Keywords: Statistical disclosure control; microdata protection; hybrid microdata; masking method; propensity score
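
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm, whose details are not given on this page. It assumes one particular perturbation: the scores of the selected principal components are replaced outright by random vectors that are forced to have zero mean, the same variance as the component they replace, and zero correlation with every other component. Under that assumption the masked data keep the sample mean vector and variance-covariance matrix of the original data. The function name `pca_mask` and its parameters are hypothetical.

```python
import numpy as np


def pca_mask(X, perturb, seed=0):
    """Illustrative sketch (hypothetical helper, not the article's code).

    Replaces the principal-component scores whose indices are listed in
    `perturb` (0 = component with the largest variance) with random scores,
    while keeping the sample mean vector and covariance matrix unchanged.
    Assumes the number of records is comfortably larger than the number
    of variables plus the number of perturbed components.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu

    # Eigendecomposition of the sample covariance matrix, sorted so that
    # column 0 of V is the first principal component.
    eigvals, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    V = V[:, order]

    # Scores: zero mean, mutually uncorrelated, variances equal to eigvals.
    Z = Xc @ V

    perturb = list(perturb)
    keep = [j for j in range(p) if j not in perturb]

    # Random noise, with its projection onto the constant vector and onto
    # the retained scores removed by least squares.
    E = rng.standard_normal((n, len(perturb)))
    B = np.column_stack([np.ones(n), Z[:, keep]])
    coef, *_ = np.linalg.lstsq(B, E, rcond=None)
    E = E - B @ coef

    # Orthonormalize the noise columns; they stay orthogonal to B because
    # each column of Q is a linear combination of the columns of E.
    Q, _ = np.linalg.qr(E)

    # Rescale each new column to the variance of the component it replaces.
    Z_new = Z.copy()
    for i, j in enumerate(perturb):
        Z_new[:, j] = Q[:, i] * np.sqrt(eigvals[j] * (n - 1))

    # Map the perturbed scores back to the original variable space.
    return Z_new @ V.T + mu
```

Calling `pca_mask(X, perturb=[0, 1])` on an n-by-p array would replace the two leading components. Replacing every component corresponds to the fully synthetic case mentioned in the abstract, while mixing original and random scores with a weight, which this sketch does not do, would be one way to obtain hybrid data.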

About the article

Received: 2015-09-01

Revised: 2016-08-01

Accepted: 2016-08-01

Published Online: 2017-02-21

Published in Print: 2017-03-01


Citation Information: Journal of Official Statistics, ISSN (Online) 2001-7367, DOI: https://doi.org/10.1515/jos-2017-0002.

© by Aida Calviño. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
