Journal of Official Statistics

The Journal of Statistics Sweden

4 Issues per year


IMPACT FACTOR 2016: 0.411
5-year IMPACT FACTOR: 0.776

CiteScore 2016: 0.63

SCImago Journal Rank (SJR) 2016: 0.710
Source Normalized Impact per Paper (SNIP) 2016: 0.975

Open Access
Online
ISSN
2001-7367

Disclosure Risk from Factor Scores

Jörg Drechsler
  • Corresponding author
  • Institute for Employment Research, Statistical Methods, Regensburger Str. 104, Nuremberg 90478, Germany
/ Gerd Ronning / Philipp Bleninger
Published Online: 2014-02-14 | DOI: https://doi.org/10.2478/jos-2014-0006

Abstract

Remote access can be a powerful tool for providing data access for external researchers. Since the microdata never leave the secure environment of the data-providing agency, alterations of the microdata can be kept to a minimum. Nevertheless, remote access is not free from risk. Many statistical analyses that do not seem to provide disclosive information at first sight can be used by sophisticated intruders to reveal sensitive information. For this reason the list of allowed queries is usually restricted in a remote setting. However, it is not always easy to identify problematic queries. We therefore strongly support the argument that has been made by other authors: that all queries should be monitored carefully and that any microlevel information should always be withheld. As an illustrative example, we use factor score analysis, for which the output of interest - the factor loading of the variables - seems to be unproblematic. However, as we show in the article, the individual factor scores that are usually returned as part of the output can be used to reveal sensitive information. Our empirical evaluations based on a German establishment survey emphasize that this risk is far from a purely theoretical problem.

Keywords: Remote data access; confidentiality; statistical disclosure control; factor analysis

About the article

Published in Print: 2014-03-01


Citation Information: Journal of Official Statistics, Volume 30, Issue 1, Pages 107–122, ISSN (Online) 2001-7367, DOI: https://doi.org/10.2478/jos-2014-0006.

This content is open access.

Citing Articles

[1]
Daniela Hochfellner, Dana Müller, and Alexandra Schmucker
Journal of Empirical Research on Human Research Ethics, 2014, Volume 9, Number 5, Page 8
