Journal of Official Statistics

The Journal of Statistics Sweden

Disclosure-Protected Inference with Linked Microdata Using a Remote Analysis Server

James O. Chipperfield
  • Senior Research Fellow, National Institute for Applied Statistics Research Australia, University of Wollongong, and Assistant Director, Methodology Division, Australian Bureau of Statistics, Canberra, ACT, 2617, Australia
Published Online: 2014-02-14 | DOI: https://doi.org/10.2478/jos-2014-0007

Abstract

Large amounts of microdata are collected by data custodians in the form of censuses and administrative records. Often, data custodians will collect different information on the same individual. Many important questions can be answered by linking microdata collected by different data custodians. For this reason, there is very strong demand from analysts, within government, business, and universities, for linked microdata. However, many data custodians are legally obliged to ensure the risk of disclosing information about a person or organisation is acceptably low. Different authors have considered the problem of how to facilitate reliable statistical inference from analysis of linked microdata while ensuring that the risk of disclosure is acceptably low. This article considers the problem from the perspective of an Integrating Authority that, by definition, is trusted to link the microdata and to facilitate analysts’ access to the linked microdata via a remote server, which allows analysts to fit models and view the statistical output without being able to observe the underlying linked microdata. One disclosure risk that must be managed by an Integrating Authority is that one data custodian may use the microdata it supplied to the Integrating Authority and statistical output released from the remote server to disclose information about a person or organisation that was supplied by the other data custodian. This article considers analysis of only binary variables. The utility and disclosure risk of the proposed method are investigated both in a simulation and using a real example. This article shows that some popular protections against disclosure (dropping records, rounding regression coefficients or imposing restrictions on model selection) can be ineffective in the above setting.
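The claim that rounding regression coefficients can fail to protect against disclosure can be illustrated with a minimal sketch. This is not the article's method or simulation. It assumes a hypothetical remote server (`remote_server`) that fits an ordinary least squares model, chosen here only for simplicity, and releases coefficients rounded to two decimal places. It also assumes the custodian may submit a covariate of its own construction. The names `fit_simple_ols`, `remote_server`, and `attack` are illustrative and do not come from the article.

```python
import random


def fit_simple_ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def remote_server(covariate, y_secret, decimals=2):
    """Hypothetical server: fits the model, releases only rounded coefficients."""
    a, b = fit_simple_ols(covariate, y_secret)
    return round(a, decimals), round(b, decimals)


def attack(target_index, n, server):
    """The custodian submits an indicator that is 1 only for the target record.

    With such a covariate the fitted value for the target, a + b, equals the
    target's own y, so rounding a and b to two decimals cannot hide a 0/1 value.
    """
    indicator = [1 if i == target_index else 0 for i in range(n)]
    a, b = server(indicator)
    return round(a + b)


random.seed(0)
n = 200
# Binary variable supplied by the other custodian; the attacker never sees it.
y_secret = [random.randint(0, 1) for _ in range(n)]


def server(covariate):
    return remote_server(covariate, y_secret)


recovered = [attack(i, n, server) for i in range(n)]
print(recovered == y_secret)
```

Under these assumptions every binary value is recovered, because the rounding error (at most 0.01 across the two coefficients) is far smaller than the gap between 0 and 1. A real remote server would typically impose further restrictions, so the sketch shows only why rounding alone is a weak safeguard.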

Keywords: Confidentiality; remote analysis; record linkage; statistical disclosure control

About the article

Published in Print: 2014-03-01


Citation Information: Journal of Official Statistics, ISSN (Online) 2001-7367, DOI: https://doi.org/10.2478/jos-2014-0007.

This content is open access.
