Proceedings on Privacy Enhancing Technologies

Open Access
Linking Health Records for Federated Query Processing

Rinku Dewri / Toan Ong / Ramakrishna Thurimella
Published Online: 2016-05-06 | DOI: https://doi.org/10.1515/popets-2016-0013


A federated query portal in an electronic health record infrastructure enables large epidemiology studies by combining data from geographically dispersed medical institutions. However, an individual’s health record has been found to be distributed across multiple carrier databases in local settings. Privacy regulations may prohibit a data source from revealing clear text identifiers, thereby making it non-trivial for a query aggregator to determine which records correspond to the same underlying individual. In this paper, we explore this problem of privately detecting and tracking the health records of an individual in a distributed infrastructure. We begin with a secure set intersection protocol based on commutative encryption, and show how to make it practical on comparison spaces as large as 10^10 pairs. Using bigram matching, precomputed tables, and data parallelism, we reduce the execution time to a matter of minutes while retaining a high degree of accuracy, even for records with data entry errors. We also propose techniques to prevent the inference of identifier information when an adversary knows the underlying data distributions. Finally, we discuss how records can be tracked during query processing using the detection results.
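
To make the idea of set intersection under commutative encryption concrete, the following is a minimal sketch and not the protocol from the paper. It assumes a modular exponentiation cipher, where E_k(x) = x^k mod p and therefore E_a(E_b(x)) = E_b(E_a(x)). The toy modulus, the SHA-256 mapping into the group, the Dice coefficient used as the bigram score, and all function names are illustrative assumptions. Both parties are simulated in one process.

```python
import hashlib
import math
import secrets

# ASSUMPTION: toy modulus, the Mersenne prime 2**127 - 1. It is far too
# small for real use and is chosen only so the sketch runs instantly.
P = 2**127 - 1


def keygen():
    """Pick a secret exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # uniform in 2 .. P-2
        if math.gcd(k, P - 1) == 1:
            return k


def bigrams(s):
    """Set of overlapping two-character substrings of a lowercased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def to_group(token):
    """Hash a token to an integer in 1 .. P-1 (hypothetical mapping)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)


def encrypt(x, key):
    """Exponentiation cipher: commutative because (x**a)**b == (x**b)**a."""
    return pow(x, key, P)


def private_bigram_overlap(name_a, key_a, name_b, key_b):
    """Dice coefficient of two bigram sets, compared only in encrypted form."""
    # Each party encrypts its own bigrams under its own key.
    enc_a = {encrypt(to_group(g), key_a) for g in bigrams(name_a)}
    enc_b = {encrypt(to_group(g), key_b) for g in bigrams(name_b)}
    # The sets are exchanged and each party adds its key to the other's set.
    both_a = {encrypt(c, key_b) for c in enc_a}
    both_b = {encrypt(c, key_a) for c in enc_b}
    total = len(both_a) + len(both_b)
    if total == 0:
        return 0.0
    # Equal plaintext bigrams now have equal double encryptions.
    return 2 * len(both_a & both_b) / total


score = private_bigram_overlap("smith", keygen(), "smyth", keygen())
```

Here "smith" and "smyth" share the bigrams "sm" and "th" out of four each, so the score reflects a partial match despite the spelling difference, which is the kind of tolerance to data entry errors that bigram matching is meant to provide. A real deployment would need a cryptographically sized group and protection against frequency-based inference, which this sketch does not attempt.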

Keywords: private record linkage; commutative encryption


About the article

Received: 2015-11-30

Revised: 2016-03-01

Accepted: 2016-03-02

Published Online: 2016-05-06

Published in Print: 2016-07-01

Citation Information: Proceedings on Privacy Enhancing Technologies, Volume 2016, Issue 3, Pages 4–23, ISSN (Online) 2299-0984, DOI: https://doi.org/10.1515/popets-2016-0013.

© 2016. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).