Acta Universitatis Sapientiae, Informatica

The Journal of "Sapientia" Hungarian University of Transylvania

2 Issues per year

Open Access

Survey on privacy preserving data mining techniques in health care databases

Tamás Zoltán Gál / Gábor Kovács / Zsolt T. Kardkovács
Published Online: 2014-06-27 | DOI: https://doi.org/10.2478/ausi-2014-0017


In health care databases, the interests of data mining research and privacy preservation are persistently at odds: the more one tries to hide sensitive private information, the less valuable the data become for analysis. In this paper, we give an overview of data anonymization problems through case studies. We summarize the state of the art in health care data anonymization, including the legal environment and expectations, the most common attack strategies against privacy, and the metrics proposed for evaluating the usefulness and privacy preservation of anonymization. Finally, we summarize the strengths and shortcomings of the different approaches and techniques from the literature based on these evaluations.

Keywords: anonymization; privacy preservation; health care database; survey; data mining; sensitive data


About the article

Received: 2013-10-09

Revised: 2014-03-04

Published Online: 2014-06-27

Published in Print: 2014-06-01

Citation Information: Acta Universitatis Sapientiae, Informatica, Volume 6, Issue 1, Pages 33–55, ISSN (Online) 2066-7760, DOI: https://doi.org/10.2478/ausi-2014-0017.

© 2014. This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. BY-NC-ND 3.0

Citing Articles

Robespierre Pita, Clicia Pinto, Samila Sena, Rosemeire Fiaccone, Leila Amorim, Sandra Reis, Mauricio L. Barreto, Spiros Denaxas, and Marcos Ennes Barreto
IEEE Journal of Biomedical and Health Informatics, 2018, Volume 22, Number 2, Page 346
