
Proceedings on Privacy Enhancing Technologies


SafePub: A Truthful Data Anonymization Algorithm With Strong Privacy Guarantees

Raffael Bild / Klaus A. Kuhn / Fabian Prasser
Published Online: 2018-01-11 | DOI: https://doi.org/10.1515/popets-2018-0004


Methods for privacy-preserving data publishing and analysis trade off privacy risks for individuals against the quality of output data. In this article, we present a data publishing algorithm that satisfies the differential privacy model. The transformations performed are truthful, which means that the algorithm does not perturb input data or generate synthetic output data. Instead, records are randomly drawn from the input dataset and the uniqueness of their features is reduced. This also offers an intuitive notion of privacy protection. Moreover, the approach is generic, as it can be parameterized with different objective functions to optimize its output towards different applications. We show this by integrating six well-known data quality models. We present an extensive analytical and experimental evaluation and a comparison with prior work. The results show that our algorithm is the first practical implementation of the described approach and that it can be used with reasonable privacy parameters resulting in high degrees of protection. Moreover, when parameterizing the generic method with an objective function quantifying the suitability of data for building statistical classifiers, we measured prediction accuracies that compare very well with results obtained using state-of-the-art differentially private classification algorithms.
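The mechanism outlined above, drawing a random sample of records and then reducing the uniqueness of their features, can be sketched roughly as follows. This is an illustrative sketch only, not the published algorithm: the function names, the fixed sampling probability, and the simple suppress-if-rare rule are assumptions for exposition, whereas SafePub calibrates sampling and generalization to the differential privacy parameters and optimizes the output with an objective function.

```python
import random
from collections import Counter

def truthful_anonymize_sketch(records, beta, generalize, k):
    """Illustrative sketch (hypothetical helper, not SafePub itself):
    random sampling followed by generalization and suppression.
    No record values are perturbed, so the output stays truthful."""
    # Step 1: include each input record independently with probability beta.
    sample = [r for r in records if random.random() < beta]
    # Step 2: reduce uniqueness by generalizing quasi-identifying attributes.
    generalized = [generalize(r) for r in sample]
    # Step 3: suppress records whose generalized form occurs fewer than k times.
    counts = Counter(generalized)
    return [g for g in generalized if counts[g] >= k]

# Example with a toy generalization: round age to the decade, coarsen the ZIP code.
def gen(record):
    age, zip_code = record
    return (age // 10 * 10, zip_code[:2] + "***")

records = [(34, "81675"), (36, "81675"), (35, "81675"), (62, "80331")]
# beta=1.0 keeps every record, so this run is deterministic.
out = truthful_anonymize_sketch(records, beta=1.0, generalize=gen, k=2)
print(out)  # the unique (62, "80331") record is suppressed
```

Each output row is a coarsened version of a real input row, which matches the abstract's notion of a truthful transformation: nothing is perturbed and no synthetic rows are fabricated.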

Keywords: Data privacy; differential privacy; anonymization; disclosure control; classification



About the article

Received: 2017-05-31

Revised: 2017-09-15

Accepted: 2017-09-16

Published Online: 2018-01-11

Published in Print: 2018-01-01

Citation Information: Proceedings on Privacy Enhancing Technologies, Volume 2018, Issue 1, Pages 67–87, ISSN (Online) 2299-0984, DOI: https://doi.org/10.1515/popets-2018-0004.


© 2018 Raffael Bild et al., published by De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (CC BY-NC-ND 3.0).
