
Proceedings on Privacy Enhancing Technologies


On the (In)effectiveness of Mosaicing and Blurring as Tools for Document Redaction

Steven Hill / Zhimin Zhou / Lawrence Saul / Hovav Shacham
Published Online: 2016-07-14 | DOI: https://doi.org/10.1515/popets-2016-0047

Abstract

In many online communities, it is the norm to redact names and other sensitive text from posted screenshots. Sometimes solid bars are used; sometimes a blur or other image transform is used. We consider the effectiveness of two popular image transforms, mosaicing (also known as pixelization) and blurring, for redaction of text. Our main finding is that we can use a simple but powerful class of statistical models, so-called hidden Markov models (HMMs), to recover both short and indefinitely long instances of redacted text. Our approach builds on the success of HMMs in automatic speech recognition, where they are used to recover sequences of phonemes from utterances of speech. Here we use HMMs in an analogous way to recover sequences of characters from images of redacted text. We evaluate an implementation of our system against multiple typefaces, font sizes, grid sizes, pixel offsets, and levels of noise. We also decode numerous real-world examples of redacted text. We conclude that mosaicing and blurring, despite their widespread usage, are not viable approaches for text redaction.

Keywords: redaction; mosaic; pixelation; blur; hidden Markov models
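
The two ideas named in the abstract, mosaicing and HMM decoding, can be illustrated with a minimal sketch. This is not the authors' system: the function names, the two-character alphabet, the "dark"/"light" observation symbols, and all probabilities below are hypothetical values chosen only for illustration. The mosaic function averages each grid cell of a grayscale image, and the Viterbi function finds the most likely hidden character sequence given a sequence of quantized observations.

```python
import math


def mosaic(image, block):
    """Replace each block x block cell of a grayscale image (list of rows) with its mean."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            cells = [(r, c)
                     for r in range(top, min(top + block, height))
                     for c in range(left, min(left + block, width))]
            mean = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden state sequence (log-space Viterbi)."""
    # best[s] is the log-probability of the best path that ends in state s.
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    back = []
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            p = max(states, key=lambda q: prev[q] + log_trans[q][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][obs]
            pointers[s] = p
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical toy model: two characters, two quantized block brightness levels.
states = ["a", "b"]
log_start = {"a": math.log(0.5), "b": math.log(0.5)}
log_trans = to_log({"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.3, "b": 0.7}})
log_emit = to_log({"a": {"dark": 0.9, "light": 0.1}, "b": {"dark": 0.2, "light": 0.8}})

blurred = mosaic([[0, 2], [4, 6]], 2)
decoded = viterbi(["dark", "dark", "light"], states, log_start, log_trans, log_emit)
```

In a realistic setting the observations would be feature vectors derived from the mosaiced or blurred pixels and the emission model would be learned from rendered training text; the discrete symbols here only keep the sketch short.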


About the article

Received: 2016-02-29

Revised: 2016-06-02

Accepted: 2016-06-02

Published Online: 2016-07-14

Published in Print: 2016-10-01


Citation Information: Proceedings on Privacy Enhancing Technologies, Volume 2016, Issue 4, Pages 403–417, ISSN (Online) 2299-0984, DOI: https://doi.org/10.1515/popets-2016-0047.


© 2016. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
