Proceedings on Privacy Enhancing Technologies

An Automated Approach for Complementing Ad Blockers’ Blacklists

David Gugelmann / Markus Happe / Bernhard Ager / Vincent Lenders
Published Online: 2015-06-22 | DOI: https://doi.org/10.1515/popets-2015-0018

Abstract

Privacy on the Web has become a major concern, resulting in the widespread use of various tools for blocking tracking services. Most of these tools rely on manually maintained blacklists, which need to be kept up to date to protect Web users’ privacy efficiently. Keeping pace with today’s quickly evolving advertisement and analytics landscape is challenging. To support blacklist maintainers in this task, we identify a set of Web traffic features for identifying privacy-intrusive services. Based on these features, we develop an automatic approach that learns the properties of advertisement and analytics services listed by existing blacklists and proposes new services for inclusion on blacklists. We evaluate our technique on real traffic traces of a campus network and find on the order of 200 new privacy-intrusive Web services that are not listed by Adblock Plus, the most popular Firefox plug-in. The proposed Web traffic features are easy to derive, allowing a distributed implementation of our approach.

Keywords: Privacy; Web; tracking; advertisement; analytics; blacklist; HTTP; network measurement

About the article

Received: 2015-02-15

Revised: 2015-05-15

Accepted: 2015-05-15

Published Online: 2015-06-22

Published in Print: 2015-06-01


Citation Information: Proceedings on Privacy Enhancing Technologies, ISSN (Online) 2299-0984, DOI: https://doi.org/10.1515/popets-2015-0018.

© David Gugelmann et al. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (BY-NC-ND 3.0).
