Abstract
Detecting and fending off attacks on computer systems is an enduring problem in computer security. Given the wide variety of threats and the growing automation used by attackers, we urgently need more advanced methods for attack detection. Manually crafting detection rules does not scale, and automatically generated signatures often lack context, so they fail to detect slight variations of known threats.
In the thesis “Efficient Machine Learning for Attack Detection” [35], we address this need for advanced attack detection. For machine learning to be effective in this domain, the models must be retrained periodically. We show that with the right data representation, efficient algorithms for mining substring statistics, and implementations based on probabilistic data structures, the underlying models, which enable a higher degree of automation in defenses, can be trained in linear time.
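To illustrate how a substring-based representation and a probabilistic data structure combine into linear-time training, the following is a minimal Python sketch, assuming overlapping byte n-grams as the representation and a plain Bloom filter [4] as the data structure, in the spirit of Anagram [33]. It is not the implementation from the thesis: the functions `ngrams`, `train` and `anomaly_score`, the filter size `m`, the number of hash functions `k`, and the use of BLAKE2b with double hashing are hypothetical choices made for illustration. Training makes a single pass over the input, so for fixed n and k its running time is linear in the total length of the training data.

```python
# Hypothetical sketch: names and parameters are illustrative, not taken from [35].
import hashlib
from typing import Iterator, List


def ngrams(data: bytes, n: int) -> Iterator[bytes]:
    """Yield all overlapping byte n-grams of `data` (len(data) - n + 1 of them)."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


class BloomFilter:
    """Minimal Bloom filter: k bit positions per item in an m-bit array."""

    def __init__(self, m: int = 1 << 20, k: int = 3) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Double hashing: derive k positions h1 + i*h2 from one BLAKE2b digest.
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: bytes) -> bool:
        # May return True for unseen items (false positive), never False for added ones.
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


def train(samples: List[bytes], n: int = 3) -> BloomFilter:
    """Insert every n-gram of every sample: one pass, linear in total input length."""
    bf = BloomFilter()
    for sample in samples:
        for gram in ngrams(sample, n):
            bf.add(gram)
    return bf


def anomaly_score(bf: BloomFilter, data: bytes, n: int = 3) -> float:
    """Fraction of the n-grams in `data` that were not seen during training."""
    grams = list(ngrams(data, n))
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in bf)
    return unseen / len(grams)
```

For example, a model trained on `[b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"]` assigns a score of 0.0 to the first training request and a positive score to `b"GET /../../etc/passwd"`, whose n-grams such as `b"../"` never occurred during training. Because a Bloom filter admits false positives but no false negatives, its errors can only lower the score, and its memory footprint stays fixed regardless of how many distinct n-grams are inserted. Swapping the filter for a count-min sketch [8] would additionally retain approximate n-gram frequencies at a similar cost.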
About the author

Prof. Dr. Christian Wressnegger is an Assistant Professor at Karlsruhe Institute of Technology (KIT), where he leads the “Intelligent System Security” research group. He graduated from Technische Universität Graz with a Master's degree in computer science in 2008 and received a Doctorate from Technische Universität Braunschweig in 2018. Christian Wressnegger was runner-up for the CAST/GI Dissertation Award for IT-Security 2019 and is a recipient of the German Prize for IT-Security 2016. His research interests revolve around the combination of computer security and machine learning, including the detection and prevention of attacks, vulnerability discovery, and the explainability of learning-based systems.
References
1. C. Aggarwal. A framework for clustering massive-domain data streams. In Proc. of the International Conference on Data Engineering (ICDE), pages 102–113, 2009. doi:10.1109/ICDE.2009.13
2. D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, and K. Rieck. Drebin: Efficient and explainable detection of Android malware in your pocket. In Proc. of the Network and Distributed System Security Symposium (NDSS), 2014. doi:10.14722/ndss.2014.23247
3. U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda. Scalable, behavior-based malware clustering. In Proc. of the Network and Distributed System Security Symposium (NDSS), 2009.
4. B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13 (7): 422–426, 1970. doi:10.1145/362686.362692
5. W. B. Cavnar and J. M. Trenkle. N-gram-based text categorization. In Proc. of the Symposium on Document Analysis and Information Retrieval, pages 161–175, 1994.
6. A. Cherepanov. Win32/Industroyer: A new threat for industrial control systems. Technical report, ESET, 2017.
7. Z. L. Chua, S. Shen, P. Saxena, and Z. Liang. Neural nets can learn function type signatures from binaries. In Proc. of the USENIX Security Symposium, pages 99–116, 2017.
8. G. Cormode and S. Muthukrishnan. Approximating data with the count-min sketch. IEEE Software, 29 (1): 64–69, 2012. doi:10.1109/MS.2011.127
9. C. Feng, T. Li, and D. Chana. Multi-level anomaly detection in industrial control systems via package signatures and LSTM networks. In Proc. of the Conference on Dependable Systems and Networks (DSN), pages 261–272, 2017. doi:10.1109/DSN.2017.34
10. S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff. A sense of self for Unix processes. In Proc. of the IEEE Symposium on Security and Privacy, pages 120–128, 1996.
11. J. Franklin, V. Paxson, A. Perrig, and S. Savage. An inquiry into the nature and causes of the wealth of Internet miscreants. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 375–388, 2007.
12. D. Hadžiosmanović, L. Simionato, D. Bolzoni, E. Zambon, and S. Etalle. N-gram against the machine: On the feasibility of the n-gram network analysis for binary protocols. In Proc. of the International Symposium on Research in Attacks, Intrusions and Defenses (RAID), pages 354–373, 2012. doi:10.1007/978-3-642-33338-5_18
13. A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna. Revolver: An automated approach to the detection of evasive web-based malware. In Proc. of the USENIX Security Symposium, pages 637–651, 2013.
14. S. Karnouskos. Stuxnet worm impact on industrial cyber-physical system security. In Proc. of the Annual Conference of the IEEE Industrial Electronics Society (IECON), pages 4490–4494, 2011. doi:10.1109/IECON.2011.6120048
15. R. K. Konoth, E. Vineti, V. Moonsamy, M. Lindorfer, C. Kruegel, H. Bos, and G. Vigna. An in-depth look into drive-by mining and its defense. In Proc. of the ACM Conference on Computer and Communications Security (CCS), 2018.
16. P. Laskov and N. Šrndić. Static detection of malicious JavaScript-bearing PDF documents. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 373–382, 2011. doi:10.1145/2076732.2076785
17. A. Maier, H. Gascon, C. Wressnegger, and K. Rieck. TypeMiner: Recovering types in binary programs using machine learning. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), pages 288–308, 2019. doi:10.1007/978-3-030-22038-9_14
18. A. Moser, C. Kruegel, and E. Kirda. Limits of static analysis for malware detection. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 421–430, 2007. doi:10.1109/ACSAC.2007.21
19. M. Musch, C. Wressnegger, M. Johns, and K. Rieck. New kid on the web: A study on the prevalence of WebAssembly in the wild. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), pages 23–42, 2019. doi:10.1007/978-3-030-22038-9_2
20. G. Pellegrino, M. Johns, S. Koch, M. Backes, and C. Rossow. Deemon: Detecting CSRF with dynamic analysis and property graphs. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 1757–1771, 2017. doi:10.1145/3133956.3133959
21. R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee. McPAD: A multiple classifier system for accurate payload-based anomaly detection. Computer Networks, 53 (6): 864–881, 2009. doi:10.1016/j.comnet.2008.11.011
22. W. Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM, 33 (6): 668–676, 1990. doi:10.1007/3-540-51542-9_36
23. K. Rieck, T. Krueger, and A. Dewald. Cujo: Efficient detection and prevention of drive-by-download attacks. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 31–39, 2010. doi:10.1145/1920261.1920267
24. G. Salton, A. Wong, and C.-S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18 (11): 613–620, 1975. doi:10.1145/361219.361220
25. T. Schreck, S. Berger, and J. Göbel. BISSAM: Automatic vulnerability identification of office documents. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), 2012. doi:10.1007/978-3-642-37300-8_12
26. R. Seidel and C. R. Aragon. Randomized search trees. Algorithmica, 16 (4): 464–497, 1996. doi:10.1007/BF01940876
27. Y. Shen, E. Mariconti, P.-A. Vervier, and G. Stringhini. Tiresias: Predicting security events through deep learning. In Proc. of the ACM Conference on Computer and Communications Security (CCS), 2018. doi:10.1145/3243734.3243811
28. Q. Shi, J. Petterson, G. Dror, J. C. Langford, A. Smola, and S. Vishwanathan. Hash kernels for structured data. Journal of Machine Learning Research (JMLR), 10: 2615–2637, 2009.
29. B. Stock, S. Pfistner, B. Kaiser, S. Lekies, and M. Johns. From facepalm to brain bender: Exploring client-side cross-site scripting. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 1419–1430, 2015. doi:10.1145/2810103.2813625
30. C. Y. Suen. N-gram statistics for natural language understanding and text processing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1 (2): 164–172, 1979. doi:10.1109/TPAMI.1979.4766902
31. N. Šrndić and P. Laskov. Detection of malicious PDF files based on hierarchical document structure. In Proc. of the Network and Distributed System Security Symposium (NDSS), 2013.
32. K. Wang and S. J. Stolfo. Anomalous payload-based network intrusion detection. In Proc. of the International Symposium on Recent Advances in Intrusion Detection (RAID), pages 203–222, 2004. doi:10.1007/978-3-540-30143-1_11
33. K. Wang, J. J. Parekh, and S. J. Stolfo. Anagram: A content anomaly detector resistant to mimicry attack. In Proc. of the International Symposium on Recent Advances in Intrusion Detection (RAID), pages 226–248, 2006. doi:10.1007/11856214_12
34. K. Weinberger, A. Dasgupta, J. C. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proc. of the International Conference on Machine Learning (ICML), pages 1113–1120, 2009. doi:10.1145/1553374.1553516
35. C. Wressnegger. Efficient Machine Learning for Attack Detection. Dissertation, Technische Universität Braunschweig, 2018.
36. C. Wressnegger and K. Rieck. Looking back on three years of Flash-based malware. In Proc. of the ACM European Workshop on Systems Security (EuroSec), Apr. 2017. doi:10.1145/3065913.3065921
37. C. Wressnegger, F. Boldewin, and K. Rieck. Deobfuscating embedded malware using probable-plaintext attacks. In Proc. of the International Symposium on Research in Attacks, Intrusions and Defenses (RAID), pages 164–183, 2013. doi:10.1007/978-3-642-41284-4_9
38. C. Wressnegger, G. Schwenk, D. Arp, and K. Rieck. A close look on n-grams in intrusion detection: Anomaly detection vs. classification. In Proc. of the ACM Workshop on Artificial Intelligence and Security (AISec), pages 67–76, 2013.
39. C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck. Comprehensive analysis and detection of Flash-based malware. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), pages 101–121, 2016. doi:10.1007/978-3-319-40667-1_6
40. C. Wressnegger, F. Yamaguchi, A. Maier, and K. Rieck. Twice the bits, twice the trouble: Vulnerabilities induced by migrating to 64-bit platforms. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 541–552, 2016. doi:10.1145/2976749.2978403
41. C. Wressnegger, K. Freeman, F. Yamaguchi, and K. Rieck. Automatically inferring malware signatures for anti-virus assisted attacks. In Proc. of the ACM Asia Conference on Computer and Communications Security (ASIACCS), pages 587–598, Apr. 2017. doi:10.1145/3052973.3053002
42. C. Wressnegger, F. Yamaguchi, A. Maier, and K. Rieck. 64-bit migration vulnerabilities. Information Technology (IT), 59 (2): 73–82, Apr. 2017. doi:10.1515/itit-2016-0041
43. C. Wressnegger, A. Kellner, and K. Rieck. ZOE: Content-based anomaly detection for industrial control systems. In Proc. of the Conference on Dependable Systems and Networks (DSN), pages 127–138, June 2018. doi:10.1109/DSN.2018.00025
44. A. Young and M. Yung. Cryptovirology: The birth, neglect, and explosion of ransomware. Communications of the ACM, 60 (7): 24–26, 2017. doi:10.1145/3097347
© 2020 Walter de Gruyter GmbH, Berlin/Boston