Published by De Gruyter Oldenbourg, November 10, 2020

Efficient machine learning for attack detection

Christian Wressnegger

Abstract

Detecting and fending off attacks on computer systems is an enduring problem in computer security. In light of a plethora of different threats and the growing automation used by attackers, we are in urgent need of more advanced methods for attack detection. Manually crafting detection rules is by no means feasible at scale, and automatically generated signatures often lack context, such that they fall short in detecting slight variations of known threats.

In the thesis “Efficient Machine Learning for Attack Detection” [35], we address this need for advanced attack detection. For machine learning to be effective in this domain, periodic retraining is crucial. We show that with the right data representation, efficient algorithms for mining substring statistics, and implementations based on probabilistic data structures, the underlying model can be trained in linear time, which enables a higher degree of automation for defenses.
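To make the combination of substring statistics and probabilistic data structures concrete, the following is a minimal sketch, not the implementation from the thesis. It assumes byte n-grams as the data representation and a Bloom filter [4] as the model, in the spirit of content-based anomaly detection [33]. All names (BloomFilter, ngrams, train, anomaly_score) and the parameter values are hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with k hash positions per item.

    Membership tests can yield false positives but never false negatives.
    """

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k positions from a single SHA-256 digest (double hashing).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def ngrams(data, n=3):
    """Yield all byte n-grams of `data` using a sliding window."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def train(samples, n=3):
    """Insert every n-gram of the training data into the filter.

    Each input byte starts at most one n-gram, so for a fixed n the
    training time grows linearly with the total input length.
    """
    model = BloomFilter()
    for sample in samples:
        for gram in ngrams(sample, n):
            model.add(gram)
    return model


def anomaly_score(model, data, n=3):
    """Return the fraction of n-grams in `data` that the model has not seen."""
    grams = list(ngrams(data, n))
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in model)
    return unseen / len(grams)
```

Under these assumptions, retraining amounts to a single pass over the new data, and the memory footprint is fixed by the size of the bit array rather than by the number of distinct n-grams. The price is a small false-positive rate, which can cause the score of an anomalous input to be underestimated.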

ACM CCS:

References

1. C. Aggarwal. A framework for clustering massive-domain data streams. In Proc. of the International Conference on Data Engineering (ICDE), pages 102–113, 2009.

2. D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, and K. Rieck. Drebin: Efficient and explainable detection of Android malware in your pocket. In Proc. of the Network and Distributed System Security Symposium (NDSS), 2014.

3. U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda. Scalable, behavior-based malware clustering. In Proc. of the Network and Distributed System Security Symposium (NDSS), 2009.

4. B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13 (7): 422–426, 1970.

5. W. B. Cavnar and J. M. Trenkle. N-gram-based text categorization. In Proc. of the Symposium on Document Analysis and Information Retrieval, pages 161–175, 1994.

6. A. Cherepanov. Win32/industroyer – a new threat for industrial control systems. Technical report, ESET, 2017.

7. Z. L. Chua, S. Shen, P. Saxena, and Z. Liang. Neural nets can learn function type signatures from binaries. In Proc. of the USENIX Security Symposium, pages 99–116, 2017.

8. G. Cormode and S. Muthukrishnan. Approximating data with the count-min sketch. Journal of IEEE Software, 29 (1): 64–69, 2012.

9. C. Feng, T. Li, and D. Chana. Multi-level anomaly detection in industrial control systems via package signatures and LSTM networks. In Proc. of the Conference on Dependable Systems and Networks (DSN), pages 261–272, 2017.

10. S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff. A sense of self for unix processes. In Proc. of the IEEE Symposium on Security and Privacy, pages 120–128, 1996.

11. J. Franklin, V. Paxson, A. Perrig, and S. Savage. An Inquiry Into the Nature and Causes of the Wealth of Internet Miscreants. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 375–388, 2007.

12. D. Hadžiosmanović, L. Simionato, D. Bolzoni, E. Zambon, and S. Etalle. N-gram against the machine: On the feasibility of the n-gram network analysis for binary protocols. In Proc. of the International Symposium on Research in Attacks, Intrusions and Defenses (RAID), pages 354–373, 2012.

13. A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna. Revolver: An automated approach to the detection of evasive web-based malware. In Proc. of the USENIX Security Symposium, pages 637–651, 2013.

14. S. Karnouskos. Stuxnet worm impact on industrial cyber-physical system security. In Proc. of Annual Conference on IEEE Industrial Electronics Society (IECON), pages 4490–4494, 2011.

15. R. K. Konoth, E. Vineti, V. Moonsamy, M. Lindorfer, C. Kruegel, H. Bos, and G. Vigna. An in-depth look into drive-by mining and its defense. In Proc. of the ACM Conference on Computer and Communications Security (CCS), 2018.

16. P. Laskov and N. Šrndić. Static detection of malicious JavaScript-bearing PDF documents. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 373–382, 2011.

17. A. Maier, H. Gascon, C. Wressnegger, and K. Rieck. TypeMiner: Recovering types in binary programs using machine learning. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), pages 288–308, 2019.

18. A. Moser, C. Kruegel, and E. Kirda. Limits of static analysis for malware detection. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 421–430, 2007.

19. M. Musch, C. Wressnegger, M. Johns, and K. Rieck. New kid on the web: A study on the prevalence of WebAssembly in the wild. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), pages 23–42, 2019.

20. G. Pellegrino, M. Johns, S. Koch, M. Backes, and C. Rossow. Deemon: Detecting CSRF with dynamic analysis and property graphs. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 1757–1771, 2017.

21. R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee. McPAD: A multiple classifier system for accurate payload-based anomaly detection. Computer Networks, 53 (6): 864–881, 2009.

22. W. Pugh. Skip Lists: A probabilistic alternative to balanced trees. Communications of the ACM, 33 (6): 668–676, 1990.

23. K. Rieck, T. Krueger, and A. Dewald. Cujo: Efficient detection and prevention of drive-by-download attacks. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 31–39, 2010.

24. G. Salton, A. Wong, and C.-S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18 (11): 613–620, 1975.

25. T. Schreck, S. Berger, and J. Göbel. BISSAM: Automatic vulnerability identification of office documents. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), 2012.

26. R. Seidel and C. R. Aragon. Randomized search trees. Algorithmica, 16 (4): 464–497, 1996.

27. Y. Shen, E. Mariconti, P.-A. Vervier, and G. Stringhini. Tiresias: Predicting security events through deep learning. In Proc. of the ACM Conference on Computer and Communications Security (CCS), 2018.

28. Q. Shi, J. Petterson, G. Dror, J. C. Langford, A. Smola, and S. Vishwanathan. Hash kernels for structured data. In Journal of Machine Learning Research (JMLR), pages 1113–1120, 2009.

29. B. Stock, S. Pfistner, B. Kaiser, S. Lekies, and M. Johns. From facepalm to brain bender: Exploring client-side cross-site scripting. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 1419–1430, 2015.

30. C. Y. Suen. N-gram statistics for natural language understanding and text processing. IEEE Trans. Pattern Analysis and Machine Intelligence, 1 (2): 164–172, 1979.

31. N. Šrndić and P. Laskov. Detection of malicious PDF files based on hierarchical document structure. In Proc. of the Network and Distributed System Security Symposium (NDSS), 2013.

32. K. Wang and S. J. Stolfo. Anomalous payload-based network intrusion detection. In Proc. of the International Symposium on Recent Advances in Intrusion Detection (RAID), pages 203–222, 2004.

33. K. Wang, J. J. Parekh, and S. J. Stolfo. Anagram: A content anomaly detector resistant to mimicry attack. In Proc. of the International Symposium on Recent Advances in Intrusion Detection (RAID), pages 226–248, 2006.

34. K. Weinberger, A. Dasgupta, J. C. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proc. of the International Conference on Machine Learning (ICML), pages 1113–1120, 2009.

35. C. Wressnegger. Efficient Machine Learning for Attack Detection. Dissertation, Technische Universität Braunschweig, 2018.

36. C. Wressnegger and K. Rieck. Looking back on three years of flash-based malware. In Proc. of the ACM European Workshop on Systems Security (EuroSec), Apr. 2017.

37. C. Wressnegger, F. Boldewin, and K. Rieck. Deobfuscating embedded malware using probable-plaintext attacks. In Proc. of the Symposium on Research in Attacks, Intrusions, and Defenses (RAID), pages 164–183, 2013.

38. C. Wressnegger, G. Schwenk, D. Arp, and K. Rieck. A close look on n-grams in intrusion detection: Anomaly detection vs. classification. In Proc. of the ACM Workshop on Artificial Intelligence and Security (AISEC), pages 67–76, 2013.

39. C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck. Comprehensive analysis and detection of flash-based malware. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), pages 101–121, 2016.

40. C. Wressnegger, F. Yamaguchi, A. Maier, and K. Rieck. Twice the bits, twice the trouble: Vulnerabilities induced by migrating to 64-bit platforms. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 541–552, 2016.

41. C. Wressnegger, K. Freeman, F. Yamaguchi, and K. Rieck. Automatically inferring malware signatures for anti-virus assisted attacks. In Proc. of the ACM Asia Conference on Computer and Communications Security (ASIACCS), pages 587–598, Apr. 2017.

42. C. Wressnegger, F. Yamaguchi, A. Maier, and K. Rieck. 64-bit migration vulnerabilities. Information Technology (IT), 59 (2): 73–82, Apr. 2017.

43. C. Wressnegger, A. Kellner, and K. Rieck. ZOE: Content-based anomaly detection for industrial control systems. In Proc. of the Conference on Dependable Systems and Networks (DSN), pages 127–138, June 2018.

44. A. Young and M. Yung. Cryptovirology: The birth, neglect, and explosion of ransomware. Communications of the ACM, 60 (7): 24–26, 2017.

Received: 2020-05-17
Revised: 2020-10-30
Accepted: 2020-10-31
Published Online: 2020-11-10
Published in Print: 2020-12-16

© 2020 Walter de Gruyter GmbH, Berlin/Boston