Efficient machine learning for attack detection

Prof. Dr. Christian Wressnegger 1 (corresponding author)
  • 1 Karlsruhe Institute of Technology (KIT), Institute of Theoretic Computer Science, D-76131 Karlsruhe, Germany
  • Prof. Dr. Christian Wressnegger is an Assistant Professor at Karlsruhe Institute of Technology (KIT), where he leads the “Intelligent System Security” research group. He graduated from Technische Universität Graz with a Master's degree in computer science in 2008 and received a doctorate from Technische Universität Braunschweig in 2018. He was a runner-up for the CAST/GI Dissertation Award for IT-Security 2019 and a recipient of the German Prize for IT-Security 2016. His research combines computer security with machine learning, covering the detection and prevention of attacks, vulnerability discovery, and the explainability of learning-based systems.

Abstract

Detecting and fending off attacks on computer systems is an enduring problem in computer security. In light of a plethora of different threats and the growing automation used by attackers, we are in urgent need of more advanced methods for attack detection. Manually crafting detection rules is by no means feasible at scale, and automatically generated signatures often lack context, such that they fall short in detecting slight variations of known threats.

In the thesis “Efficient Machine Learning for Attack Detection” [35], we address the necessity of advanced attack detection. For the effective application of machine learning in this domain, periodic retraining is crucial. We show that, with the right data representation, efficient algorithms for mining substring statistics, and implementations based on probabilistic data structures, the underlying model can be trained in linear time, which enables a higher degree of automation for defenses.
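To make the combination of substring statistics and probabilistic data structures more concrete, the following is a minimal sketch of content-based anomaly detection over byte n-grams stored in a Bloom filter [4], in the spirit of earlier n-gram detectors such as Anagram [33]. It is not the implementation from the thesis. The names `BloomFilter`, `ngrams`, `train`, and `anomaly_score`, the filter size, the number of hash functions, and the n-gram length are all assumptions chosen for illustration. Training touches every n-gram of the input exactly once, so its cost grows linearly with the amount of training data.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array queried through several hash functions.

    Membership tests can yield false positives but never false negatives.
    Sizes below are illustrative assumptions, not tuned values.
    """

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive one bit position per hash function from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def ngrams(data, n=3):
    """Yield all overlapping byte substrings of length n."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def train(samples, n=3):
    """Insert every n-gram of the benign samples into the filter (one pass)."""
    model = BloomFilter()
    for sample in samples:
        for gram in ngrams(sample, n):
            model.add(gram)
    return model


def anomaly_score(model, data, n=3):
    """Return the fraction of n-grams in data that were not seen in training."""
    grams = list(ngrams(data, n))
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in model)
    return unseen / len(grams)


if __name__ == "__main__":
    model = train([b"GET /index.html HTTP/1.1"])
    print(anomaly_score(model, b"GET /index.html HTTP/1.1"))
    print(anomaly_score(model, b"\x90\x90\x90\x90\xcc\xcc"))
```

An input that was part of the training data always scores 0.0, because a Bloom filter has no false negatives. Unfamiliar content scores higher, and retraining amounts to inserting the n-grams of new benign data into the same structure.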

References

  • 1.

    C. Aggarwal. A framework for clustering massive-domain data streams. In Proc. of the International Conference on Data Engineering (ICDE), pages 102–113, 2009.

  • 2.

    D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, and K. Rieck. Drebin: Efficient and explainable detection of Android malware in your pocket. In Proc. of the Network and Distributed System Security Symposium (NDSS), 2014.

  • 3.

    U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda. Scalable, behavior-based malware clustering. In Proc. of the Network and Distributed System Security Symposium (NDSS), 2009.

  • 4.

    B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13 (7): 422–426, 1970.

  • 5.

    W. B. Cavnar and J. M. Trenkle. N-gram-based text categorization. In Proc. of the Symposium on Document Analysis and Information Retrieval, pages 161–175, 1994.

  • 6.

    A. Cherepanov. Win32/industroyer – a new threat for industrial control systems. Technical report, ESET, 2017.

  • 7.

    Z. L. Chua, S. Shen, P. Saxena, and Z. Liang. Neural nets can learn function type signatures from binaries. In Proc. of the USENIX Security Symposium, pages 99–116, 2017.

  • 8.

    G. Cormode and S. Muthukrishnan. Approximating data with the count-min sketch. IEEE Software, 29 (1): 64–69, 2012.

  • 9.

    C. Feng, T. Li, and D. Chana. Multi-level anomaly detection in industrial control systems via package signatures and LSTM networks. In Proc. of the Conference on Dependable Systems and Networks (DSN), pages 261–272, 2017.

  • 10.

    S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff. A sense of self for unix processes. In Proc. of the IEEE Symposium on Security and Privacy, pages 120–128, 1996.

  • 11.

    J. Franklin, V. Paxson, A. Perrig, and S. Savage. An Inquiry Into the Nature and Causes of the Wealth of Internet Miscreants. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 375–388, 2007.

  • 12.

    D. Hadžiosmanović, L. Simionato, D. Bolzoni, E. Zambon, and S. Etalle. N-gram against the machine: On the feasibility of the n-gram network analysis for binary protocols. In Proc. of the International Symposium on Research in Attacks, Intrusions and Defenses (RAID), pages 354–373, 2012.

  • 13.

    A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna. Revolver: An automated approach to the detection of evasive web-based malware. In Proc. of the USENIX Security Symposium, pages 637–651, 2013.

  • 14.

    S. Karnouskos. Stuxnet worm impact on industrial cyber-physical system security. In Proc. of Annual Conference on IEEE Industrial Electronics Society (IECON), pages 4490–4494, 2011.

  • 15.

    R. K. Konoth, E. Vineti, V. Moonsamy, M. Lindorfer, C. Kruegel, H. Bos, and G. Vigna. An in-depth look into drive-by mining and its defense. In Proc. of the ACM Conference on Computer and Communications Security (CCS), 2018.

  • 16.

    P. Laskov and N. Šrndić. Static detection of malicious JavaScript-bearing PDF documents. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 373–382, 2011.

  • 17.

    A. Maier, H. Gascon, C. Wressnegger, and K. Rieck. TypeMiner: Recovering types in binary programs using machine learning. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), pages 288–308, 2019.

  • 18.

    A. Moser, C. Kruegel, and E. Kirda. Limits of static analysis for malware detection. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 421–430, 2007.

  • 19.

    M. Musch, C. Wressnegger, M. Johns, and K. Rieck. New kid on the web: A study on the prevalence of WebAssembly in the wild. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), pages 23–42, 2019.

  • 20.

    G. Pellegrino, M. Johns, S. Koch, M. Backes, and C. Rossow. Deemon: Detecting CSRF with dynamic analysis and property graphs. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 1757–1771, 2017.

  • 21.

    R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee. McPAD: A multiple classifier system for accurate payload-based anomaly detection. Computer Networks, 53 (6): 864–881, 2009.

  • 22.

    W. Pugh. Skip Lists: A probabilistic alternative to balanced trees. Communications of the ACM, 33 (6): 668–676, 1990.

  • 23.

    K. Rieck, T. Krueger, and A. Dewald. Cujo: Efficient detection and prevention of drive-by-download attacks. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 31–39, 2010.

  • 24.

    G. Salton, A. Wong, and C.-S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18 (11): 613–620, 1975.

  • 25.

    T. Schreck, S. Berger, and J. Göbel. BISSAM: Automatic vulnerability identification of office documents. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), 2012.

  • 26.

    R. Seidel and C. R. Aragon. Randomized search trees. Algorithmica, 16 (4): 464–497, 1996.

  • 27.

    Y. Shen, E. Mariconti, P.-A. Vervier, and G. Stringhini. Tiresias: Predicting security events through deep learning. In Proc. of the ACM Conference on Computer and Communications Security (CCS), 2018.

  • 28.

    Q. Shi, J. Petterson, G. Dror, J. C. Langford, A. Smola, and S. Vishwanathan. Hash kernels for structured data. In Journal of Machine Learning Research (JMLR), pages 1113–1120, 2009.

  • 29.

    B. Stock, S. Pfistner, B. Kaiser, S. Lekies, and M. Johns. From facepalm to brain bender: Exploring client-side cross-site scripting. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 1419–1430, 2015.

  • 30.

    C. Y. Suen. N-gram statistics for natural language understanding and text processing. IEEE Trans. Pattern Analysis and Machine Intelligence, 1 (2): 164–172, 1979.

  • 31.

    N. Šrndić and P. Laskov. Detection of malicious PDF files based on hierarchical document structure. In Proc. of the Network and Distributed System Security Symposium (NDSS), 2013.

  • 32.

    K. Wang and S. J. Stolfo. Anomalous payload-based network intrusion detection. In Proc. of the International Symposium on Recent Advances in Intrusion Detection (RAID), pages 203–222, 2004.

  • 33.

    K. Wang, J. J. Parekh, and S. J. Stolfo. Anagram: A content anomaly detector resistant to mimicry attack. In Proc. of the International Symposium on Recent Advances in Intrusion Detection (RAID), pages 226–248, 2006.

  • 34.

    K. Weinberger, A. Dasgupta, J. C. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proc. of the International Conference on Machine Learning (ICML), pages 1113–1120, 2009.

  • 35.

    C. Wressnegger. Efficient Machine Learning for Attack Detection. Dissertation, Technische Universität Braunschweig, 2018.

  • 36.

    C. Wressnegger and K. Rieck. Looking back on three years of flash-based malware. In Proc. of the ACM European Workshop on Systems Security (EuroSec), Apr. 2017.

  • 37.

    C. Wressnegger, F. Boldewin, and K. Rieck. Deobfuscating embedded malware using probable-plaintext attacks. In Proc. of the Symposium on Research in Attacks, Intrusions, and Defenses (RAID), pages 164–183, 2013.

  • 38.

    C. Wressnegger, G. Schwenk, D. Arp, and K. Rieck. A close look on n-grams in intrusion detection: Anomaly detection vs. classification. In Proc. of the ACM Workshop on Artificial Intelligence and Security (AISEC), pages 67–76, 2013.

  • 39.

    C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck. Comprehensive analysis and detection of flash-based malware. In Proc. of the Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), pages 101–121, 2016.

  • 40.

    C. Wressnegger, F. Yamaguchi, A. Maier, and K. Rieck. Twice the bits, twice the trouble: Vulnerabilities induced by migrating to 64-bit platforms. In Proc. of the ACM Conference on Computer and Communications Security (CCS), pages 541–552, 2016.

  • 41.

    C. Wressnegger, K. Freeman, F. Yamaguchi, and K. Rieck. Automatically inferring malware signatures for anti-virus assisted attacks. In Proc. of the ACM Asia Conference on Computer and Communications Security (ASIACCS), pages 587–598, Apr. 2017.

  • 42.

    C. Wressnegger, F. Yamaguchi, A. Maier, and K. Rieck. 64-bit migration vulnerabilities. Information Technology (IT), 59 (2): 73–82, Apr. 2017.

  • 43.

    C. Wressnegger, A. Kellner, and K. Rieck. ZOE: Content-based anomaly detection for industrial control systems. In Proc. of the Conference on Dependable Systems and Networks (DSN), pages 127–138, June 2018.

  • 44.

    A. Young and M. Yung. Cryptovirology: The birth, neglect, and explosion of ransomware. Communications of the ACM, 60 (7): 24–26, 2017.
