🤖 AI Summary
This work addresses the absence of a systematic survey of Transformer applications in malware analysis by conducting the first Systematization of Knowledge (SoK) in this domain. Through qualitative analysis, cross-study comparison, and dataset-level meta-analysis, it examines multi-source feature modeling, including code sequences, API call traces, raw binary bytes, and control-flow graph (CFG) embeddings. The study introduces a two-dimensional taxonomy: one axis categorizes model adaptation strategies (e.g., fine-tuning, adapter-based tuning, lightweight architectures), while the other characterizes multimodal feature representation capabilities. By synthesizing major public benchmarks, it identifies six key open challenges: few-shot generalization, interpretability, adversarial robustness, scalability, cross-platform transferability, and label efficiency. This SoK fills a gap in the field's literature, delivering a reusable classification framework and practical guidelines to advance research on and deployment of next-generation AI-driven malware detection systems.
📝 Abstract
The introduction of transformers has been an important breakthrough for AI research and application, as transformers are the foundation of generative AI. A promising application domain for transformers is cybersecurity, in particular malware analysis. Transformer models suit this domain because they handle long sequential features flexibly and capture contextual relationships. However, as the use of transformers for malware analysis is still in its infancy, it is critical to evaluate, systematize, and contextualize existing literature to foster future research. This Systematization of Knowledge (SoK) paper aims to provide a comprehensive analysis of transformer-based approaches designed for malware analysis. Based on our systematic analysis of existing knowledge, we propose taxonomies based on: (a) how different transformers are adapted, organized, and modified across various use cases; and (b) how diverse feature types and their representation capabilities are reflected. We also provide an inventory of datasets used to explore multiple research avenues in the use of transformers for malware analysis, and we discuss open challenges and future research directions. We believe that this SoK paper will help the research community gain detailed insights from existing work and will serve as a foundational resource for conducting novel research using transformers for malware analysis.