🤖 AI Summary
Existing malware analysis tools excel at detection and family classification but lack interpretable, actionable explanations of malicious behavior, which slows incident response and increases analysts' cognitive load. To address this, we propose MaLAware, an end-to-end LLM-driven framework for explaining malware behavior. It parses Cuckoo Sandbox reports and uses open-weight LLMs, including Qwen2.5-7B, Llama3.1-8B, and Mistral-7B, to correlate malicious activities and generate natural-language narrative summaries. We evaluate the approach with 11 performance metrics, using a dataset of human-written malware behavior descriptions as ground truth. The results indicate that LLMs can describe malware behavior accurately, comprehensibly, and in an operationally useful way. Our work provides a reproducible, quantifiable basis for applying LLMs to explainable cybersecurity analysis.
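The report-parsing step described above could look roughly like the following Python sketch. It is not MaLAware's implementation. The report layout it assumes (a `behavior` → `summary` mapping from event categories to lists of strings, plus a `signatures` list whose entries carry a `description` field, as in Cuckoo 2.x `report.json`) should be checked against your Cuckoo version. The selected keys, the `max_items` cap, and the prompt wording are hypothetical choices.

```python
import json

# Hypothetical selection of Cuckoo 2.x "behavior" -> "summary" categories that
# tend to carry behavioural meaning. Verify key names against your reports.
SUMMARY_KEYS = (
    "file_created",
    "file_deleted",
    "regkey_written",
    "mutex",
    "command_line",
    "connects_host",
    "resolves_host",
)


def condense_report(report: dict, max_items: int = 10) -> dict:
    """Keep a few behaviour-relevant fields so the prompt stays small.

    Assumes the Cuckoo 2.x report.json layout described above; absent or
    empty categories are skipped, and each list is truncated to max_items.
    """
    summary = report.get("behavior", {}).get("summary", {})
    condensed = {
        key: summary[key][:max_items] for key in SUMMARY_KEYS if summary.get(key)
    }
    # Signature descriptions are short, human-readable hints about the behaviour.
    condensed["signatures"] = [
        sig.get("description", "") for sig in report.get("signatures", [])
    ][:max_items]
    return condensed


def build_prompt(condensed: dict) -> str:
    """Hypothetical prompt asking an LLM for a narrative behaviour summary."""
    return (
        "You are a malware analyst. Using the sandbox observations below, "
        "write a concise narrative of what this executable does and why it "
        "is malicious.\n\n" + json.dumps(condensed, indent=2)
    )
```

Usage: load `report.json` with `json.load`, pass the result to `condense_report`, and send `build_prompt(...)` to whichever LLM backend is in use. The report is trimmed first because raw Cuckoo reports can be far larger than a 7B model's context window.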
📝 Abstract
Current malware (malicious software) analysis tools focus on detection and family classification but fail to provide clear, actionable narrative insights into the malicious activity of a malware sample. There is therefore a need for a tool that translates raw malware data into human-readable descriptions. Such a tool accelerates incident response, reduces malware analysts' cognitive load, and enables people with limited technical expertise to understand malicious software behaviour. With this objective, we present MaLAware, which automatically summarizes the full spectrum of malicious activity of malware executables. MaLAware processes Cuckoo Sandbox-generated reports using large language models (LLMs) to correlate malicious activities and generate concise summaries explaining malware behaviour. We evaluate the tool's performance with five open-source LLMs, using a dataset of human-written malware behaviour descriptions as ground truth. Performance is measured with 11 metrics, and this broad evaluation strengthens confidence in MaLAware's effectiveness. The current version of MaLAware supports Qwen2.5-7B, Llama2-7B, Llama3.1-8B, Mistral-7B, and Falcon-7B, along with a quantization feature for resource-constrained environments. MaLAware lays a foundation for future research in malware behaviour explanation, and its extensive evaluation demonstrates LLMs' ability to narrate malware behaviour in an actionable and comprehensive manner.
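The abstract does not name the 11 metrics. Purely as an illustration of how a generated summary can be scored against a human-written reference, here is a minimal ROUGE-1 F1 (unigram-overlap) sketch, a metric commonly used for summarization. Whether it is among MaLAware's metrics is an assumption. The regex tokenizer is a simplification and omits the stemming and stop-word options that dedicated ROUGE packages provide.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs (a simplified tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated summary and a reference text."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection takes the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge1_f1("the malware creates a mutex", "the malware creates a registry key")` shares 4 unigrams, giving precision 4/5, recall 4/6, and F1 = 8/11 ≈ 0.727. Lexical overlap rewards wording rather than meaning, which is one reason a multi-metric evaluation can be more convincing than any single score.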