MaLAware: Automating the Comprehension of Malicious Software Behaviours using Large Language Models (LLMs)

📅 2025-04-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing malware analysis tools handle detection and family classification well but do not produce interpretable, actionable explanations of malicious behavior, which slows incident response and adds to analysts' cognitive load. MaLAware addresses this with an end-to-end, LLM-driven framework for explaining malware behavior: it parses Cuckoo Sandbox reports and uses open-source LLMs (Qwen2.5-7B, Llama2-7B, Llama3.1-8B, Mistral-7B, and Falcon-7B) to correlate malicious activities and generate natural-language summaries. The generated summaries are scored with 11 performance metrics against a dataset of human-written malware behavior descriptions used as ground truth. According to the authors, the evaluation shows that LLMs can narrate malware behavior in an actionable and comprehensive manner. The tool also supports quantization for resource-constrained environments.

📝 Abstract

Current malware (malicious software) analysis tools focus on detection and family classification but fail to provide clear and actionable narrative insights into the malicious activity of the malware. Therefore, there is a need for a tool that translates raw malware data into human-readable descriptions. Such a tool accelerates incident response, reduces malware analysts' cognitive load, and enables individuals with limited technical expertise to understand malicious software behaviour. With this objective, we present MaLAware, which automatically summarizes the full spectrum of malicious activity of malware executables. MaLAware processes Cuckoo Sandbox-generated reports using large language models (LLMs) to correlate malicious activities and generate concise summaries explaining malware behaviour. We evaluate the tool's performance on five open-source LLMs, using a dataset of human-written malware behaviour descriptions as ground truth. Each model's performance is measured using 11 performance metrics, which increases confidence in MaLAware's effectiveness. The current version of MaLAware supports Qwen2.5-7B, Llama2-7B, Llama3.1-8B, Mistral-7B, and Falcon-7B, along with a quantization feature for resource-constrained environments. MaLAware lays a foundation for future research in malware behaviour explanation, and its extensive evaluation demonstrates LLMs' ability to narrate malware behaviour in an actionable and comprehensive manner.
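To make the report-to-summary step in the abstract more concrete, the following is a minimal sketch of how a sandbox report might be condensed into an LLM prompt. It is not MaLAware's actual preprocessing, which this page does not describe. The function name `build_prompt`, the prompt wording, and the report fields read here (`signatures`, `behavior.summary`, `network.domains`) are assumptions about a Cuckoo-style JSON layout, and the sample report is made up for illustration.

```python
def build_prompt(report: dict, max_items: int = 5) -> str:
    """Condense a Cuckoo-style report dict into a summarization prompt.

    All field names below are assumptions about the report layout.
    """
    lines = []

    # Assumed: "signatures" is a list of dicts with name/severity/description.
    for sig in report.get("signatures", [])[:max_items]:
        lines.append(
            f"- signature: {sig.get('name', '?')} "
            f"(severity {sig.get('severity', '?')}): {sig.get('description', '')}"
        )

    # Assumed: "behavior.summary" maps a category name to a list of artefacts.
    summary = report.get("behavior", {}).get("summary", {})
    for category in sorted(summary):
        for item in summary[category][:max_items]:
            lines.append(f"- {category}: {item}")

    # Assumed: "network.domains" is a list of dicts with a "domain" key.
    for entry in report.get("network", {}).get("domains", [])[:max_items]:
        lines.append(f"- domain contacted: {entry.get('domain', '?')}")

    evidence = "\n".join(lines) if lines else "- (no behavioural evidence extracted)"
    return (
        "Summarize the behaviour of this malware sample for an incident responder.\n"
        "Sandbox evidence:\n" + evidence
    )


# Hypothetical report fragment, invented for this example.
sample_report = {
    "signatures": [
        {
            "name": "persistence_autorun",
            "severity": 3,
            "description": "Installs itself for autorun at Windows startup",
        }
    ],
    "behavior": {
        "summary": {"file_created": ["C:\\Users\\user\\AppData\\Roaming\\svch0st.exe"]}
    },
    "network": {"domains": [{"domain": "example.invalid", "ip": "203.0.113.7"}]},
}

prompt = build_prompt(sample_report)
print(prompt)
```

Capping each evidence category with `max_items` is one simple way to keep the prompt within the context window of a 7B-class model. The resulting string would then be passed to whichever LLM backend is in use.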

Problem

Research questions and friction points this paper is trying to address.

Translates raw malware data into human-readable descriptions

Automates summarizing malicious activity of malware executables

Evaluates LLMs for narrating malware behavior comprehensively
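As an illustration of scoring a generated summary against a human-written reference, here is a small sketch of a unigram-overlap F1 score in the style of ROUGE-1. This page does not list the 11 metrics the paper uses, so this particular metric, the function name `unigram_f1`, and the example sentences are assumptions chosen only to show the general idea.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated summary and a reference.

    Tokenization is plain lowercase whitespace splitting (a simplification).
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and model output, invented for this example.
reference = "the sample creates a registry run key and contacts a remote server"
generated = "the sample creates a registry run key"

score = unigram_f1(generated, reference)
print(f"unigram F1: {score:.3f}")
```

A single overlap metric rewards shared wording rather than shared meaning, which is one reason an evaluation might combine several complementary metrics.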

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs to summarize malware behaviors

Processes Cuckoo Sandbox reports automatically

Supports multiple quantized LLM models
