An Automated Attack Investigation Approach Leveraging Threat-Knowledge-Augmented Large Language Models

📅 2025-09-01
🤖 AI Summary
To address challenges in APT attack chain reconstruction, including weak cross-platform generalization, difficulty modeling long-range log dependencies, and non-actionable report generation, this paper proposes a threat-knowledge-enhanced large language model (LLM) framework. The method introduces stage-aware dynamic kill-chain knowledge units that integrate semantic annotation with iterative cross-log causal reasoning, supporting both single- and multi-host Windows/Linux environments. It further enhances the LLM's ability to capture long-range dependencies in heterogeneous logs via semantic-augmented retrieval and context-aware dynamic expansion. Evaluated on 15 real-world attack scenarios comprising over 4.3 million events (7.2 GB), the approach achieves a 97.1% true positive rate (TPR) and a 0.2% false positive rate (FPR), substantially outperforming the state-of-the-art ATLAS (79.2% TPR, 29.1% FPR). It also directly generates human-readable, actionable forensic reports.

📝 Abstract
Advanced Persistent Threats (APTs) are prolonged, stealthy intrusions by skilled adversaries that compromise high-value systems to steal data or disrupt operations. Reconstructing complete attack chains from massive, heterogeneous logs is essential for effective attack investigation, yet existing methods suffer from poor platform generality, limited generalization to evolving tactics, and an inability to produce analyst-ready reports. Large Language Models (LLMs) offer strong semantic understanding and summarization capabilities, but in this domain they struggle to capture the long-range, cross-log dependencies critical for accurate reconstruction. To solve these problems, we present an LLM-empowered attack investigation framework augmented with a dynamically adaptable Kill-Chain-aligned threat knowledge base. We organize attack-relevant behaviors into stage-aware knowledge units enriched with semantic annotations, enabling the LLM to iteratively retrieve relevant intelligence, perform causal reasoning, and progressively expand the investigation context. This process reconstructs multi-phase attack scenarios and generates coherent, human-readable investigation reports. Evaluated on 15 attack scenarios spanning single-host and multi-host environments across Windows and Linux (over 4.3M log events, 7.2 GB of data), the system achieves an average True Positive Rate (TPR) of 97.1% and an average False Positive Rate (FPR) of 0.2%, significantly outperforming the state-of-the-art method ATLAS, which achieves an average TPR of 79.2% and an average FPR of 29.1%.
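The TPR and FPR figures quoted in the abstract follow the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The sketch below shows how these rates could be computed for one scenario. It assumes log events are identified by hashable IDs and that a ground-truth set of attack events is available. The function name `detection_rates` and the toy data are illustrative and not taken from the paper.

```python
def detection_rates(predicted, actual, all_events):
    """Return (TPR, FPR) for one scenario.

    predicted:  set of event IDs the system flagged as attack-related
    actual:     set of event IDs that are truly attack-related
    all_events: set of every event ID in the scenario
    """
    tp = len(predicted & actual)                # flagged and truly malicious
    fp = len(predicted - actual)                # flagged but benign
    fn = len(actual - predicted)                # malicious but missed
    tn = len(all_events - predicted - actual)   # benign and not flagged
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy scenario: 10 events, 4 of them malicious, 1 false alarm, 1 miss.
all_events = set(range(10))
actual = {0, 1, 2, 3}
predicted = {0, 1, 2, 9}
tpr, fpr = detection_rates(predicted, actual, all_events)
print(f"TPR={tpr:.3f} FPR={fpr:.3f}")
```

Benign events usually far outnumber attack events in real logs, so even a small FPR can mean many false alerts in absolute terms. This is why the two rates are reported together.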
Problem

Research questions and friction points this paper is trying to address.

Reconstructing complete attack chains from massive heterogeneous logs
Overcoming poor platform generality and limited generalization to evolving tactics
Generating coherent human-readable investigation reports for analysts
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM framework with dynamic threat knowledge base
Stage-aware knowledge units with semantic annotations
Iterative retrieval and causal reasoning for attack reconstruction
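The iterative retrieve, reason, and expand loop described above can be pictured with a toy sketch. It is not the authors' implementation and rests on several assumptions. Keyword matching stands in for semantic retrieval, and shared entities (process, file, or IP names) stand in for the LLM's causal reasoning. Every name in it (`KnowledgeUnit`, `LogEvent`, `retrieve`, `investigate`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeUnit:
    stage: str        # kill-chain stage this unit describes
    keywords: tuple   # toy stand-in for semantic annotations
    note: str


@dataclass(frozen=True)
class LogEvent:
    event_id: int
    host: str
    entities: frozenset  # processes, files, IPs mentioned by the event
    text: str


def retrieve(event, kb):
    """Return knowledge units whose keywords appear in the event text."""
    text = event.text.lower()
    return [u for u in kb if any(k in text for k in u.keywords)]


def investigate(seed, events, kb, max_rounds=5):
    """Grow the investigation context outward from a seed event.

    An event joins the context when it shares an entity with the context
    (assumed proxy for a causal link) and matches a knowledge unit.
    """
    flagged = {seed.event_id: retrieve(seed, kb)}
    known_entities = set(seed.entities)
    for _ in range(max_rounds):
        added = False
        for ev in events:
            if ev.event_id in flagged or not (ev.entities & known_entities):
                continue
            units = retrieve(ev, kb)
            if units:
                flagged[ev.event_id] = units
                known_entities |= ev.entities  # expand the context
                added = True
        if not added:  # fixed point reached, nothing new to link
            break
    return flagged


kb = [
    KnowledgeUnit("delivery", ("download",), "payload fetched"),
    KnowledgeUnit("execution", ("powershell",), "script interpreter spawned"),
    KnowledgeUnit("exfiltration", ("upload",), "data leaves the network"),
]
events = [
    LogEvent(3, "B", frozenset({"pid42", "10.0.0.5"}), "pid42 upload to 10.0.0.5"),
    LogEvent(2, "A", frozenset({"a.exe", "pid42"}), "a.exe spawns powershell"),
    LogEvent(4, "B", frozenset({"cron"}), "cron upload backup"),
    LogEvent(5, "A", frozenset({"a.exe"}), "a.exe read config"),
    LogEvent(1, "A", frozenset({"a.exe"}), "browser download a.exe"),
]
chain = investigate(events[-1], events, kb)
for eid in sorted(chain):
    print(eid, [u.stage for u in chain[eid]])
```

In this toy run the chain spans hosts A and B through the shared entity `pid42`. Event 4 matches a keyword but has no link to the context, and event 5 is linked but matches no knowledge unit, so both stay out of the chain.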
Authors
Rujie Dai
Institute of Information Engineering, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China
Peizhuo Lv
Research Fellow, Nanyang Technological University
AI Security
Yujiang Gui
University of New South Wales, Australia
Qiujian Lv
Institute of Information Engineering, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China
Yuanyuan Qiao
Beijing University of Posts and Telecommunications, China
Yan Wang
Institute of Information Engineering, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China
Degang Sun
Computer Network Information Center, Chinese Academy of Sciences, China
Weiqing Huang
Institute of Information Engineering, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China
Yingjiu Li
Ripple Professor, Computer Science Department, University of Oregon
Mobile and System Security; Applied Cryptography and Cloud Security; Data Application Security and Privacy
XiaoFeng Wang
Chair, ACM SIGSAC
AI-Centered Security; Systems Security and Privacy; Healthcare Privacy; Incentive Engineering