🤖 AI Summary
To address challenges in APT attack chain reconstruction, including weak cross-platform generalization, difficulty modeling long-range log dependencies, and non-actionable report generation, this paper proposes a threat-knowledge-enhanced large language model (LLM) framework. Our method introduces stage-aware, dynamically adaptable kill-chain knowledge units that integrate semantic annotation with iterative cross-log causal reasoning, supporting both single- and multi-host Windows/Linux environments. We further enhance the LLM's ability to capture long-range dependencies in heterogeneous logs via semantic-augmented retrieval and context-aware dynamic expansion. Evaluated on 15 attack scenarios comprising over 4.3 million log events (7.2 GB), our approach achieves an average true positive rate (TPR) of 97.1% and an average false positive rate (FPR) of 0.2%, substantially outperforming the state-of-the-art ATLAS (79.2% TPR, 29.1% FPR). Moreover, it directly generates human-readable, actionable forensic reports.
📝 Abstract
Advanced Persistent Threats (APTs) are prolonged, stealthy intrusions by skilled adversaries that compromise high-value systems to steal data or disrupt operations. Reconstructing complete attack chains from massive, heterogeneous logs is essential for effective attack investigation, yet existing methods suffer from poor platform generality, limited generalization to evolving tactics, and an inability to produce analyst-ready reports. Large Language Models (LLMs) offer strong semantic understanding and summarization capabilities, but in this domain they struggle to capture the long-range, cross-log dependencies critical for accurate reconstruction.
To solve these problems, we present an LLM-empowered attack investigation framework augmented with a dynamically adaptable Kill-Chain-aligned threat knowledge base. We organize attack-relevant behaviors into stage-aware knowledge units enriched with semantic annotations, enabling the LLM to iteratively retrieve relevant intelligence, perform causal reasoning, and progressively expand the investigation context. This process reconstructs multi-phase attack scenarios and generates coherent, human-readable investigation reports. Evaluated on 15 attack scenarios spanning single-host and multi-host environments across Windows and Linux (over 4.3M log events, 7.2 GB of data), the system achieves an average True Positive Rate (TPR) of 97.1% and an average False Positive Rate (FPR) of 0.2%, significantly outperforming the state-of-the-art method ATLAS, which achieves an average TPR of 79.2% and an average FPR of 29.1%.
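The retrieve, reason, and expand loop described above can be illustrated with a toy sketch. This is not the paper's implementation: the `KnowledgeUnit` and `LogEvent` types, the keyword-overlap retrieval, the shared-entity test that stands in for LLM causal reasoning, and all sample events are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeUnit:
    stage: str           # kill-chain stage label (hypothetical naming)
    keywords: frozenset  # semantic annotations, reduced to keywords for this toy


@dataclass(frozen=True)
class LogEvent:
    event_id: int
    host: str
    subject: str  # acting entity, e.g. a process
    obj: str      # entity acted upon, e.g. a file or process
    text: str


def retrieve(event, knowledge_base):
    """Return the knowledge units whose keywords occur in the event text."""
    words = set(event.text.lower().split())
    return [ku for ku in knowledge_base if ku.keywords & words]


def investigate(seed, events, knowledge_base, max_rounds=10):
    """Grow the investigation context outward from a seed event.

    An event joins the chain only if it (a) shares an entity with the
    current context, a crude stand-in for causal reasoning, and
    (b) matches at least one knowledge unit.
    """
    chain = {seed.event_id: [ku.stage for ku in retrieve(seed, knowledge_base)]}
    entities = {seed.subject, seed.obj}
    for _ in range(max_rounds):
        grew = False
        for ev in events:
            if ev.event_id in chain:
                continue
            linked = ev.subject in entities or ev.obj in entities
            units = retrieve(ev, knowledge_base)
            if linked and units:
                chain[ev.event_id] = [ku.stage for ku in units]
                entities.update((ev.subject, ev.obj))  # expand the context
                grew = True
        if not grew:  # stop once a full pass adds nothing new
            break
    return chain


# Hypothetical knowledge base and log events, invented for this example.
KB = [
    KnowledgeUnit("execution", frozenset({"spawned"})),
    KnowledgeUnit("command-and-control", frozenset({"downloaded"})),
    KnowledgeUnit("lateral-movement", frozenset({"connected"})),
]
EVENTS = [
    LogEvent(1, "hostA", "winword.exe", "powershell.exe", "winword.exe spawned powershell.exe"),
    LogEvent(2, "hostA", "powershell.exe", "payload.dll", "powershell.exe downloaded payload.dll"),
    LogEvent(3, "hostA", "chrome.exe", "cache.tmp", "chrome.exe wrote cache.tmp"),
    LogEvent(4, "hostB", "payload.dll", "hostB", "payload.dll connected to hostB"),
    LogEvent(5, "hostA", "svchost.exe", "update.bin", "svchost.exe downloaded update.bin"),
]

result = investigate(EVENTS[0], EVENTS, KB)
print(result)
```

Event 5 matches a knowledge unit by keyword but shares no entity with the growing context, so it is left out. This mirrors, in a very simplified way, why retrieval alone is not enough and some form of cross-log linking is needed.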