ZERO-APT: A Closed-Loop Adversarial Framework for LLM-Driven Automated Penetration Testing under Intelligent Defense

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses critical limitations in current large language model (LLM)-driven automated penetration testing, particularly its insufficient realism, weak causal consistency in multi-step attack chains, and lack of auditable decision-making when confronting intelligent defense systems. To overcome these challenges, the authors propose a tripartite closed-loop adversarial framework integrating a configurable LLM-based defender, decoupled planning and execution phases, a multidimensional ReAct feedback mechanism, a hard-constrained action library, and Judge-generated structured cyber threat intelligence (CTI) reports. Implemented within a unified architecture, this approach simultaneously enhances test realism, causal coherence, and auditability. Empirical evaluation in post-exploitation scenarios on Windows Server 2022 demonstrates a 79% attack success rate, a causal consistency score of 0.860, and full traceability of all decision steps throughout the testing process.

📝 Abstract

LLM-driven automated penetration testing agents are typically evaluated against static targets that neither detect nor respond to attacks, so their behavior under intelligent defense remains untested. The causal consistency of multi-step attack chains likewise hinges on unstable LLM reasoning, and agent decisions remain opaque to human analysts. These three shortcomings, in realism, consistency, and auditability, are usually patched in isolation. We present ZERO-APT, a turn-based attacker-defender-judge framework that addresses them within a single architecture. For realism, ZERO-APT embeds a configurable LLM Defender that consumes Sysmon telemetry and detects attacks in real time, exposing the attacker to a live opponent rather than a passive target. For consistency, three architectural mechanisms move causal consistency from unstable LLM reasoning into enforced system architecture: separation of planning from execution, multi-dimensional ReAct feedback, and a hard-constraint-filtered action library. For auditability, a dedicated Judge agent adjudicates each round, maintains global state, and emits structured post-hoc CTI reports that make every decision traceable. We evaluate a Windows Server 2022 post-exploitation prototype across five scenarios with three Defender configurations. ZERO-APT reaches 79% attack success rate (Aurora 22%, PentestGPT 39%), a Causal Consistency Score of 0.860 (Aurora 0.930, Claude Code 0.520), and end-to-end decision auditability through structured CTI reports. We release the benchmark to support evaluation of penetration agents under intelligent defense.
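
To make the turn structure described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that actions can be modeled as precondition/effect pairs over a set of symbolic facts, and it replaces the LLM attacker and Defender with plain Python callables. All names here (`Action`, `GlobalState`, `admissible`, `run_rounds`, and the three example actions) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class Action:
    """Hypothetical action-library entry: preconditions and effects as fact sets."""
    name: str
    requires: frozenset  # facts that must already hold
    grants: frozenset    # facts added if the action succeeds


@dataclass
class GlobalState:
    """State a judge-like component might maintain across rounds (assumed shape)."""
    facts: Set[str] = field(default_factory=set)
    log: List[dict] = field(default_factory=list)


def admissible(library: List[Action], state: GlobalState) -> List[Action]:
    """Hard-constraint filter: keep only actions whose preconditions hold
    and that would still add at least one new fact."""
    return [a for a in library
            if a.requires <= state.facts and not a.grants <= state.facts]


def run_rounds(library: List[Action],
               attacker: Callable[[List[Action], GlobalState], Action],
               defender: Callable[[Action, GlobalState], bool],
               state: GlobalState,
               max_rounds: int = 10) -> GlobalState:
    """Turn-based loop: attacker proposes, defender may detect, judge records."""
    for rnd in range(1, max_rounds + 1):
        options = admissible(library, state)
        if not options:
            break  # nothing causally valid is left to try
        action = attacker(options, state)
        if action not in options:
            raise ValueError(f"inadmissible action: {action.name}")
        detected = defender(action, state)
        if not detected:
            state.facts |= action.grants  # effects apply only if undetected
        # One structured record per round, so every decision stays traceable.
        state.log.append({"round": rnd, "action": action.name,
                          "detected": detected, "facts": sorted(state.facts)})
    return state


# Illustrative three-step chain (hypothetical action names).
LIBRARY = [
    Action("enumerate_users", frozenset({"foothold"}), frozenset({"user_list"})),
    Action("dump_credentials", frozenset({"user_list"}), frozenset({"credentials"})),
    Action("lateral_move", frozenset({"credentials"}), frozenset({"second_host"})),
]

if __name__ == "__main__":
    final = run_rounds(LIBRARY,
                       attacker=lambda options, state: options[0],
                       defender=lambda action, state: False,
                       state=GlobalState(facts={"foothold"}))
    print([entry["action"] for entry in final.log])
    # prints ['enumerate_users', 'dump_credentials', 'lateral_move']
```

In this sketch the filter runs before the attacker chooses, so an out-of-order step such as `lateral_move` without credentials is never offered. That is one plausible reading of moving causal consistency out of model reasoning and into enforced architecture; the paper's actual constraint mechanism may differ.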

Problem

Research questions and friction points this paper is trying to address.

automated penetration testing

intelligent defense

causal consistency

auditability

LLM-driven agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial framework

LLM-driven penetration testing

causal consistency