AXE: An Agentic eXploit Engine for Confirming Zero-Day Vulnerability Reports

📅 2026-02-15

🤖 AI Summary

Existing vulnerability detection tools often generate a high volume of false positives and lack effective validation mechanisms. This work proposes AXE, a multi-agent framework that, in a grey-box setting and relying solely on minimal vulnerability metadata (a CWE classification and a vulnerable code location), automatically generates reproducible exploits for web vulnerabilities. AXE achieves this through a decoupled architecture that combines planning, code exploration, and dynamic execution feedback. To the best of our knowledge, this is the first approach to combine lightweight metadata with multi-agent collaboration to automatically map vulnerability reports to verifiable proof-of-concept (PoC) exploits. Evaluated on CVE-Bench, AXE attains a 30% exploit success rate, a 3× improvement over state-of-the-art black-box baselines; even in a single-agent configuration, grey-box metadata yields a 1.75× gain. AXE also successfully validates a real-world vulnerability absent from the benchmark.

📝 Abstract

Vulnerability detection tools are widely adopted in software projects, yet they often overwhelm maintainers with false positives and non-actionable reports. Automated exploitation systems can help validate these reports; however, existing approaches typically operate in isolation from detection pipelines, failing to leverage readily available metadata such as vulnerability type and source-code location. In this paper, we investigate how reported security vulnerabilities can be assessed in a realistic grey-box exploitation setting that leverages minimal vulnerability metadata, specifically a CWE classification and a vulnerable code location. We introduce Agentic eXploit Engine (AXE), a multi-agent framework for Web application exploitation that maps lightweight detection metadata to concrete exploits through decoupled planning, code exploration, and dynamic execution feedback. Evaluated on the CVE-Bench dataset, AXE achieves a 30% exploitation success rate, a 3x improvement over state-of-the-art black-box baselines. Even in a single-agent configuration, grey-box metadata yields a 1.75x performance gain. Systematic error analysis shows that most failed attempts arise from specific reasoning gaps, including misinterpreted vulnerability semantics and unmet execution preconditions. For successful exploits, AXE produces actionable, reproducible proof-of-concept artifacts, demonstrating its utility in streamlining Web vulnerability triage and remediation. We further evaluate AXE's generalizability through a case study on a recent real-world vulnerability not included in CVE-Bench.
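The abstract describes AXE's architecture only at a high level. The following is a minimal Python sketch of what a decoupled explore/plan/execute loop driven by grey-box metadata could look like, assuming each role is a separate callable (in a real system, an LLM-backed agent) and that the executor returns a success flag plus textual feedback. Every name here (`VulnReport`, `exploit_loop`, `max_iters`, the `toy_*` stubs, `app/login.py`) is hypothetical and not taken from the paper; the toy stubs stand in for real agents and a live target.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class VulnReport:
    """Grey-box input: only a CWE class and a code location (hypothetical schema)."""
    cwe: str        # e.g. "CWE-89" (SQL injection)
    file_path: str  # reported vulnerable file
    line: int       # reported vulnerable line


@dataclass
class Attempt:
    """One execution attempt and the feedback it produced."""
    payload: str
    succeeded: bool
    feedback: str


# Hypothetical agent roles, kept separate to mirror the decoupled design.
Explorer = Callable[[VulnReport], str]
Planner = Callable[[VulnReport, str, list[Attempt]], str]
Executor = Callable[[str], tuple[bool, str]]


def exploit_loop(
    report: VulnReport,
    explore: Explorer,
    plan: Planner,
    execute: Executor,
    max_iters: int = 5,
) -> tuple[Optional[str], list[Attempt]]:
    """Plan, execute, feed back; stop at the first success or when the budget runs out."""
    context = explore(report)  # code exploration around the reported location
    history: list[Attempt] = []
    for _ in range(max_iters):
        payload = plan(report, context, history)  # planning conditioned on past feedback
        ok, feedback = execute(payload)           # dynamic execution against the target
        history.append(Attempt(payload, ok, feedback))
        if ok:
            return payload, history  # confirmed: the payload doubles as a PoC artifact
    return None, history             # not confirmed within the budget


# Toy stand-ins for a login form with a SQL-injection bug (all hypothetical).
def toy_explore(report: VulnReport) -> str:
    return f"{report.file_path}:{report.line} concatenates user input into SQL"


def toy_plan(report: VulnReport, context: str, history: list[Attempt]) -> str:
    candidates = ["admin", "' OR '1'='1"]
    return candidates[min(len(history), len(candidates) - 1)]


def toy_execute(payload: str) -> tuple[bool, str]:
    if "' OR '1'='1" in payload:
        return True, "login bypassed"
    return False, "login rejected"


poc, history = exploit_loop(VulnReport("CWE-89", "app/login.py", 42),
                            toy_explore, toy_plan, toy_execute)
print(poc, len(history))
```

Here the first attempt fails, its feedback is recorded, and the second planned payload succeeds, so the loop returns it as the reproducible artifact. Keeping the attempt history also gives a natural place to inspect failures, in the spirit of the error analysis the abstract mentions, though how AXE actually records or analyzes failures is not specified here.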

Problem

Research questions and friction points this paper is trying to address.

vulnerability validation

zero-day vulnerability

grey-box exploitation

false positives

automated exploitation

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent framework

grey-box exploitation

vulnerability validation