ATTAIN: Automated Exploit Failure Analysis through Trace-Driven Diff Analysis

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing approaches for determining which library versions are affected by a vulnerability; these approaches often mislabel or miss affected versions and depend on costly manual analysis. The authors propose ATTAIN, which combines execution-trace differences from exploit attempts with the tool-calling capabilities of large language models (LLMs). Its three modules (trace construction, diff exploration, and affected-version judgment) guide the model to analyze cross-version code changes and reason about whether each version is vulnerable. The approach handles failed exploits and ambiguous commit messages, achieving an F1-score of 93.24% on a dataset of 224 CVEs and 25,943 library versions across 128 libraries, and outperforming the commit-based baselines V-SZZ and LLM4SZZ by 116.28% and 33.30%, respectively.
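
To make the idea of an execution-trace difference concrete, the following is a minimal sketch, not the authors' implementation. It assumes a trace is simply an ordered list of called function names; the function `first_divergence` and the two example traces are hypothetical and exist only for illustration.

```python
from typing import List, Optional


def first_divergence(trace_a: List[str], trace_b: List[str]) -> Optional[int]:
    """Return the index of the first position where two traces differ.

    Returns None when the traces are identical. If one trace is a strict
    prefix of the other, the divergence is the length of the shorter trace.
    """
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return i
    if len(trace_a) != len(trace_b):
        return min(len(trace_a), len(trace_b))
    return None


# Hypothetical traces of the same exploit run against two library versions.
trace_old = ["parse", "validate", "load_object", "run_payload"]
trace_new = ["parse", "validate", "reject_input"]

idx = first_divergence(trace_old, trace_new)
print(idx, trace_old[idx], trace_new[idx])
```

In this toy case the runs agree up to `validate` and then split, so the functions at the divergence index are natural starting points when searching the cross-version diff.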

📝 Abstract

Exploits are widely used to check whether library vulnerabilities appear in different versions and to mark affected version ranges. Exploit-based checks sometimes fail because exploits stop running on many versions after API or environment changes. Commit-based methods, such as SZZ-style analysis, sometimes miss the correct vulnerability-introducing commits and propagate labels incorrectly along long version chains. These problems leave many affected versions unlabeled or wrongly labeled and make manual exploit failure analysis expensive and impractical at scale. We present ATTAIN, a trace-driven diff analysis framework with three modules to assess vulnerability presence across evolving library versions. The modules are trace construction, diff exploration, and affected-version judgment. The trace construction module executes an exploit across historical library versions and compares their behaviors to capture cross-version execution divergences. Using these divergences, the diff exploration module guides an LLM through a finite-state tool loop to autonomously search over version changes and collect vulnerability-relevant diff hunks. The affected-version judgment module reasons over the collected evidence to determine whether the vulnerability exists in each version and outputs the affected version range. We evaluate ATTAIN on an extensive dataset comprising 224 CVEs and 25,943 library versions across 128 libraries. ATTAIN achieves an F1-score of 93.24%, outperforming the commit-based methods V-SZZ and LLM4SZZ by 116.28% and 33.30%, respectively. ATTAIN uses short tool-guided prompts and a fixed number of iterations, keeping token usage low. It matches or surpasses existing methods on frequent CWE types, including cases where exploit runs fail for non-vulnerability reasons or commit messages do not clearly delimit affected versions.
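
The finite-state tool loop with a fixed iteration budget can be sketched roughly as follows. This is an illustration under stated assumptions, not ATTAIN's actual design: the states, the `explore` function, and the example diff are all hypothetical, and a deterministic rule stands in for the LLM that would choose each tool call in the real system.

```python
from enum import Enum, auto
from typing import Dict, List


class State(Enum):
    LIST_FILES = auto()  # enumerate files changed between two versions
    READ_HUNK = auto()   # inspect the hunks of one changed file
    DONE = auto()        # stop exploring


def explore(diff: Dict[str, List[str]], symbol: str, max_steps: int = 8) -> List[str]:
    """Collect diff hunks that mention `symbol`, using at most `max_steps` steps.

    `diff` maps a file path to its list of hunk strings (an assumed format).
    """
    state = State.LIST_FILES
    pending: List[str] = []
    evidence: List[str] = []
    for _ in range(max_steps):  # fixed budget keeps the loop bounded
        if state is State.LIST_FILES:
            pending = sorted(diff)
            state = State.READ_HUNK if pending else State.DONE
        elif state is State.READ_HUNK:
            path = pending.pop(0)
            evidence.extend(h for h in diff[path] if symbol in h)
            if not pending:
                state = State.DONE
        else:
            break
    return evidence


# Hypothetical diff between two versions of a library.
example_diff = {
    "loader.py": ["- obj = load_object(data)", "+ obj = reject_input(data)"],
    "README.md": ["+ Updated docs"],
}
hunks = explore(example_diff, "reject_input")
print(hunks)
```

Bounding the loop by `max_steps` mirrors the abstract's point that a fixed number of iterations keeps token usage low; the collected hunks would then be passed to the judgment step as evidence.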

Problem

Research questions and friction points this paper is trying to address.

exploit failure

vulnerability labeling

version analysis

affected version range

trace divergence

Innovation

Methods, ideas, or system contributions that make the work stand out.

trace-driven analysis

diff exploration

LLM-guided vulnerability assessment