Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software

📅 2026-05-05

🤖 AI Summary

Existing approaches struggle to efficiently construct executable proofs of vulnerability (PoVs) that show whether third-party library vulnerabilities are actually exploitable through the applications that depend on them. This work proposes PoVSmith, an automated PoV generation approach that guides a coding agent (Codex) and a large language model (GPT) with prompts that combine call-path analysis, exemplar tests, code context, and execution feedback, refining the generated tests over multiple rounds. An LLM then assesses each generated PoV using the test context and its execution logs. On 33 Java application–library pairs, the approach correctly identifies 152 of the 158 entry points it reports (96%) and generates 152 tests, 84 (55%) of which demonstrate feasible ways of attacking the application through the library vulnerability. It substantially outperforms the state-of-the-art LLM-based approach.
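The summary mentions call-path analysis for finding application entry points that reach vulnerable library APIs. The algorithm is not spelled out here, so the following is only a minimal sketch, assuming a precomputed call graph stored as a dictionary from each method to its direct callees. It runs a breadth-first search from each public App method and stops at the first vulnerable Lib API reached. The function name `find_call_paths` and all method names are hypothetical.

```python
from collections import deque


def find_call_paths(call_graph, entry_points, vulnerable_apis):
    """Return {entry_point: shortest call path to a vulnerable API}.

    call_graph maps each method to the methods it calls directly.
    Entry points with no path to any vulnerable API are omitted.
    """
    paths = {}
    for entry in entry_points:
        # Breadth-first search: the first vulnerable API dequeued
        # lies on a shortest path from this entry point.
        queue = deque([[entry]])
        visited = {entry}
        while queue:
            path = queue.popleft()
            method = path[-1]
            if method in vulnerable_apis:
                paths[entry] = path
                break
            for callee in call_graph.get(method, ()):
                if callee not in visited:  # also guards against cycles
                    visited.add(callee)
                    queue.append(path + [callee])
    return paths


# Hypothetical App/Lib call graph; every method name is illustrative.
call_graph = {
    "App.importConfig": ["App.parseXml"],
    "App.parseXml": ["Lib.XmlReader.read"],
    "App.renderReport": ["App.formatRow"],
}
entry_points = ["App.importConfig", "App.renderReport"]
vulnerable_apis = {"Lib.XmlReader.read"}

for entry, path in find_call_paths(call_graph, entry_points, vulnerable_apis).items():
    print(" -> ".join(path))
```

Here only `App.importConfig` reaches the vulnerable API, so it is the only entry point for which a PoV test would be worth generating. A real analysis would also have to handle virtual dispatch and reflection, which this sketch ignores.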

📝 Abstract

Developers create modern software applications (Apps) on top of third-party libraries (Libs). When library vulnerabilities are reachable through application code, the applications can be vulnerable to software supply chain attacks. Prior work shows that developers often require concrete and executable evidence, i.e., proof-of-vulnerability (PoV) tests, to decide whether a reported dependency vulnerability poses a practical security risk to their application. However, manually crafting such tests is challenging, and existing tool support is insufficient to automate the procedure. To streamline test generation, we created PoVSmith, a new approach that combines call path analysis, exemplar tests, code context, and feedback into multiple prompts to guide a coding agent (i.e., Codex) and a large language model (i.e., GPT) for test generation, execution, and assessment. We evaluated PoVSmith on 33 ⟨App, Lib⟩ Java program pairs, where each App depends on a vulnerable Lib. PoVSmith revealed 158 unique application-level entry points (i.e., public methods) calling vulnerable library APIs; 152 (96%) of them were correctly identified, with their call paths properly recognized. With such method call information, PoVSmith generated 152 tests, 84 (55%) of which demonstrated feasible ways of attacking Apps by exploiting Lib vulnerabilities. PoVSmith substantially outperforms the state-of-the-art LLM-based approach, as it reduces human involvement while dramatically improving test quality. Our work contributes (1) a novel approach to agent-based test generation, (2) an iterative code refinement process driven by execution feedback, and (3) LLM-based quality assessment grounded in both the test context and execution logs.
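Contributions (2) and (3) describe a loop in which a generated test is executed, its log is fed back to the generator, and an LLM judges the result. The following is a minimal sketch of such a loop under stated assumptions: the coding agent, the test runner, and the LLM judge are passed in as plain functions (replaced here by stubs), and the names `refine_pov_test` and `max_rounds`, as well as the prompt layout, are hypothetical rather than PoVSmith's actual interface.

```python
from typing import Callable, Optional, Tuple

# Hypothetical component signatures (not PoVSmith's real interface):
#   generate(prompt)  -> candidate test source
#   execute(test)     -> (ran_successfully, execution_log)
#   assess(test, log) -> True if the test demonstrates the vulnerability
Generate = Callable[[str], str]
Execute = Callable[[str], Tuple[bool, str]]
Assess = Callable[[str, str], bool]


def refine_pov_test(base_prompt: str, generate: Generate, execute: Execute,
                    assess: Assess, max_rounds: int = 3) -> Optional[str]:
    """Generate, run, and repair a candidate PoV test using execution feedback.

    Returns the first test that runs and that `assess` accepts, or None
    if no such test appears within `max_rounds` rounds.
    """
    prompt = base_prompt
    for _ in range(max_rounds):
        test = generate(prompt)
        ran, log = execute(test)
        if ran and assess(test, log):
            return test
        # Fold the failed test and its execution log into the next prompt.
        prompt = (f"{base_prompt}\n\nPrevious test:\n{test}"
                  f"\n\nExecution log:\n{log}")
    return None


# Stubs standing in for the coding agent, test runner, and LLM judge.
def fake_generate(prompt: str) -> str:
    # Pretend the agent fixes the error once the log appears in the prompt.
    return "testV2" if "cannot find symbol" in prompt else "testV1"


def fake_execute(test: str) -> Tuple[bool, str]:
    if test == "testV1":
        return False, "error: cannot find symbol"
    return True, "exception thrown inside vulnerable API"


def fake_assess(test: str, log: str) -> bool:
    return "vulnerable API" in log


print(refine_pov_test("Write a PoV test for App.importConfig",
                      fake_generate, fake_execute, fake_assess))
```

With these stubs, the first round fails to compile, the error log is folded into the second prompt, and the repaired test `testV2` is accepted. Passing the three components as functions keeps the loop independent of any particular agent or model API.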

Problem

Research questions and friction points this paper is trying to address.

proof-of-vulnerability

software supply chain attacks

third-party library vulnerabilities

executable test generation

security risk assessment

Innovation

Methods, ideas, or system contributions that make the work stand out.

proof-of-vulnerability

LLM-guided test generation

call path analysis