🤖 AI Summary
Smart contract audit reports are highly heterogeneous and rarely include executable, verifiable proofs-of-concept (PoCs), so validating their findings requires costly, non-reproducible manual effort. To address this, we propose SmartPoC, a large language model (LLM) framework that automatically transforms audit reports into executable PoCs. The approach combines function-level context refinement, pre- and post-execution repair driven by compilation and execution feedback, and runtime oracles based on differential verification. Together, these mitigate hallucinations and help ensure that the reported vulnerabilities are actually reproduced. On the SmartBugs-Vul and FORGE-Vul benchmarks, the framework generates executable, validated Foundry tests for 85.61% and 86.45% of targets, respectively. Applied to real-world Etherscan contracts, it confirms 236 real bugs out of 545 audit findings at an average cost of $0.03 per finding. This substantially improves the automation, efficiency, and reliability of audit-result verification.
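The function-level context refinement step can be illustrated with a minimal Python sketch. It assumes a simple, hypothetical heuristic: keep only the Solidity functions whose names are mentioned in the audit finding, and locate each function body by brace matching. SmartPoC's actual extraction method may differ, and the names `extract_functions`, `refine_context`, and the sample contract are illustrative only.

```python
import re

# Matches the start of a named Solidity function declaration and captures its name.
FUNC_RE = re.compile(r"\bfunction\s+([A-Za-z_]\w*)\s*\(")


def extract_functions(source: str) -> dict[str, str]:
    """Map each function name to its full text, located by brace matching.

    Simplifications: braces inside strings or comments are not handled,
    and for overloaded functions only the last definition is kept.
    """
    functions: dict[str, str] = {}
    for match in FUNC_RE.finditer(source):
        body_start = source.find("{", match.end())
        semicolon = source.find(";", match.end())
        # Bodiless declarations (interfaces, abstract functions) end with ';'.
        if body_start == -1 or (semicolon != -1 and semicolon < body_start):
            continue
        depth = 0
        for i in range(body_start, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    functions[match.group(1)] = source[match.start(): i + 1]
                    break
    return functions


def refine_context(source: str, finding: str) -> str:
    """Keep only the functions whose names appear in the audit finding text."""
    mentioned = set(re.findall(r"[A-Za-z_]\w*", finding))
    funcs = extract_functions(source)
    return "\n\n".join(text for name, text in funcs.items() if name in mentioned)


# Hypothetical example: a classic reentrancy finding against a toy contract.
SAMPLE_CONTRACT = """
contract Bank {
    mapping(address => uint) balances;
    function deposit() external payable { balances[msg.sender] += msg.value; }
    function withdraw() external {
        (bool ok, ) = msg.sender.call{value: balances[msg.sender]}("");
        require(ok);
        balances[msg.sender] = 0;
    }
}
"""
SAMPLE_FINDING = "Reentrancy in withdraw(): the external call runs before the balance is reset."

if __name__ == "__main__":
    # Only withdraw() is passed on as LLM context; deposit() is dropped as noise.
    print(refine_context(SAMPLE_CONTRACT, SAMPLE_FINDING))
```

A name-matching filter like this is cheap and deterministic, which is why it is used here for illustration; a real system would likely also follow call edges and state variables that the mentioned functions depend on.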
📝 Abstract
Smart contracts are prone to vulnerabilities and are analyzed by experts as well as by automated systems, such as static analyzers and AI-assisted tools. However, audit artifacts are heterogeneous and often lack reproducible, executable PoC tests suitable for automated validation, which leads to costly, ad hoc manual verification. Large language models (LLMs) can be leveraged to turn audit reports into PoC test cases, but doing so faces three major challenges: noisy inputs, hallucinations, and missing runtime oracles. In this paper, we present SmartPoC, an automated framework that converts textual audit reports into executable, validated test cases. First, the input audit report is processed to reduce noise, and only bug-related functions are extracted and fed to the LLM as context. To curb hallucinations and ensure that tests compile and run, we use LLMs to synthesize PoC test cases with specially designed pre- and post-execution repair. We further use differential verification as an oracle to confirm the exploitability of the PoC test cases. On the SmartBugs-Vul and FORGE-Vul benchmarks, SmartPoC generates executable, validated Foundry test cases for 85.61% and 86.45% of targets, respectively. Applied to the latest Etherscan verified-source corpus, SmartPoC confirms 236 real bugs out of 545 audit findings at a cost of only $0.03 per finding.
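The synthesis loop with pre- and post-execution repair and a differential oracle can be sketched in Python as follows. This is an illustration under stated assumptions, not SmartPoC's implementation. The LLM generator, compiler, and test runner are hypothetical callables (in practice they might wrap an LLM API and Foundry's `forge build` and `forge test`). The differential oracle is assumed to mean that a PoC must pass against the vulnerable contract but fail against a reference (for example, patched) version of it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    test: Optional[str]  # last generated PoC test source, if any
    executable: bool     # compiled and passed against the vulnerable target
    confirmed: bool      # also failed against the reference target (assumed oracle)
    attempts: int        # number of generation attempts used


def synthesize_poc(
    generate: Callable[[str, str], str],           # hypothetical: (context, feedback) -> test
    compile_test: Callable[[str], Optional[str]],  # hypothetical: test -> errors, or None
    run_test: Callable[[str, str], bool],          # hypothetical: (test, target) -> passed?
    context: str,
    max_attempts: int = 3,
) -> Outcome:
    feedback = ""
    test: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        test = generate(context, feedback)
        errors = compile_test(test)
        if errors is not None:
            # Pre-execution repair: feed compiler errors back to the generator.
            feedback = "compile error: " + errors
            continue
        if not run_test(test, "vulnerable"):
            # Post-execution repair: the exploit did not succeed at runtime.
            feedback = "test failed against the vulnerable contract"
            continue
        # Assumed differential oracle: the same test must fail on the reference version.
        confirmed = not run_test(test, "reference")
        return Outcome(test, True, confirmed, attempt)
    return Outcome(test, False, False, max_attempts)
```

Under this assumption, a test that passes on both versions is executable but unconfirmed: it runs, yet it does not distinguish the vulnerable behavior, which is exactly the kind of hallucinated "exploit" an oracle is meant to filter out.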