🤖 AI Summary
This work addresses the limitations of existing static analysis tools, which often suffer from imprecise program representations, and large language model (LLM)-based approaches, which frequently lack critical vulnerability context and exhibit unreliable reasoning. To overcome these challenges, the authors propose a novel method that integrates deterministic rules with LLM-driven semantic reasoning to construct an enhanced unified dependency graph, effectively capturing both explicit and implicit contextual information to fully characterize vulnerability scenarios. Furthermore, they introduce a meta-prompting mechanism guided by expert knowledge to steer the LLM toward systematic, type-specific, and evidence-based vulnerability reasoning. Evaluated on the PrimeVul4J dataset, the approach achieves an F1-score of 0.75, substantially outperforming current methods. Across nine real-world Java projects, it identified 26 true vulnerabilities, 15 of which were confirmed by developers and led to five assigned CVEs, and an industrial deployment uncovered 40 additional confirmed vulnerabilities.
📝 Abstract
Detecting vulnerabilities in source code remains critical yet challenging, as conventional static analysis tools construct inaccurate program representations, while existing LLM-based approaches often miss essential vulnerability context and lack grounded reasoning. To mitigate these challenges, we introduce VulWeaver, a novel LLM-based approach that weaves broken program semantics into accurate representations and extracts holistic vulnerability context for grounded vulnerability detection. Specifically, VulWeaver first constructs an enhanced unified dependency graph (UDG) by integrating deterministic rules with LLM-based semantic inference to address static analysis inaccuracies. It then extracts holistic vulnerability context by combining explicit contexts from program slicing with implicit contexts, including usage, definition, and declaration information. Finally, VulWeaver employs meta-prompting with vulnerability-type-specific expert guidelines to steer LLMs through systematic reasoning, aggregated via majority voting for robustness. Extensive experiments on the PrimeVul4J dataset demonstrate that VulWeaver achieves an F1-score of 0.75, outperforming state-of-the-art learning-based, LLM-based, and agent-based baselines by 23%, 15%, and 60% in F1-score, respectively. VulWeaver has also detected 26 true vulnerabilities across 9 real-world Java projects, with 15 confirmed by developers and 5 CVE identifiers assigned. In industrial deployment, VulWeaver identified 40 confirmed vulnerabilities in an internal repository.
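The abstract notes that VulWeaver aggregates multiple LLM reasoning runs via majority voting for robustness. The paper's implementation details are not given here, but a minimal sketch of such an aggregation step might look like the following (function names and the tie-breaking policy are assumptions for illustration, not the authors' code):

```python
from collections import Counter

def majority_vote(verdicts):
    """Aggregate independent LLM verdicts on one code sample by majority.

    `verdicts` is a list of labels such as "vulnerable" / "safe", one per
    reasoning run. Ties are broken toward "vulnerable" here, a conservative
    assumption; the paper may use a different policy.
    """
    counts = Counter(verdicts)
    if counts["vulnerable"] >= counts["safe"]:
        return "vulnerable"
    return "safe"

# Three independent reasoning runs over the same vulnerability context:
result = majority_vote(["vulnerable", "safe", "vulnerable"])
print(result)  # → vulnerable
```

Running an odd number of reasoning passes (e.g., three or five) avoids most ties and lets a single unreliable LLM response be outvoted by the others.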