Think Broad, Act Narrow: CWE Identification with Multi-Agent Large Language Models

📅 2025-08-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing LLM-based vulnerability detection methods suffer from three key limitations: insufficient support for deep semantic analysis, confinement to function-level context, and frequent misclassification in CWE categorization. This paper proposes a multi-agent LLM framework that decomposes vulnerability identification into three collaborating stages (candidate generation, cross-function contextual validation, and decision-making), combining broad reasoning with fine-grained contextual modeling to produce interpretable CWE attribution. The authors present the approach as novel in applying a multi-agent architecture to precise CWE attribution rather than relying on single-model, one-shot prediction. On the PrimeVul benchmark, the first stage alone identifies the correct CWE for 40.9% of the studied vulnerable functions; on ten synthetic programs, adding context reduces false positives from 6–9 CWEs to 1–2 while still identifying the true CWE in 9 of 10 cases.

📝 Abstract

Machine learning and large language models (LLMs) for vulnerability detection have received significant attention in recent years. Unfortunately, state-of-the-art techniques show that LLMs fail even to distinguish a vulnerable function from its benign counterpart, due to three main problems. First, vulnerability detection requires deep analysis, which LLMs often struggle with when making a one-shot prediction. Second, existing techniques typically perform function-level analysis, whereas effective vulnerability detection requires contextual information beyond the function scope. Third, the focus on binary classification can result in identifying a vulnerability but associating it with the wrong security weakness (CWE), which may mislead developers. We propose a novel multi-agent LLM approach to address the challenges of identifying CWEs. This approach consists of three steps: (1) a team of LLM agents performs an exhaustive search for potential CWEs in the function under review, (2) another team of agents identifies relevant external context to support or refute each candidate CWE, and (3) a final agent makes informed acceptance or rejection decisions for each CWE based on the gathered context. A preliminary evaluation of our approach shows promising results. On the PrimeVul dataset, Step 1 correctly identifies the appropriate CWE in 40.9% of the studied vulnerable functions. We further evaluated the full pipeline on ten synthetic programs and found that incorporating context information reduced false positives from 6–9 CWEs to just 1–2, while still correctly identifying the true CWE in 9 out of 10 cases.

Problem

Research questions and friction points this paper is trying to address.

LLMs struggle with deep vulnerability analysis in one-shot predictions

Function-level analysis lacks contextual information for accurate CWE identification

A focus on binary classification can attach the wrong CWE to a real vulnerability, which misleads developers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent LLMs for exhaustive CWE search

Context-aware agents to validate CWEs

Final agent makes informed CWE decisions
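
The three stages listed above can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: every name in it (`identify_cwes`, `Verdict`, and the `stub_*` agents) is hypothetical, and the agents are simple keyword-matching stand-ins for what would be LLM calls in the real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical agent signatures (assumptions, not taken from the paper).
Generator = Callable[[str], Set[str]]                      # function source -> candidate CWE IDs
ContextFinder = Callable[[str, str, Dict[str, str]], List[str]]  # (cwe, function, codebase) -> snippets
Judge = Callable[[str, str, List[str]], bool]              # (cwe, function, context) -> accept?


@dataclass
class Verdict:
    cwe: str
    accepted: bool
    context: List[str]


def identify_cwes(function_src: str,
                  codebase: Dict[str, str],
                  generators: List[Generator],
                  find_context: ContextFinder,
                  judge: Judge) -> List[Verdict]:
    # Step 1 (think broad): take the union of every generator's candidates.
    candidates: Set[str] = set()
    for generate in generators:
        candidates |= generate(function_src)

    verdicts = []
    for cwe in sorted(candidates):
        # Step 2: gather context from outside the function for this candidate.
        context = find_context(cwe, function_src, codebase)
        # Step 3 (act narrow): accept or reject the candidate given that context.
        verdicts.append(Verdict(cwe, judge(cwe, function_src, context), context))
    return verdicts


# Stub agents: keyword heuristics standing in for LLM prompts.
def stub_memory_agent(src: str) -> Set[str]:
    return {"CWE-787", "CWE-476"} if "strcpy" in src else set()


def stub_input_agent(src: str) -> Set[str]:
    return {"CWE-20", "CWE-787"} if "src" in src else set()


def stub_find_context(cwe: str, src: str, codebase: Dict[str, str]) -> List[str]:
    # Return every file that calls the function under review.
    return [text for text in codebase.values() if "copy(" in text]


def stub_judge(cwe: str, src: str, context: List[str]) -> bool:
    # Accept only the out-of-bounds write, and only if a caller passes a fixed-size buffer.
    return cwe == "CWE-787" and any("buf[8]" in c for c in context)


if __name__ == "__main__":
    function_src = "void copy(char *dst, const char *src) { strcpy(dst, src); }"
    codebase = {"caller.c": "char buf[8]; copy(buf, user_input);"}
    for v in identify_cwes(function_src, codebase,
                           [stub_memory_agent, stub_input_agent],
                           stub_find_context, stub_judge):
        print(v.cwe, "accepted" if v.accepted else "rejected")
```

In this toy run, the first stage proposes three candidates and the final stage keeps one, which mirrors the reported effect of context on false positives in miniature. Passing the agents in as plain callables is a design choice of the sketch: it keeps the control flow testable without committing to any particular LLM client.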

🔎 Similar Papers

An Insight into Security Code Review with LLMs: Capabilities, Obstacles and Influential Factors