ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data

📅 2024-08-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses software vulnerability detection in the absence of labelled data, proposing the first unsupervised, line-level vulnerability localization framework based on large language models (LLMs). The core insight is to recast vulnerability detection as an anomaly detection task: leveraging the probability distribution LLMs learn during pretraining on vast corpora of predominantly benign code, the method quantifies per-line anomaly scores via confidence gaps—such as log-probability gaps or entropy-based measures—without requiring any vulnerability annotations. The approach is context-aware and generalizes to novel vulnerabilities that emerged after the LLM's knowledge cutoff date. Evaluated on the Magma benchmark, it achieves Top-5 accuracy 1.62–2.18× higher and ROC-AUC 1.02–1.29× higher than state-of-the-art supervised methods (LineVul/LineVD), demonstrating both the viability and the superiority of the unsupervised paradigm.

📝 Abstract
Supervised learning-based software vulnerability detectors often fall short due to the inadequate availability of labelled training data. In contrast, Large Language Models (LLMs) such as GPT-4 are not trained on labelled data, but when prompted to detect vulnerabilities, LLM prediction accuracy is only marginally better than random guessing. In this paper, we explore a different approach by reframing vulnerability detection as an anomaly detection problem. Since the vast majority of code does not contain vulnerabilities and LLMs are trained on massive amounts of such code, vulnerable code can be viewed as an anomaly from the LLM's predicted code distribution, freeing the model from the need for labelled data to provide a learnable representation of vulnerable code. Leveraging this perspective, we demonstrate that LLMs trained for code generation exhibit a significant gap in prediction accuracy when prompted to reconstruct vulnerable versus non-vulnerable code. Using this insight, we implement ANVIL, a detector that identifies software vulnerabilities at line-level granularity. Our experiments explore the discriminating power of different anomaly scoring methods, as well as the sensitivity of ANVIL to context size. We also study the effectiveness of ANVIL on various LLM families, and conduct leakage experiments on vulnerabilities that were discovered after the knowledge cutoff of our evaluated LLMs. On a collection of vulnerabilities from the Magma benchmark, ANVIL outperforms state-of-the-art line-level vulnerability detectors, LineVul and LineVD, which have been trained with labelled data, despite ANVIL having never been trained with labelled vulnerabilities. Specifically, our approach achieves $1.62\times$ to $2.18\times$ better Top-5 accuracies and $1.02\times$ to $1.29\times$ better ROC scores on line-level vulnerability detection tasks.
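The reconstruction-confidence idea in the abstract can be sketched as follows. This is an illustrative Python sketch, not ANVIL's actual implementation: it assumes you have already obtained, per line of code, the log-probabilities an LLM assigned to each ground-truth token when reconstructing that line from its surrounding context (the function names `line_anomaly_scores` and `top_k_suspicious` are hypothetical).

```python
def line_anomaly_scores(token_logprobs_per_line):
    """Compute a per-line anomaly score from LLM token log-probabilities.

    token_logprobs_per_line: list of lists; each inner list holds the
    log-probability the LLM assigned to each ground-truth token on that
    line when asked to reconstruct it from context. Lines the model finds
    "surprising" (low reconstruction confidence) score higher, which is
    the anomaly signal the paper exploits.
    """
    scores = []
    for logprobs in token_logprobs_per_line:
        if not logprobs:
            scores.append(0.0)  # empty line: nothing to score
            continue
        # Negative mean log-probability: higher = more anomalous.
        scores.append(-sum(logprobs) / len(logprobs))
    return scores


def top_k_suspicious(scores, k=5):
    """Indices of the k most anomalous lines (as in Top-5 evaluation)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# A well-predicted (benign-looking) line vs. a poorly predicted one:
scores = line_anomaly_scores([[-0.1, -0.2], [-3.0, -4.0]])
ranked = top_k_suspicious(scores, k=1)  # → [1], the low-confidence line
```

In practice the per-token log-probabilities would come from a code LLM's output logits; the paper also evaluates alternative scores such as entropy-based measures, which slot into the same per-line aggregation.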
Problem

Research questions and friction points this paper is trying to address.

Supervised vulnerability detectors are limited by the scarcity of labelled training data.
LLMs prompted directly to detect vulnerabilities perform only marginally better than random guessing.
Line-level vulnerability localization without any labelled vulnerability examples.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs for anomaly detection
No labelled data required
Line-level vulnerability identification