VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation

📅 2025-08-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address poor generalization and weak interpretability in GNN-based code vulnerability detection—caused by data imbalance, label noise, and spurious correlations—this paper proposes a counterfactual-augmented collaborative learning framework. It leverages large language models (LLMs) to generate structure-preserving counterfactual samples, establishing the CWE-20-CFA benchmark dataset (27,556 functions). A graph neural network–LLM collaborative contrastive learning mechanism is designed to suppress reliance on superficial code similarity. Additionally, an attribution consistency metric is introduced to quantify model interpretability. Experiments on CWE-20 detection demonstrate substantial improvements: overall accuracy rises from 51.8% to 97.8%, pairwise contrastive accuracy from 4.5% to 95.8%, and worst-group accuracy from 0.7% to 85.5%, markedly enhancing robustness and trustworthiness.
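The counterfactual generation step described above can be sketched as a prompt-and-rewrite loop. Everything below is an illustrative assumption: the prompt wording, the `make_counterfactual` helper, and the `stub_llm` stand-in are hypothetical, not the paper's actual prompts or pipeline.

```python
def make_counterfactual(code, label, call_llm):
    """Ask an LLM for a minimal, structure-preserving edit that flips
    the vulnerability label. `call_llm` is any text-completion callable;
    the prompt wording here is a hypothetical reconstruction."""
    target = "non-vulnerable" if label == "vulnerable" else "vulnerable"
    prompt = (
        "Rewrite the following function with the smallest possible "
        f"semantic change so that it becomes {target}, keeping its "
        "control-flow structure intact.\n\n" + code
    )
    return call_llm(prompt), target

# Stub LLM for demonstration: inserts an input-validation check,
# since CWE-20 is "Improper Input Validation".
def stub_llm(prompt):
    body = prompt.rsplit("\n\n", 1)[1]
    return body.replace("{", "{\n    if (len < 0) return -1;", 1)

patched, new_label = make_counterfactual(
    "int read_buf(char *b, int len) { memcpy(dst, b, len); }",
    "vulnerable", stub_llm)
print(new_label)  # non-vulnerable
```

Pairing each real function with such a minimally edited, oppositely labeled twin is what lets the later contrastive stage penalize surface-similarity shortcuts.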

📝 Abstract
Automated detection of vulnerabilities in source code is an essential cybersecurity challenge, underpinning trust in digital systems and services. Graph Neural Networks (GNNs) have emerged as a promising approach as they can learn structural and logical code relationships in a data-driven manner. However, their performance is severely constrained by training data imbalances and label noise. GNNs often learn 'spurious' correlations from superficial code similarities, producing detectors that fail to generalize well to unseen real-world data. In this work, we propose a unified framework for robust and interpretable vulnerability detection, called VISION, to mitigate spurious correlations by systematically augmenting a counterfactual training dataset. Counterfactuals are samples with minimal semantic modifications but opposite labels. Our framework includes: (i) generating counterfactuals by prompting a Large Language Model (LLM); (ii) targeted GNN training on paired code examples with opposite labels; and (iii) graph-based interpretability to identify the crucial code statements relevant for vulnerability predictions while ignoring spurious ones. We find that VISION reduces spurious learning and enables more robust, generalizable detection, improving overall accuracy (from 51.8% to 97.8%), pairwise contrast accuracy (from 4.5% to 95.8%), and worst-group accuracy (from 0.7% to 85.5%) on the Common Weakness Enumeration (CWE)-20 vulnerability. We further demonstrate gains using proposed metrics: intra-class attribution variance, inter-class attribution distance, and node score dependency. We also release CWE-20-CFA, a benchmark of 27,556 functions (real and counterfactual) from the high-impact CWE-20 category. Finally, VISION advances transparent and trustworthy AI-based cybersecurity systems through interactive visualization for human-in-the-loop analysis.
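The abstract names two of the proposed interpretability metrics: intra-class attribution variance and inter-class attribution distance. A minimal sketch of plausible definitions follows; the paper's exact formulations may differ, so treat these functions as an illustrative reading, not the authors' implementation.

```python
import numpy as np

def intra_class_attribution_variance(attributions):
    """Mean per-node variance of attribution vectors within one class.
    Lower values suggest the model attends to consistent code statements
    across samples of the same class (illustrative definition)."""
    A = np.asarray(attributions)          # shape: (n_samples, n_nodes)
    return float(A.var(axis=0).mean())

def inter_class_attribution_distance(attr_vuln, attr_safe):
    """Euclidean distance between the mean attribution profiles of the
    two classes. Higher values suggest the model separates vulnerable
    and safe code by distinct statements (illustrative definition)."""
    mu_v = np.asarray(attr_vuln).mean(axis=0)
    mu_s = np.asarray(attr_safe).mean(axis=0)
    return float(np.linalg.norm(mu_v - mu_s))

# Toy example: 4 samples per class, attributions over 5 graph nodes.
rng = np.random.default_rng(0)
vuln = rng.normal(1.0, 0.1, size=(4, 5))
safe = rng.normal(0.0, 0.1, size=(4, 5))
print(intra_class_attribution_variance(vuln))   # small: stable within class
print(inter_class_attribution_distance(vuln, safe))  # large: classes differ
```

Under definitions like these, a detector that keys on spurious surface features would show high intra-class variance and low inter-class distance.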
Problem

Research questions and friction points this paper is trying to address.

Detect code vulnerabilities robustly using counterfactual augmentation
Mitigate spurious correlations in graph neural network training
Improve interpretability and generalization of vulnerability detection systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Counterfactual augmentation using Large Language Models
Targeted GNN training on paired code examples
Graph-based interpretability to identify crucial statements
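The targeted training on paired examples listed above can be sketched as a hinge-style contrastive term over counterfactual pairs. This formulation is an assumption for illustration; the paper's exact objective may differ.

```python
import numpy as np

def counterfactual_contrastive_loss(z_orig, z_cf, margin=1.0):
    """Contrastive penalty for one counterfactual pair.

    z_orig, z_cf: encoder embeddings of a function and its minimally
    edited counterfactual carrying the opposite label. Because the pair
    is textually similar but semantically opposite, the loss penalizes
    embeddings that sit closer than `margin`, pushing the encoder to
    separate them rather than rely on surface code similarity
    (illustrative formulation).
    """
    d = np.linalg.norm(np.asarray(z_orig) - np.asarray(z_cf))
    return float(max(0.0, margin - d) ** 2)

# Near-identical embeddings incur a high penalty; well-separated
# embeddings incur none.
print(counterfactual_contrastive_loss([0.1, 0.2], [0.1, 0.25]))  # 0.9025
print(counterfactual_contrastive_loss([1.0, 0.0], [-1.0, 0.0]))  # 0.0
```

Summed over all counterfactual pairs and added to the usual classification loss, a term like this is what discourages the GNN from mapping near-duplicate code to near-duplicate predictions.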