Detecting Code Vulnerabilities with Heterogeneous GNN Training

📅 2025-02-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing source-code vulnerability detection methods suffer from coarse-grained relational modeling, neglect of node heterogeneity, and high false-positive rates. To address these issues, this paper proposes a language-agnostic Inter-Procedural Abstract Graph (IPAG) representation and a Heterogeneous Attention Graph Neural Network (HAGNN). HAGNN learns multiple subgraphs separately, each capturing different features of the source code, and combines them with a global attention mechanism before a fully connected network performs the final classification. This design aims to capture the characteristics of distinct edge types rather than treating edges only as conduits for node information. On a large C dataset covering 108 vulnerability types and a large Java dataset covering 114 vulnerability types, HAGNN achieves up to 96.6% and 97.8% accuracy, respectively, outperforming state-of-the-art approaches. Applications to real-world software projects also showed low false-positive rates.

📝 Abstract

Detecting vulnerabilities in source code is a critical task for software security assurance. Graph Neural Networks (GNNs) are a promising approach because they can model source code as graphs. Early approaches treated code elements uniformly, limiting their capacity to model the diverse relationships that contribute to various vulnerabilities. Recent research addresses this limitation by considering the heterogeneity of node types and using Gated Graph Neural Networks (GGNNs) to aggregate node information through different edge types. However, these edges primarily function as conduits for passing node information and may not capture detailed characteristics of distinct edge types. This paper presents Inter-Procedural Abstract Graphs (IPAGs) as an efficient, language-agnostic representation of source code, complemented by heterogeneous GNN training for vulnerability prediction. IPAGs capture the structural and contextual properties of code elements and their relationships. We also propose a Heterogeneous Attention GNN (HAGNN) model that incorporates multiple subgraphs capturing different features of source code. These subgraphs are learned separately and combined using a global attention mechanism, followed by a fully connected neural network for final classification. The proposed approach achieves up to 96.6% accuracy on a large C dataset of 108 vulnerability types and up to 97.8% accuracy on a large Java dataset of 114 vulnerability types, outperforming state-of-the-art methods. Applications to various real-world software projects have also shown low false-positive rates.
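
The fusion step described in the abstract (separately learned subgraph embeddings combined by global attention, then classified) can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product scoring, the `query` vector, the single-layer classifier, and all numeric values are assumptions chosen only to make the idea concrete.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(subgraph_embeddings, query):
    """Combine per-subgraph embeddings into one vector.

    Assumption: attention weights come from softmax(query . embedding).
    The paper's actual scoring function is not given in the abstract.
    """
    weights = softmax([dot(query, emb) for emb in subgraph_embeddings])
    dim = len(subgraph_embeddings[0])
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, subgraph_embeddings))
        for i in range(dim)
    ]
    return fused, weights


def classify(fused, layer_weights, bias):
    """Stand-in for the fully connected classifier: one linear layer + sigmoid."""
    z = dot(fused, layer_weights) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical embeddings, one per separately learned subgraph.
embeddings = [[0.2, 0.9, 0.1], [0.7, 0.1, 0.3], [0.4, 0.4, 0.8]]
query = [0.5, 1.0, -0.5]  # hypothetical learned attention query

fused, weights = attention_fuse(embeddings, query)
score = classify(fused, [1.0, -1.0, 0.5], 0.0)  # hypothetical layer parameters
```

Here `weights` sums to 1 and indicates how much each subgraph contributes to the fused representation, while `score` is a probability-like value in (0, 1). In a trained model the query and layer parameters would be learned rather than fixed.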

Problem

Research questions and friction points this paper is trying to address.

Detect vulnerabilities in source code

Model source code as heterogeneous graphs

Improve accuracy in vulnerability prediction
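
To make the idea of a heterogeneous code graph concrete, the sketch below stores typed nodes and typed edges and groups the edges into one subgraph per edge type. The node types, the edge types (`control_flow`, `data_flow`, `call`), and the choice to split by edge type are illustrative assumptions; the summary does not specify how IPAGs or their subgraphs are actually defined.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: int
    dst: int
    kind: str  # hypothetical edge type label


def split_by_edge_type(edges):
    """Group edges into one (src, dst) list per edge type."""
    subgraphs = defaultdict(list)
    for edge in edges:
        subgraphs[edge.kind].append((edge.src, edge.dst))
    return dict(subgraphs)


# Hypothetical node types for a tiny two-function program.
node_types = {0: "function", 1: "statement", 2: "statement", 3: "function"}

# Hypothetical typed edges between those nodes.
edges = [
    Edge(0, 1, "control_flow"),
    Edge(1, 2, "control_flow"),
    Edge(1, 2, "data_flow"),
    Edge(2, 3, "call"),
]

subgraphs = split_by_edge_type(edges)
```

Each entry of `subgraphs` could then be passed to its own GNN, which is one way to keep the characteristics of distinct edge types separate during learning.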

Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous GNN Training

Inter-Procedural Abstract Graphs

Heterogeneous Attention GNN Model
