FusionVul: A Multimodal Feature Fusion Framework for Source Code Vulnerability Detection

📅 2026-06-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vulnerability detection methods struggle to model fine-grained vulnerability patterns in modern codebases, which are large in scale, structurally complex, and semantically diverse. This work proposes a multimodal fusion framework that, for the first time in vulnerability detection, introduces fine-grained cross-attention interactions between a sequential modality (encoded by a pretrained Transformer) and a graph-structured modality (encoded by a graph neural network). It further incorporates a sample-aware, multi-branch weighted ensemble mechanism to improve generalization across diverse vulnerability types. Experimental results show that the approach achieves state-of-the-art F1 scores on benchmarks such as SVulD and DiverseVul, significantly outperforming existing methods, particularly on datasets with highly dispersed function size distributions and a broad range of vulnerability categories.
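Cross-attention between the two modalities can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not FusionVul's implementation: the summary does not specify projections, head counts, or pooling, so the sketch uses single-head scaled dot-product attention without learned projections, assumes token embeddings (from the Transformer) and node embeddings (from the GNN) already share one dimension, and combines them with residual addition and mean pooling. The names `cross_attention`, `mean_pool`, and `fuse` are hypothetical.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: each query row attends over all key rows.

    Assumption: no learned Q/K/V projections; queries and keys share a dimension.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value rows, computed column by column.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(rows):
    # Average the rows into a single vector.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fuse(token_emb, node_emb):
    """Hypothetical bidirectional fusion: tokens attend to graph nodes and nodes attend
    to tokens; residuals keep each modality's own signal; pooled vectors are concatenated."""
    seq_ctx = cross_attention(token_emb, node_emb, node_emb)
    graph_ctx = cross_attention(node_emb, token_emb, token_emb)
    seq_fused = [[a + b for a, b in zip(t, c)] for t, c in zip(token_emb, seq_ctx)]
    graph_fused = [[a + b for a, b in zip(n, c)] for n, c in zip(node_emb, graph_ctx)]
    return mean_pool(seq_fused) + mean_pool(graph_fused)


if __name__ == "__main__":
    tokens = [[0.1, 0.2, 0.0, 0.5], [0.3, 0.1, 0.4, 0.0], [0.0, 0.0, 0.2, 0.2]]  # 3 tokens, dim 4
    nodes = [[0.5, 0.1, 0.1, 0.0], [0.0, 0.3, 0.3, 0.1]]                      # 2 nodes, dim 4
    print(len(fuse(tokens, nodes)))  # 8: two pooled 4-dim vectors concatenated
```

Letting each token attend to every graph node (and vice versa) is what makes the interaction fine-grained: a token can pick up structural context from specific nodes rather than from a single pooled graph vector.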
📝 Abstract
Source code vulnerability detection remains a long-standing challenge due to the increasing scale, structural complexity, and semantic diversity of modern codebases. Conventional static-analysis or rule-based approaches often fail to capture subtle execution dependencies, while single-modality learning models tend to overlook critical structural information embedded beyond the lexical surface of source code. To improve robustness across heterogeneous code patterns, we propose FusionVul, a joint representation learning framework that integrates sequential syntactic representations extracted by a pretrained Transformer encoder with structural semantics propagated through a graph neural network. The framework further incorporates a cross-attention-based feature fusion network to enable fine-grained cross-modal interaction and employs a sample-aware weighting mechanism to integrate multiple predictive branches. Experimental results on four datasets demonstrate that FusionVul achieves superior F1 scores on datasets with highly dispersed function size distributions and broader vulnerability-type coverage, such as SVulD and DiverseVul, reflecting its capability to capture complex and diverse vulnerability patterns.
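One way to read the sample-aware weighting of predictive branches is as a per-sample gate: a scoring function looks at each sample's features, produces a softmax weight per branch, and mixes the branches' predicted probabilities. The sketch below assumes exactly that (a linear gate followed by softmax, in the style of a mixture-of-experts gate). The abstract does not say how FusionVul computes its weights, and `sample_aware_ensemble`, `gate_weights`, and `gate_bias` are hypothetical names.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_aware_ensemble(features, branch_probs, gate_weights, gate_bias):
    """Combine per-branch P(vulnerable) using weights computed from this sample's features.

    features:     feature vector of one sample (e.g. its fused representation)
    branch_probs: one predicted probability per branch
    gate_weights: one weight vector per branch (hypothetical learned gate parameters)
    gate_bias:    one bias per branch
    Returns (combined probability, per-branch weights).
    """
    if not (len(branch_probs) == len(gate_weights) == len(gate_bias)):
        raise ValueError("need one gate row and one bias per branch")
    # Linear gate score per branch, then softmax so the weights sum to 1.
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(gate_weights, gate_bias)]
    alphas = softmax(logits)
    prob = sum(a * p for a, p in zip(alphas, branch_probs))
    return prob, alphas


if __name__ == "__main__":
    prob, alphas = sample_aware_ensemble(
        features=[1.0, 0.0],
        branch_probs=[0.9, 0.2, 0.6],  # e.g. sequence-only, graph-only, fused branches (hypothetical)
        gate_weights=[[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
        gate_bias=[0.0, 0.0, 0.0],
    )
    print(round(prob, 3), [round(a, 3) for a in alphas])
```

Because the weights are recomputed for every input, a sample whose features favor one branch can lean on that branch, whereas a fixed-weight ensemble would apply the same mixture to every function regardless of its size or structure.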
Problem

Research questions and friction points this paper is trying to address.

source code vulnerability detection
structural complexity
semantic diversity
multimodal learning
code representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal fusion
cross-attention
graph neural network
pretrained Transformer
vulnerability detection
Hongyu Yang
School of Safety Science and Engineering, Civil Aviation University of China, Tianjin, 300300, China
Yaping Zhu
School of Safety Science and Engineering, Civil Aviation University of China, Tianjin, 300300, China
Jingchuan Luo
School of Safety Science and Engineering, Civil Aviation University of China, Tianjin, 300300, China
Hiroshi Nomaguchi
School of Computer Science and Engineering, University of Aizu, Aizu-Wakamatsu, 965-8580, Fukushima, Japan
Chunhua Su
Division of Computer Science, University of Aizu, Japan
Cyber Security, Cryptography, Data Privacy, IoT
Willy Susilo
School of Computing and Information Technology, University of Wollongong, Wollongong, 2522, NSW, Australia