🤖 AI Summary
Existing vulnerability detection methods struggle to model fine-grained vulnerability patterns in modern codebases because of the scale, structural complexity, and semantic diversity of those codebases. This work proposes a multimodal fusion framework that, for the first time in vulnerability detection, introduces fine-grained cross-attention interactions between a sequential modality (based on a pretrained Transformer) and a graph-structured modality (based on a graph neural network). It further incorporates a sample-aware, multi-branch weighted ensemble mechanism to improve generalization across diverse vulnerability types. Experimental results show that the proposed approach achieves state-of-the-art F1 scores on benchmarks such as SVulD and DiverseVul, significantly outperforming existing methods, particularly in scenarios characterized by highly dispersed function size distributions and a broad spectrum of vulnerability categories.
📝 Abstract
Source code vulnerability detection remains a long-standing challenge due to the increasing scale, structural complexity, and semantic diversity of modern codebases. Conventional static-analysis or rule-based approaches often fail to capture subtle execution dependencies, while single-modality learning models tend to overlook critical structural information embedded beyond the lexical surface of source code. To improve robustness across heterogeneous code patterns, we propose FusionVul, a joint representation learning framework that integrates sequential syntactic representations extracted by a pretrained Transformer encoder with structural semantics propagated through a graph neural network. The framework further incorporates a cross-attention-based feature fusion network to enable fine-grained cross-modal interaction and employs a sample-aware weighting mechanism to integrate multiple predictive branches. Experimental results on four datasets demonstrate that FusionVul achieves superior F1 scores on datasets with highly dispersed function size distributions and broader vulnerability-type coverage, such as SVulD and DiverseVul, reflecting its capability to capture complex and diverse vulnerability patterns.
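The two mechanisms named in the abstract, cross-attention fusion between token-level and graph-level features and sample-aware weighting of several predictive branches, can be illustrated with a small NumPy sketch. This is not FusionVul's implementation: the abstract does not specify layer sizes, the number of branches, or how the weights are computed. The three branches (sequence-only, graph-only, fused), the mean pooling, the softmax gate, and every name below are assumptions made for illustration, and random arrays stand in for real Transformer and GNN outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_attention(queries, keys_values):
    # Single-head scaled dot-product attention in which one modality
    # queries the other. Learned projections are omitted (assumption).
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (n_q, n_kv)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ keys_values                    # (n_q, d)

rng = np.random.default_rng(0)
d = 8
token_feats = rng.normal(size=(5, d))  # stand-in for Transformer token embeddings
node_feats = rng.normal(size=(3, d))   # stand-in for GNN node embeddings

# Fine-grained interaction in both directions, then mean pooling.
tokens_to_nodes = cross_attention(token_feats, node_feats)  # (5, d)
nodes_to_tokens = cross_attention(node_feats, token_feats)  # (3, d)
fused_vec = np.concatenate([tokens_to_nodes.mean(axis=0),
                            nodes_to_tokens.mean(axis=0)])  # (2d,)

# Hypothetical branches, each with a random linear head.
seq_vec = token_feats.mean(axis=0)
graph_vec = node_feats.mean(axis=0)
w_seq, w_graph = rng.normal(size=d), rng.normal(size=d)
w_fused = rng.normal(size=2 * d)
branch_probs = np.array([
    sigmoid(seq_vec @ w_seq),      # sequence-only branch
    sigmoid(graph_vec @ w_graph),  # graph-only branch
    sigmoid(fused_vec @ w_fused),  # cross-attention branch
])

# Sample-aware weighting: the gate depends on this sample's fused features,
# so different inputs can favor different branches (assumed gating form).
W_gate = rng.normal(size=(3, 2 * d))
branch_weights = softmax(W_gate @ fused_vec)
final_prob = float(branch_weights @ branch_probs)
```

Because the branch weights form a convex combination, `final_prob` always lies between the smallest and largest branch probability; a trained gate would learn which branch to trust for a given function rather than using random parameters.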