Fixseeker: An Empirical Driven Graph-based Approach for Detecting Silent Vulnerability Fixes in Open Source Software

📅 2025-03-26
🤖 AI Summary
Problem: Vulnerability-fixing commits (VFCs) in open-source software often carry no security annotation, which makes these "silent" commits difficult to detect. Existing approaches either rely on commit messages, which silent VFCs lack, or analyze code hunks in isolation and so neglect semantic and data-flow correlations across hunks, limiting performance. Method: We propose the first heterogeneous change graph neural network (HCGNN) tailored for VFC detection. Grounded in an empirical study of 11,900 VFCs across six programming languages, which found that over 70% of VFCs involve multiple hunks with various types of correlations, we construct a cross-hunk heterogeneous change graph integrating abstract syntax trees (ASTs), control-flow graphs (CFGs), and data-flow dependencies. A multi-granularity GNN encoder models inter-hunk correlations. Results: Our model achieves an average F1-score of 0.8404 on balanced datasets and improves F1, AUC-ROC, and AUC-PR by 32.40%, 1.55%, and 8.24%, respectively, on imbalanced datasets. It also generalizes across repositories of different sizes and commits of different complexity.
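The summary reports results separately for balanced and imbalanced data. A small worked example may help show why that distinction matters: with the true-positive and false-positive rates held fixed, precision and F1 fall as negatives (non-VFC commits) outnumber positives. The function name `precision_recall_f1` and all numbers below are illustrative assumptions, not values or code from the paper.

```python
def precision_recall_f1(n_pos, n_neg, tpr, fpr):
    """Precision, recall, and F1 for a classifier with fixed TPR and FPR.

    n_pos / n_neg are the numbers of positive (VFC) and negative (non-VFC)
    commits; tpr and fpr are hypothetical rates chosen for illustration.
    """
    tp = tpr * n_pos            # expected true positives
    fp = fpr * n_neg            # expected false positives
    precision = tp / (tp + fp)
    recall = tpr
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Same classifier, increasingly imbalanced data (1:1, 1:10, 1:100).
    for n_neg in (1_000, 10_000, 100_000):
        p, r, f1 = precision_recall_f1(1_000, n_neg, tpr=0.85, fpr=0.05)
        print(f"neg:pos = {n_neg // 1_000}:1  precision={p:.3f}  recall={r:.3f}  F1={f1:.3f}")
```

Because TPR and FPR do not change with the class ratio, a ROC curve is insensitive to imbalance, whereas precision-based measures such as F1 and AUC-PR are not. This is one reason to read the three reported improvements separately.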

📝 Abstract
Open source software vulnerabilities pose significant security risks to downstream applications. While vulnerability databases provide valuable information for mitigation, many security patches are released silently in new commits of OSS repositories without explicit indications of their security impact. This makes it challenging for software maintainers and users to detect and address these vulnerability fixes. There are a few approaches for detecting vulnerability-fixing commits (VFCs), but most of them rely on commit messages and therefore miss silent VFCs. On the other hand, some approaches detect silent VFCs based on code change patterns, but they often fail to adequately characterize vulnerability fix patterns and are therefore less effective. For example, some approaches analyze each hunk in known VFCs in isolation to learn vulnerability fix patterns; but vulnerability fixes are often associated with multiple hunks, in which case correlations of code changes across those hunks are essential for characterizing the fixes. To address these problems, we first conduct a large-scale empirical study on 11,900 VFCs across six programming languages, in which we find that over 70% of VFCs involve multiple hunks with various types of correlations. Based on our findings, we propose Fixseeker, a graph-based approach that extracts the various correlations between code changes at the hunk level to detect silent vulnerability fixes. Our evaluation demonstrates that Fixseeker outperforms state-of-the-art approaches across multiple programming languages, achieving a high F1 score of 0.8404 on average on balanced datasets and consistently improving F1, AUC-ROC, and AUC-PR scores by 32.40%, 1.55%, and 8.24%, respectively, on imbalanced datasets. Our evaluation also indicates the generality of Fixseeker across different repository sizes and commit complexities.
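To make the idea of cross-hunk correlation concrete, the sketch below splits a unified diff into hunks and links two hunks when their changed lines share an identifier. This is only a crude, assumed stand-in for the correlations the abstract mentions, not Fixseeker's actual extraction method. The names `split_hunks`, `identifiers`, `hunk_graph`, and `SAMPLE_DIFF`, as well as the tiny keyword list, are hypothetical.

```python
import re
from itertools import combinations

HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Tiny illustrative keyword list; a real tool would use a per-language parser.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void", "sizeof"}


def split_hunks(diff_text):
    """Return one list of changed lines (added or removed) per hunk."""
    hunks = []
    for line in diff_text.splitlines():
        if HUNK_HEADER.match(line):
            hunks.append([])
        elif (hunks and line.startswith(("+", "-"))
              and not line.startswith(("+++ ", "--- "))):  # skip file headers
            hunks[-1].append(line[1:])
    return hunks


def identifiers(lines):
    """Collect identifier-like tokens from changed lines, minus keywords."""
    names = set()
    for line in lines:
        names.update(IDENTIFIER.findall(line))
    return names - KEYWORDS


def hunk_graph(diff_text):
    """Return (hunks, edges); edges maps a hunk pair (i, j) to shared names."""
    hunks = split_hunks(diff_text)
    names = [identifiers(h) for h in hunks]
    edges = {}
    for i, j in combinations(range(len(hunks)), 2):
        shared = names[i] & names[j]
        if shared:
            edges[(i, j)] = shared
    return hunks, edges


# Hypothetical three-hunk patch: hunk 0 defines a bound that hunk 1 uses.
SAMPLE_DIFF = """\
--- a/buf.c
+++ b/buf.c
@@ -10,2 +10,3 @@
 int read_buf(char *dst, int n) {
+    int max_len = 64;
     int i;
@@ -20,2 +21,2 @@
-    memcpy(dst, src, n);
+    memcpy(dst, src, n < max_len ? n : max_len);
     return n;
@@ -40,2 +41,3 @@
 void log_done(void) {
+    puts("done");
 }
"""

if __name__ == "__main__":
    hunks, edges = hunk_graph(SAMPLE_DIFF)
    print(len(hunks), edges)
```

In this sample, the first two hunks are linked through `max_len`, while the third, unrelated hunk stays isolated. A per-hunk classifier would see the bounded `memcpy` without the definition it depends on, which is the kind of context the abstract argues is lost when hunks are analyzed in isolation.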
Problem

Research questions and friction points this paper is trying to address.

Detecting silent vulnerability fixes in open source software
Addressing limitations of existing VFC detection methods
Improving accuracy in identifying multi-hunk vulnerability fixes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based approach for detecting silent vulnerability fixes
Extracts correlations between code changes at the hunk level
Outperforms state-of-the-art detection methods
Authors
Yiran Cheng
Chinese Academy of Sciences University
Ting Zhang
Singapore Management University, Singapore
Lwin Khin Shar
Singapore Management University, Singapore
Zhe Lang
Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS; School of Cyber Security, University of Chinese Academy of Sciences, China
David Lo
Singapore Management University, Singapore
Shichao Lv
Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS; School of Cyber Security, University of Chinese Academy of Sciences, China
Dongliang Fang
Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS; School of Cyber Security, University of Chinese Academy of Sciences, China
Zhiqiang Shi
Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS; School of Cyber Security, University of Chinese Academy of Sciences, China
Limin Sun
Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS; School of Cyber Security, University of Chinese Academy of Sciences, China