VulRTex: A Reasoning-Guided Approach to Identify Vulnerabilities from Rich-Text Issue Report

📅 2025-09-04
🤖 AI Summary
To address the time-consuming manual identification of vulnerability-related issue reports (IRs) in open-source software, and the neglect of rich-text information by prior automatic approaches, this paper proposes VulRTex, a reasoning-guided identification method. It uses the reasoning ability of large language models (LLMs) to build a Vulnerability Reasoning Database from historical IRs, then retrieves relevant cases from that database to generate reasoning guidance for analyzing the rich-text information of target IRs. Experiments on 973,572 IRs show that, on imbalanced data, the method outperforms the best baseline by 11.0% in F1, 20.2% in AUPRC, and 10.5% in Macro-F1, at 2x lower time cost than baseline reasoning approaches. Applied to 2024 GitHub IRs from 10 representative OSS projects, it identified 30 emerging vulnerabilities, 11 of which were assigned CVE-IDs.

📝 Abstract
Software vulnerabilities exist in open-source software (OSS), and the developers who discover them may submit issue reports (IRs) describing their details. Security practitioners spend considerable time manually identifying vulnerability-related IRs from the community, and attackers may exploit this time gap to harm the system. Researchers have previously proposed automatic approaches to help identify these vulnerability-related IRs, but these works focus on textual descriptions and lack a comprehensive analysis of the IRs' rich-text information. In this paper, we propose VulRTex, a reasoning-guided approach to identify vulnerability-related IRs using their rich-text information. In particular, VulRTex first utilizes the reasoning ability of the Large Language Model (LLM) to prepare the Vulnerability Reasoning Database from historical IRs. Then, it retrieves the relevant cases from the prepared reasoning database to generate reasoning guidance, which guides the LLM to identify vulnerabilities by reasoning over the target IRs' rich-text information. To evaluate the performance of VulRTex, we conduct experiments on 973,572 IRs. The results show that VulRTex achieves the highest performance in identifying vulnerability-related IRs and predicting CWE-IDs when the dataset is imbalanced, outperforming the best baseline by +11.0% F1, +20.2% AUPRC, and +10.5% Macro-F1, with 2x lower time cost than baseline reasoning approaches. Furthermore, VulRTex has been applied to identify 30 emerging vulnerabilities across 10 representative OSS projects in GitHub IRs from 2024, and 11 of them have been assigned CVE-IDs, which illustrates VulRTex's practicality.
Problem

Research questions and friction points this paper is trying to address.

Automatically identify vulnerability-related issue reports from rich-text data
Overcome limitations of prior text-only vulnerability detection methods
Reduce manual effort and time gap in security vulnerability identification
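
To make "rich-text data" concrete, here is a minimal sketch of separating an issue report's plain description from its embedded elements. It assumes the IR body is Markdown and that the elements of interest are fenced code blocks, images, and links; the summary does not say which elements VulRTex actually analyzes, and `RichTextIR` and `split_rich_text` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass, field

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this listing readable
FENCE_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)
IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)\)")


@dataclass
class RichTextIR:
    """Hypothetical container: plain description plus extracted rich-text parts."""
    text: str
    code_blocks: list = field(default_factory=list)
    images: list = field(default_factory=list)
    links: list = field(default_factory=list)


def split_rich_text(body: str) -> RichTextIR:
    """Split a Markdown issue body into description text, code, image URLs, and link URLs."""
    code_blocks = [block.strip() for block in FENCE_RE.findall(body)]
    without_code = FENCE_RE.sub(" ", body)
    images = IMAGE_RE.findall(without_code)
    links = [url for _label, url in LINK_RE.findall(without_code)]
    # Drop image markup, keep link labels, then normalize whitespace.
    text = IMAGE_RE.sub(" ", without_code)
    text = LINK_RE.sub(r"\1", text)
    return RichTextIR(" ".join(text.split()), code_blocks, images, links)
```

A text-only approach would see just the `text` field; the other fields are the kind of information such approaches leave unused.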
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLM reasoning to build a Vulnerability Reasoning Database from historical IRs
Retrieves relevant cases to guide vulnerability identification
Analyzes rich-text information in issue reports
Authors

Ziyou Jiang
Institute of Software, Chinese Academy of Sciences
Software engineering

Mingyang Li
State Key Laboratory of Intelligent Game, China, Science and Technology on Integrated Information System Laboratory, Institute of Software Chinese Academy of Sciences, China, and University of Chinese Academy of Sciences, China

Guowei Yang
The University of Queensland
Software engineering, Program analysis, Mobile software, AI4SE, SE4AI

Lin Shi
Beihang University
Software engineering

Qing Wang
State Key Laboratory of Intelligent Game, China, Science and Technology on Integrated Information System Laboratory, Institute of Software Chinese Academy of Sciences, China, and University of Chinese Academy of Sciences, China