NAVRepair: Node-type Aware C/C++ Code Vulnerability Repair

📅 2024-05-08

🏛️ arXiv.org

📈 Citations: 7

✨ Influential: 0

🤖 AI Summary

Existing LLM-based approaches to C/C++ vulnerability repair overlook syntactic vulnerability patterns and lack type-specific prompting. NAVRepair addresses this with an AST- and CWE-aware repair framework that uses type analysis to localize the minimum edit node (MEN) and generates context-aware prompts customized by both vulnerability type and AST node type. The key contributions are: (1) a mechanism that jointly models AST node types and CWE vulnerability categories; (2) an LLM-agnostic, modular architecture that adapts quickly to new vulnerability types; and (3) a lightweight prompt-engineering approach that combines AST parsing, type analysis, and CWE template matching. Evaluated on multiple popular LLMs, the method achieves 26% higher accuracy than an existing LLM-based C/C++ vulnerability repair method.

📝 Abstract

The rapid advancement of deep learning has led to the development of Large Language Models (LLMs). In the field of vulnerability repair, previous research has leveraged rule-based fixing, pre-trained models, and LLM prompt engineering. However, existing approaches do little to integrate code structure with error types. In addition, certain features of the C/C++ language make vulnerability repair in C/C++ exceptionally challenging. To address these challenges, we propose NAVRepair, a novel framework targeting C/C++ vulnerabilities that combines node-type information extracted from Abstract Syntax Trees (ASTs) with error types. Our approach employs type analysis to localize the minimum edit node (MEN) and customizes the collection of context information according to the error type. In the offline stage, NAVRepair parses code patches to locate MENs and designs rules to extract relevant contextual information for each MEN type. In the online repair stage, it analyzes the suspicious code, combines it with vulnerability type templates derived from the Common Weakness Enumeration (CWE), and generates targeted repair prompts. We evaluate NAVRepair on multiple popular LLMs and demonstrate that it improves their performance on code vulnerability repair. Notably, our framework is independent of any specific LLM and can quickly adapt to new vulnerability types. Extensive experiments show that NAVRepair helps LLMs accurately detect and fix C/C++ vulnerabilities, achieving 26% higher accuracy than an existing LLM-based C/C++ vulnerability repair method. We believe our node-type-aware approach has promising application prospects for enhancing real-world C/C++ code security.
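The abstract does not define how a minimum edit node is computed, so the following is only a minimal sketch of one plausible reading: given the AST of the code before and after a patch, the MEN is taken to be the deepest node whose subtree contains every difference. The `Node` class, the `minimum_edit_node` function, and the node-type labels are hypothetical stand-ins for illustration; a real pipeline would obtain its trees from a C/C++ parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy AST node (hypothetical): a type label, token text, ordered children."""
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def minimum_edit_node(before: Node, after: Node) -> Optional[Node]:
    """Return the deepest node of `before` that encloses all differences.

    Assumption: this is one reading of "minimum edit node", not the
    paper's stated algorithm. Returns None when the trees are identical.
    """
    if before == after:  # dataclass equality compares subtrees recursively
        return None
    # The node itself changed, or children were added or removed here.
    if (before.type != after.type or before.text != after.text
            or len(before.children) != len(after.children)):
        return before
    changed = [(b, a) for b, a in zip(before.children, after.children) if b != a]
    if len(changed) == 1:
        # All differences sit under a single child, so descend into it.
        return minimum_edit_node(*changed[0])
    return before  # differences span several children


def guarded_write(op: str) -> Node:
    """Toy tree for: if (i <op> n) write(buf);"""
    return Node("if_statement", children=[
        Node("binary_expression", op,
             [Node("identifier", "i"), Node("identifier", "n")]),
        Node("call_expression", "write", [Node("identifier", "buf")]),
    ])


vulnerable = guarded_write("<=")  # off-by-one bound check
patched = guarded_write("<")
men = minimum_edit_node(vulnerable, patched)
print(men.type)  # binary_expression
```

In this toy patch only the comparison operator changes, so the sketch reports the `binary_expression` node rather than the enclosing `if_statement`; the node type it returns is what the per-type context rules would then key on.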

Problem

Research questions and friction points this paper is trying to address.

Improves LLM-based vulnerability repair accuracy using syntax patterns

Generates targeted prompts from code syntax trees for specific vulnerabilities

Enhances patch generation by incorporating CWE descriptions into prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages syntax trees to generate targeted prompts

Incorporates CWE descriptions for vulnerability context

Uses conversational LLMs for automated patch generation
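As an illustration of how the points above could fit together, the sketch below assembles a repair prompt from a CWE description, a hint chosen by AST node type, and the suspicious code. It is not NAVRepair's actual prompt format: the function `build_repair_prompt`, the two lookup tables, and the hint wording are all assumptions, and the CWE descriptions are short paraphrases rather than official text.

```python
# Hypothetical lookup tables. CWE descriptions are paraphrased, not official text.
CWE_TEMPLATES = {
    "CWE-787": "Out-of-bounds write: data is written outside the intended buffer.",
    "CWE-476": "NULL pointer dereference: a pointer that may be NULL is dereferenced.",
}

# Hypothetical per-node-type hints; the paper's real rules are not given in the abstract.
NODE_HINTS = {
    "binary_expression": "Check the operator and operands of this expression.",
    "call_expression": "Check the callee and the arguments passed to it.",
}


def build_repair_prompt(code: str, cwe_id: str, node_type: str, context: str = "") -> str:
    """Combine vulnerability-type and node-type information into one prompt."""
    cwe_text = CWE_TEMPLATES.get(cwe_id, "No description available.")
    hint = NODE_HINTS.get(node_type, "Inspect the marked node and its surroundings.")
    parts = [
        f"Vulnerability type: {cwe_id}. {cwe_text}",
        f"Suspicious AST node type: {node_type}. {hint}",
    ]
    if context:
        parts.append(f"Relevant context:\n{context}")
    parts.append(f"Suspicious code:\n{code}")
    parts.append("Return a minimal patch that fixes the vulnerability.")
    return "\n\n".join(parts)


prompt = build_repair_prompt(
    code="if (i <= n) buf[i] = c;",
    cwe_id="CWE-787",
    node_type="binary_expression",
    context="char buf[n];",
)
print(prompt)
```

Keeping the templates in plain dictionaries mirrors the claim that the framework is independent of any specific LLM: supporting a new vulnerability type would only mean adding an entry, and the resulting string can be sent to any conversational model.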

🔎 Similar Papers

How Well Do Large Language Models Serve as End-to-End Secure Code Producers?