Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning

📅 2024-09-27
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the "repair-detection imbalance" problem in large language models (LLMs), where buffer overflow (BOF) vulnerability detection achieves high accuracy (76%) but repair success remains extremely low (15%). We propose a multi-granularity, domain-knowledge-injected prompt tuning framework designed specifically for BOF repair. Our method employs context-aware prompt engineering to serialize and inject critical security-domain knowledge (security semantics, code dataflow, and syntactic/structural constraints) directly into LLM inputs, enabling precise modeling of security-sensitive code contexts. Evaluated with GitHub Copilot, our approach raises the BOF repair rate to 63%, a 4.2× improvement over the baseline, while preserving the original 76% detection rate. This study systematically identifies and bridges the capability gap of LLMs in vulnerability repair tasks, empirically demonstrating that domain-knowledge-guided prompting is a pivotal pathway to improving secure code repair performance.

📝 Abstract
Large Language Models (LLMs) face significant challenges in detecting and repairing vulnerable code, particularly for vulnerabilities that involve multiple aspects, such as variables, code flows, and code structures. In this study, we use GitHub Copilot as the LLM and focus on buffer overflow vulnerabilities. Our experiments reveal a notable gap in Copilot's abilities when dealing with buffer overflow vulnerabilities: a 76% vulnerability detection rate but only a 15% vulnerability repair rate. To address this issue, we propose context-aware prompt tuning techniques designed to enhance LLM performance in repairing buffer overflows. By injecting a sequence of domain knowledge about the vulnerability, including various security and code contexts, we demonstrate that Copilot's successful repair rate increases to 63%, more than a fourfold improvement over repairs without domain knowledge.
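The abstract describes injecting serialized domain knowledge (security context, dataflow, code structure) into the prompt ahead of the vulnerable code. A minimal sketch of how such a context-aware prompt might be assembled is shown below; the function, field names, and comment format are hypothetical illustrations, not the paper's actual implementation.

```python
def build_repair_prompt(code, security_context, dataflow_context, structure_context):
    """Serialize domain knowledge and prepend it to the vulnerable snippet,
    so the LLM sees the security-relevant context before the code itself."""
    sections = [
        ("Security context", security_context),
        ("Dataflow context", dataflow_context),
        ("Code structure", structure_context),
    ]
    # Render each piece of domain knowledge as a comment line the model can read.
    knowledge = "\n".join(f"// {label}: {text}" for label, text in sections)
    return f"{knowledge}\n// Task: repair the buffer overflow below.\n{code}"

# Hypothetical buffer overflow example: unbounded copy into a fixed-size buffer.
vulnerable = 'strcpy(dest, user_input);  /* dest is char[16] */'
prompt = build_repair_prompt(
    vulnerable,
    security_context="user_input is attacker-controlled and unbounded",
    dataflow_context="user_input flows from argv[1] into strcpy without a length check",
    structure_context="dest is a fixed-size stack buffer of 16 bytes",
)
print(prompt)
```

The ordering choice (knowledge first, code last) mirrors the paper's premise that the model repairs more reliably when the security-sensitive context is made explicit rather than left implicit in the code.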
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM performance in buffer overflow vulnerability repair
Addressing low repair rates in GitHub Copilot for buffer overflow
Improving context-aware prompt tuning with domain knowledge injection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-aware prompt tuning for LLMs
Domain knowledge injection for vulnerability repair
Buffer overflow repair enhancement
Arshiya Khan
University of Delaware, Newark, Delaware, USA
Guannan Liu
Associate Professor, Beihang University
Xing Gao
University of Delaware, Newark, Delaware, USA