🤖 AI Summary
This work addresses the “repair-detection imbalance” in large language models (LLMs): buffer overflow (BOF) vulnerability detection reaches a high 76% accuracy, yet repair success remains at only 15%. We propose the first multi-granularity, domain-knowledge-injected prompt tuning framework designed specifically for BOF repair. The method uses context-aware prompt engineering to serialize critical security-domain knowledge (security semantics, code dataflow, and syntactic/structural constraints) and inject it directly into the LLM's input, so that security-sensitive code contexts are modeled explicitly. Evaluated on GitHub Copilot, the approach raises the BOF repair rate to 63%, a 4.2× improvement over the 15% baseline, while preserving the original 76% detection rate. The study is the first to systematically identify and bridge the gap between LLM vulnerability detection and repair capabilities, showing empirically that domain-knowledge-guided prompting is an effective path to stronger secure code repair.
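To make the detection/repair distinction concrete, the sketch below shows a generic stack-based buffer overflow of the kind at issue, together with one possible bounded-copy repair. It is an illustration only, not a sample from the study's benchmark.

```c
#include <stdio.h>
#include <string.h>

/* Vulnerable: strcpy() copies without checking the destination size, so any
 * name longer than 15 characters overflows buf on the stack (CWE-121). */
void greet_vulnerable(const char *name) {
    char buf[16];
    strcpy(buf, name);                      /* unbounded copy */
    printf("Hello, %s\n", buf);
}

/* One possible repair: bound the copy by the destination size and keep the
 * result NUL-terminated. */
void greet_repaired(const char *name) {
    char buf[16];
    snprintf(buf, sizeof buf, "%s", name);  /* bounded, always terminated */
    printf("Hello, %s\n", buf);
}

int main(void) {
    greet_vulnerable("short name");  /* safe only because the input is short */
    greet_repaired("a caller-supplied string that is longer than the buffer");
    return 0;
}
```

Flagging the unbounded `strcpy()` is the easier half of the task; producing the repaired version, with the destination size threaded through correctly, is where the study reports the 15% baseline success rate.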
📝 Abstract
Large Language Models (LLMs) face significant challenges in detecting and repairing vulnerable code, particularly for vulnerabilities that involve multiple aspects of a program, such as variables, code flows, and code structures. In this study, we use GitHub Copilot as the LLM and focus on buffer overflow vulnerabilities. Our experiments reveal a notable gap in Copilot's abilities: it detects buffer overflow vulnerabilities at a 76% rate but repairs them at only a 15% rate. To address this gap, we propose context-aware prompt tuning techniques designed to improve LLM performance on buffer overflow repair. By injecting a sequence of domain knowledge about the vulnerability, including various security and code contexts, we show that Copilot's successful repair rate rises to 63%, more than a fourfold improvement over repairs attempted without domain knowledge.
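The paper's exact prompt templates are not reproduced in this abstract, but the idea of serializing domain knowledge (security semantics, dataflow facts, structural constraints) into the model's context can be sketched as annotations placed next to the code a completion model such as Copilot is asked to repair. The function, the comment wording, and the CWE reference below are hypothetical illustrations under that assumption, not the study's actual prompts.

```c
#include <stdio.h>
#include <string.h>

#define TAG_CAP 32   /* capacity of the destination buffer, including NUL */

/*
 * Context-aware prompt, serialized as comments so a code-completion model
 * sees it alongside the code it is asked to repair (hypothetical wording):
 *
 *   SECURITY CONTEXT: CWE-121 stack-based buffer overflow; `dest` holds at
 *     most TAG_CAP bytes including the terminating NUL.
 *   DATAFLOW: `src` originates from argv[1], i.e. untrusted user input, and
 *     is copied into `dest` with no length check.
 *   STRUCTURAL CONSTRAINT: keep the signature of copy_tag() unchanged and
 *     repair by bounding the copy to TAG_CAP bytes with NUL termination.
 *
 *   TASK: rewrite copy_tag() so the overflow is no longer possible.
 */
void copy_tag(char *dest, const char *src) {
    strcpy(dest, src);   /* the unbounded copy the model is asked to replace */
}

int main(int argc, char **argv) {
    char tag[TAG_CAP];
    copy_tag(tag, argc > 1 ? argv[1] : "default");
    printf("%s\n", tag);
    return 0;
}
```

The repair request itself is unchanged; what the injected context adds is the buffer's true capacity, the taint source of `src`, and the constraint that the signature must stay fixed, so the model does not have to infer these from the code alone.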