🤖 AI Summary
Existing LLM- and RAG-based automated program repair (APR) methods suffer from limited defect-type coverage, suboptimal training data quality, poor model adaptability, and—critically—neglect of the dual nature of code: semantics and syntax/structure. To address these limitations, we propose SelRepair, a dual-channel retrieval-augmented fine-tuning framework. It comprises semantically and syntactically/structurally aligned retrieval modules, coupled with a retrieval selection gate that dynamically filters highly relevant repair knowledge—thereby substantially reducing input length and inference overhead. Our method jointly leverages bug-fix pair supervision and code embedding similarity matching to enable end-to-end patch generation. Evaluated on Java benchmarks, our approach achieves state-of-the-art exact match rates of 26.29% and 17.64% on two datasets, outperforming prior work by a significant margin, while reducing inference time by at least 6.42%. These results demonstrate its effectiveness in simultaneously optimizing both repair accuracy and computational efficiency.
📝 Abstract
Automated Program Repair (APR) is essential for ensuring software reliability and quality while enhancing efficiency and reducing developers' workload. Although rule-based and learning-based APR methods have demonstrated their effectiveness, their performance is constrained by the types of defects they can repair, the quality of training data, and the size of model parameters. Recently, Large Language Models (LLMs) combined with Retrieval-Augmented Generation (RAG) have been increasingly adopted in APR tasks. However, current code LLMs and RAG designs neither fully address code repair tasks nor consider code-specific features. To overcome these limitations, we propose SelRepair, a novel APR approach that integrates a fine-tuned LLM with a newly designed dual RAG module. This approach uses a bug-fix pair dataset for fine-tuning and incorporates semantic and syntactic/structural similarity information through a RAG selection gate. This design ensures relevant information is retrieved efficiently, thereby reducing token length and inference time. Evaluations on Java datasets show SelRepair outperforms other APR methods, achieving exact match (EM) rates of 26.29% and 17.64% on two datasets while reducing inference time by at least 6.42% with controlled input lengths.
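The abstract describes two retrieval channels (semantic and syntactic/structural) whose results pass through a selection gate before being added to the prompt. The paper excerpt gives no implementation details, so the following is only a minimal sketch of how such a gated dual-channel retrieval could look; the cosine-similarity retriever, the top-k cutoff, and the gating threshold are all assumptions for illustration, not taken from SelRepair itself:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, corpus_vecs, k=2):
    """Top-k nearest corpus entries by cosine similarity.
    Returns (indices, all_scores)."""
    scores = [cosine(query_vec, v) for v in corpus_vecs]
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    return top, scores

def selection_gate(sem_hits, syn_hits, sem_scores, syn_scores, threshold=0.5):
    """Hypothetical selection gate: keep only retrieved bug-fix examples
    whose channel similarity clears the threshold, merging the semantic
    and syntactic channels without duplicates. Dropping low-relevance
    hits is what shortens the prompt and the inference time."""
    kept = []
    for i in sem_hits:
        if sem_scores[i] >= threshold:
            kept.append(i)
    for i in syn_hits:
        if syn_scores[i] >= threshold and i not in kept:
            kept.append(i)
    return kept

# Toy embeddings standing in for semantic and syntactic code encoders.
sem_corpus = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
syn_corpus = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
query = np.array([1.0, 0.1])

sem_hits, sem_scores = retrieve(query, sem_corpus)
syn_hits, syn_scores = retrieve(query, syn_corpus)
selected = selection_gate(sem_hits, syn_hits, sem_scores, syn_scores)
print(selected)  # only example 0 is similar enough on either channel
```

The selected indices would then point at the bug-fix pairs whose code is appended to the LLM's prompt for patch generation.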