🤖 AI Summary
This study systematically reviews 63 LLM-based automated program repair (APR) systems published between January 2022 and June 2025, focusing on three core challenges: verifying semantic correctness beyond test suites, repairing large-scale repository-level defects, and reducing LLM inference cost. We propose the first comprehensive taxonomy, categorizing APR designs into four paradigms: fine-tuning, prompt engineering, pipeline-based workflows, and agent frameworks. Quantitative analysis demonstrates how retrieval augmentation and static/dynamic code analysis improve context quality, and reveals fundamental trade-offs among cost, controllability, and scalability across the paradigms. Key insights identify lightweight feedback mechanisms, repository-aware retrieval, and cost-aware planning as critical levers for progress. The work establishes a theoretical framework and a practical roadmap for improving the reliability, scalability, and real-world applicability of LLM-based APR systems.
📝 Abstract
Large language models (LLMs) are reshaping automated program repair (APR). We categorize 63 recent LLM-based APR systems, published between January 2022 and June 2025, into four paradigms, and show how retrieval- or analysis-augmented contexts can strengthen any of them. This taxonomy clarifies key trade-offs: fine-tuning delivers strong task alignment at high training cost; prompting enables rapid deployment but is limited by prompt design and context windows; procedural pipelines offer reproducible control with moderate overhead; agentic frameworks tackle multi-hunk or cross-file bugs at the price of increased latency and complexity. Persistent challenges include verifying semantic correctness beyond test suites, repairing repository-scale defects, and lowering LLM inference costs. We outline research directions that combine lightweight human feedback, repository-aware retrieval, code analysis, and cost-aware planning to advance reliable and efficient LLM-based APR.