Input Reduction Enhanced LLM-based Program Repair

📅 2025-07-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) suffer from the "lost-in-the-middle" problem in automated program repair (APR) when processing lengthy test inputs, losing critical failure information and degrading repair performance. To address this, the authors propose ReduceFix, an APR framework with an LLM-driven automated test input reduction mechanism designed to preserve failure-triggering input segments. Key contributions include: (1) LFTBench, the first benchmark for long-test-input APR; and (2) the integration of reduced test inputs into repair prompts to improve fault localization and patch generation. Experiments on LFTBench show that ReduceFix shrinks test inputs by 89.1% on average, improves pass@10 by up to 53.8% over full-input prompting and by 17.6% over omitting the test input entirely, and boosts ChatRepair's fix rate by 21.3%.

📝 Abstract
Large Language Models (LLMs) have shown great potential in Automated Program Repair (APR). Test inputs, being crucial for reasoning about the root cause of failures, are always included in the prompt for LLM-based APR. Unfortunately, LLMs struggle to retain key information in long prompts. When the test inputs in the prompt are extensive, this may trigger the "lost-in-the-middle" issue, compromising repair performance. To address this, we propose ReduceFix, an LLM-based APR approach with a built-in component that automatically reduces test inputs while retaining their failure-inducing behavior. ReduceFix prompts an LLM to generate a reducer that minimizes failure-inducing test inputs without human effort, and then feeds the reduced failure-inducing inputs to guide patch generation. For targeted evaluation, we constructed LFTBench, the first long-input APR benchmark, with 200 real bugs from 20 programming tasks, each paired with a failure-inducing input whose median size is 1 MB. On this benchmark, ReduceFix shrinks inputs by 89.1% on average and improves overall pass@10 by up to 53.8% relative to a prompt that includes the original test, and by 17.6% compared with omitting the test entirely. Adding the same reduction step to ChatRepair increases its fix rate by 21.3% without other changes. Ablation studies further highlight the impact of input length and compressed failure information on repair success. These results underscore that automatically reducing failing inputs is a practical and powerful complement to LLM-based APR, significantly improving its scalability and effectiveness.
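The reducer that ReduceFix asks an LLM to generate is, in spirit, a failure-preserving input minimizer. The paper does not publish the reducer's code here, but the idea can be illustrated with a minimal delta-debugging-style sketch: repeatedly delete chunks of the input and keep each deletion only if the program still fails. The `fails` oracle and the `"BUG"` trigger below are hypothetical stand-ins for the real failure check.

```python
def reduce_input(data: str, fails) -> str:
    """Greedy delta-debugging-style reduction: repeatedly try to delete
    chunks of the input, keeping a deletion only if the failure still
    reproduces (i.e., fails(candidate) is True)."""
    chunk = max(1, len(data) // 2)
    while chunk >= 1:
        i = 0
        while i < len(data):
            candidate = data[:i] + data[i + chunk:]
            if fails(candidate):
                data = candidate   # deletion kept the failure; retry same offset
            else:
                i += chunk         # deletion lost the failure; move on
        chunk //= 2                # refine granularity, down to single characters
    return data

# Hypothetical oracle: the program "fails" whenever the input contains "BUG".
failing_input = "x" * 50 + "BUG" + "y" * 50
minimal = reduce_input(failing_input, lambda s: "BUG" in s)
print(minimal)  # → BUG
```

The reduced input still triggers the failure but is orders of magnitude smaller, which is what lets it fit usefully into a repair prompt.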
Problem

Research questions and friction points this paper is trying to address.

Reduces long test inputs to prevent LLM performance loss
Automates failure-inducing test input minimization for APR
Improves program repair accuracy with compressed input data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically reduces test inputs while retaining failure behavior
LLM generates a reducer to minimize failure-inducing test inputs
Improves LLM-based APR scalability and effectiveness significantly