FLARE: Fine-Grained Diagnostic Feedback for LLM Code Refinement

📅 2026-06-02
🤖 AI Summary
This work addresses the problem that code generated by large language models often contains bugs, while existing feedback signals such as test failures and self-critiques are too coarse-grained or too high-level to tell the model where to fix them. The authors propose FLARE, an iterative framework that pairs a lightweight diagnostic model, which predicts line-level suspiciousness, with a search over the top-k suspicious regions; the best repair candidate is selected according to execution outcomes. On LiveCodeBench and BigCodeBench with five base LLMs, FLARE outperforms the strongest baseline by an absolute 1.72% to 7.42% even without candidate search (k=1), and searching over 10 candidates yields a further average improvement of 8.50% compared with no candidate search. Evaluated in isolation, the diagnostic model also achieves the best performance among recent fault localization methods.
📝 Abstract
Large language models often generate code with bugs. Existing methods rely on feedback signals such as test failures and self-critiques to iteratively refine the generated code. Such signals are either too coarse-grained or too high-level, which is not sufficient to inform the model where to fix the bug. In this work, we present Flare, an iterative framework with a lightweight diagnostic model that predicts line-level suspiciousness signals for bug localization and code refinement. Given the inherent uncertainty of diagnostic predictions, Flare searches over the top-k suspicious regions and selects the best candidate according to execution outcomes. Experiments on LiveCodeBench and BigCodeBench with five base LLMs show that, even without candidate search (k=1), Flare outperforms the strongest baseline with an absolute improvement from 1.72% to 7.42%. Furthermore, searching over 10 candidates yields an average improvement of 8.50% compared with no candidate search. When evaluated in isolation, our lightweight diagnostic model achieves the best performance compared with recent fault localization methods, demonstrating that it can provide reliable fine-grained guidance for code refinement.
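The loop described in the abstract (score lines, take the top-k suspicious regions, propose a fix for each, keep the candidate with the best execution outcome) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `top_k_regions`, `passed_tests`, and `refine_once`, the fixed-width region window, and the `propose_fix` callback standing in for the base LLM are all assumptions made for illustration, and the suspiciousness scores are supplied by hand rather than by a trained diagnostic model.

```python
from typing import Callable, List, Sequence, Tuple

TestCase = Tuple[tuple, object]  # (positional args, expected return value)


def top_k_regions(scores: Sequence[float], k: int, width: int = 1) -> List[Tuple[int, int]]:
    """Rank fixed-width line windows by summed suspiciousness; return the top k as (start, end)."""
    n = len(scores)
    windows = [(sum(scores[i:i + width]), i) for i in range(max(n - width + 1, 1))]
    windows.sort(key=lambda w: (-w[0], w[1]))  # highest score first, ties by position
    return [(i, min(i + width, n)) for _, i in windows[:k]]


def passed_tests(source: str, entry: str, tests: Sequence[TestCase]) -> int:
    """Execute `source` and count passing tests. Assumption: a real system would sandbox this."""
    namespace: dict = {}
    try:
        exec(source, namespace)
    except Exception:
        return 0
    fn = namespace.get(entry)
    if not callable(fn):
        return 0
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test counts as a failure
    return passed


def refine_once(
    lines: List[str],
    scores: Sequence[float],
    k: int,
    propose_fix: Callable[[List[str], int, int], List[str]],
    entry: str,
    tests: Sequence[TestCase],
) -> Tuple[str, int]:
    """Try one fix per top-k region; keep the candidate that passes the most tests."""
    best_src = "\n".join(lines)
    best_score = passed_tests(best_src, entry, tests)
    for start, end in top_k_regions(scores, k):
        candidate = "\n".join(propose_fix(lines, start, end))
        score = passed_tests(candidate, entry, tests)
        if score > best_score:
            best_src, best_score = candidate, score
    return best_src, best_score


def toy_fix(lines: List[str], start: int, end: int) -> List[str]:
    """Hypothetical stand-in for the LLM: flips '-' to '+' inside the flagged region only."""
    return lines[:start] + [ln.replace("-", "+") for ln in lines[start:end]] + lines[end:]


buggy = ["def add(a, b):", "    return a - b"]
tests = [((1, 2), 3), ((0, 0), 0)]
# The scores deliberately mis-rank the buggy line to mimic an uncertain diagnosis.
fixed_src, n_passed = refine_once(buggy, [0.9, 0.1], k=2, propose_fix=toy_fix, entry="add", tests=tests)
```

With the mis-ranked scores above, `k=1` edits only the function signature and leaves the bug in place, whereas `k=2` also reaches the faulty `return` line. That is the motivation the abstract gives for searching over several suspicious regions instead of trusting the single top prediction.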
Problem

Research questions and friction points this paper is trying to address.

bug localization
code refinement
diagnostic feedback
large language models
fine-grained feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

fine-grained feedback
bug localization
code refinement
diagnostic model
LLM code generation