RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation

📅 2025-02-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Code generation methods relying on teacher-model distillation and static fine-tuning suffer from poor generalization and delayed feedback. Method: This paper proposes the Adaptive Critique Refinement (ACR) framework—a teacher-free, closed-loop self-evolution paradigm. ACR integrates LLM-as-a-Judge quality assessment with LLM-as-a-Critic selective critique, employing a composite scoring system to identify error patterns and drive iterative supervised fine-tuning. Crucially, it eliminates reliance on external annotations, instead leveraging only model-generated samples and external execution/semantic feedback for continuous refinement. Contribution/Results: On multiple benchmarks, the RefineCoder series—built upon ACR—significantly outperforms same-scale baselines using substantially less training data. These results empirically validate the effectiveness and scalability of self-iterative optimization for code generation.

📝 Abstract
Code generation has attracted increasing attention with the rise of Large Language Models (LLMs). Many studies have developed powerful code LLMs by synthesizing code-related instruction data and applying supervised fine-tuning. However, these methods are limited by teacher model distillation and ignore the potential of iterative refinement by self-generated code. In this paper, we propose Adaptive Critique Refinement (ACR), which enables the model to refine itself by self-generated code and external critique, rather than directly imitating the code responses of the teacher model. Concretely, ACR includes a composite scoring system with LLM-as-a-Judge to evaluate the quality of code responses and a selective critique strategy with LLM-as-a-Critic to critique self-generated low-quality code responses. We develop the RefineCoder series by iteratively applying ACR, achieving continuous performance improvement on multiple code generation benchmarks. Compared to the baselines of the same size, our proposed RefineCoder series can achieve comparable or even superior performance using less data.
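The refinement loop described in the abstract, generate code, score it with an LLM-as-a-Judge, selectively critique low-quality responses with an LLM-as-a-Critic, and fine-tune on the result, can be sketched as follows. This is a minimal illustration only: the function names, the stub judge/critic, and the score threshold are assumptions for demonstration, not the paper's actual implementation.

```python
# Sketch of one Adaptive Critique Refinement (ACR) iteration.
# judge_score, critique_and_refine, and the 0.5 threshold are
# illustrative stand-ins, not the paper's real components.

def judge_score(prompt, response):
    """LLM-as-a-Judge stand-in: composite quality score in [0, 1].
    Stubbed here with a trivial heuristic for demonstration."""
    return 1.0 if "return" in response else 0.2

def critique_and_refine(prompt, response):
    """LLM-as-a-Critic stand-in: critique a low-quality response
    and produce a refined version. Stubbed for demonstration."""
    return response + "\n    return result"

def acr_iteration(model_generate, prompts, threshold=0.5):
    """One ACR round: self-generate, judge, selectively critique,
    and collect (prompt, response) pairs for the next SFT round."""
    sft_data = []
    for prompt in prompts:
        response = model_generate(prompt)
        if judge_score(prompt, response) >= threshold:
            sft_data.append((prompt, response))       # keep high-quality code
        else:
            refined = critique_and_refine(prompt, response)
            sft_data.append((prompt, refined))        # refine low-quality code
    return sft_data  # fine-tune the model on this data, then repeat
```

Applying this loop repeatedly, with the fine-tuned model as the next round's generator, is what yields the RefineCoder series; crucially, no teacher model's responses are imitated, only the model's own outputs plus external critique.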
Problem

Research questions and friction points this paper is trying to address.

Improving code generation via iterative refinement
Adaptive critique for self-generated code enhancement
LLM-as-a-Judge for code quality evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Critique Refinement (ACR)
LLM-as-a-Judge scoring system
Selective critique strategy
Changzhi Zhou
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Xinyu Zhang
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Dandan Song
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Xiancai Chen
Peking University
Wanli Gu
Meituan
Huipeng Ma
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Yuhang Tian
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Mengdi Zhang
Meituan
Linmei Hu
Beijing Institute of Technology
Large Language Models · Knowledge Graph · Multimodal