🤖 AI Summary
Existing large language models (LLMs) exhibit weak self-debugging capabilities in complex code generation; the weakness is especially pronounced in small open-source models.
Method: We propose an explanation-guided multi-step self-debugging framework. It constructs “generate–explain–correct” execution-feedback trajectories; introduces the first explanation-driven joint supervised fine-tuning and reinforcement learning (RL) paradigm; designs a composite reward function integrating explanation accuracy and correction correctness; and employs execution-based filtering and teacher distillation to enhance data quality.
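The composite reward above can be sketched as a weighted sum of an explanation-quality term and an execution-success term. This is a minimal illustration, not the paper's actual formulation: the weights, the `explanation_score` metric, and the function name are all assumptions for clarity.

```python
def composite_reward(explanation_score: float, code_passes: bool,
                     w_expl: float = 0.3, w_code: float = 0.7) -> float:
    """Illustrative composite reward combining explanation accuracy and
    correction correctness.

    explanation_score: assumed quality score for the generated explanation,
                       in [0, 1] (e.g. similarity to a reference explanation).
    code_passes:       whether the refined code passes execution-based tests.
    The weights are placeholders, not the paper's tuned values.
    """
    return w_expl * explanation_score + w_code * (1.0 if code_passes else 0.0)
```

A trajectory with an accurate explanation and passing code would receive the maximum reward, while a failed correction still earns partial credit for a good explanation, which is what lets RL learn from failure trajectories as well as successes.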
Results: On four major benchmarks, the framework achieves up to a 15.92% absolute improvement in pass@1; RL further boosts performance by 3.54%. Human evaluation confirms that generated explanations are more accurate, interpretable, and actionable for debugging—demonstrating significant advances in both code correctness and explainable self-correction.
📝 Abstract
In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging for complex tasks. Prior works on self-debugging mostly focus on prompting methods by providing LLMs with few-shot examples, which work poorly on small open-source LLMs. In this work, we propose LeDex, a training framework that significantly improves the self-debugging capability of LLMs. Intuitively, we observe that a chain of explanations of the wrong code followed by code refinement helps LLMs better analyze the wrong code and refine it. We thus propose an automated pipeline to collect a high-quality dataset for code explanation and refinement by generating a number of explanation and refinement trajectories from the LLM itself or a larger teacher model and filtering them via execution verification. We perform supervised fine-tuning (SFT) and further reinforcement learning (RL) on both success and failure trajectories with a novel reward design that considers code explanation and refinement quality. SFT improves pass@1 by up to 15.92% and pass@10 by 9.30% over four benchmarks. RL training brings an additional improvement of up to 3.54% on pass@1 and 2.55% on pass@10. The trained LLMs show iterative refinement ability and can keep refining code continuously. Lastly, our human evaluation shows that LLMs trained with our framework generate more useful code explanations and help developers better understand bugs in source code.
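The execution-verification filtering step described in the abstract can be sketched as follows. This is a simplified stand-in, assuming trajectories carry a `refined_code` field and that tests are plain assertion scripts; a real pipeline would sandbox execution and enforce timeouts.

```python
def passes_tests(code: str, tests: str) -> bool:
    """Execute candidate code, then its unit tests, in a fresh namespace.
    Any exception (syntax error, failed assertion, runtime error) counts
    as failure. NOTE: bare exec is unsafe for untrusted model output;
    a production pipeline would use a sandbox with resource limits."""
    env: dict = {}
    try:
        exec(code, env)   # define functions from the refined code
        exec(tests, env)  # run assertion-based tests against them
        return True
    except Exception:
        return False

def filter_trajectories(trajectories, tests):
    """Keep only explanation-plus-refinement trajectories whose code passes."""
    return [t for t in trajectories if passes_tests(t["refined_code"], tests)]

# Illustrative use: one correct and one still-buggy refinement.
good = {"explanation": "used subtraction instead of addition",
        "refined_code": "def add(a, b):\n    return a + b"}
bad = {"explanation": "wrong operator", 
       "refined_code": "def add(a, b):\n    return a - b"}
kept = filter_trajectories([good, bad], "assert add(2, 3) == 5")
```

Only the passing trajectory survives, so the SFT dataset contains explanations paired with refinements that are verified correct, while the discarded failures can still serve as negative trajectories for RL.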