IterPref: Focal Preference Learning for Code Generation via Iterative Debugging

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing code generation models rely on coarse-grained preference pairs based solely on test pass/fail signals, hindering precise error localization and limiting debugging capability. Method: We propose a fine-grained preference alignment paradigm that emulates human iterative debugging—enabling token-level error detection and modeling repair trajectories; introduce CodeFlow, the first dataset explicitly capturing “error–repair” evolution; and design a syntax- and semantics-aware customized DPO algorithm. Our approach integrates test-feedback-driven progressive sample generation, iterative error correction modeling, and fine-grained alignment optimization. Contribution/Results: On challenging benchmarks including BigCodeBench, our method significantly reduces error rates and consistently improves generation quality, generalization, and interpretability across diverse Code LLMs.

📝 Abstract
Preference learning enhances Code LLMs beyond supervised fine-tuning by leveraging relative quality comparisons. Existing methods construct preference pairs from candidates based on test case success, treating the higher pass rate sample as positive and the lower as negative. However, this approach does not pinpoint specific errors in the code, which prevents the model from learning more informative error correction patterns: aligning failing code as a whole lacks the granularity needed to capture meaningful error-resolution relationships. To address these issues, we propose IterPref, a new preference alignment framework that mimics human iterative debugging to refine Code LLMs. IterPref explicitly locates error regions and aligns the corresponding tokens via a tailored DPO algorithm. To generate informative pairs, we introduce the CodeFlow dataset, where samples are iteratively refined until passing tests, with modifications capturing error corrections. Extensive experiments show that a diverse suite of Code LLMs equipped with IterPref achieves significant performance gains in code generation and improves on challenging tasks like BigCodeBench. In-depth analysis reveals that IterPref yields fewer errors. Our code and data will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Coarse-grained pass/fail preference pairs cannot pinpoint specific errors in code.
Aligning failing samples as a whole lacks the granularity to learn error-correction patterns.
Code LLMs still underperform on challenging generation benchmarks such as BigCodeBench.
Innovation

Methods, ideas, or system contributions that make the work stand out.

IterPref framework mimics iterative human debugging
Tailored DPO algorithm aligns error-specific tokens
CodeFlow dataset captures iterative error corrections
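To make the tailored DPO idea concrete — penalizing the dispreferred sample only on its error-region tokens rather than the whole sequence — here is a minimal sketch. This is an illustration, not the paper's implementation: the function name, per-token log-probability inputs, the error mask, and the β value are all assumptions for the example.

```python
import math

def masked_dpo_loss(logp_w, ref_logp_w, logp_l, ref_logp_l, error_mask, beta=0.1):
    """DPO-style loss where the dispreferred sample contributes only its
    masked (error-region) tokens, in the spirit of IterPref's token-level
    alignment. All inputs are lists of per-token log-probabilities from
    the policy (logp_*) and a frozen reference model (ref_logp_*)."""
    # Preferred sample: standard sequence-level log-ratio over all tokens.
    chosen = sum(logp_w) - sum(ref_logp_w)
    # Dispreferred sample: log-ratio restricted to tokens flagged as errors.
    rejected = sum(lp - rlp
                   for lp, rlp, m in zip(logp_l, ref_logp_l, error_mask) if m)
    margin = beta * (chosen - rejected)
    # Standard DPO objective: -log sigmoid(beta * margin).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Restricting the rejected term to masked tokens focuses the preference gradient on the edited error region, rather than penalizing correct tokens that the failing and passing samples share.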