CRPE: Expanding The Reasoning Capability of Large Language Model for Code Generation

📅 2025-05-15

🤖 AI Summary

Problem: Large language models (LLMs) exhibit limited logical reasoning and analytical capabilities in code generation tasks. Method: This paper proposes CRPE (Code Reasoning Process Enhancer), a three-stage framework for data synthesis and model training, applied to Qwen2.5-Coder base models. It combines chain-of-thought (CoT) modeling, StepDPO preference optimization, and multi-stage synthetic data distillation. The pipeline is open source and covers instruction data acquisition, expert code reasoning data synthesis, and an autonomous reasoning enhancement mechanism that iteratively refines the model. Results: On LiveCodeBench (20240701-20240901), COT-Coder-7B-StepDPO achieves a pass@1 score of 21.88, exceeding all models of similar or even larger size; COT-Coder-32B-StepDPO attains 35.08, surpassing GPT-4o.

📝 Abstract

We introduce CRPE (Code Reasoning Process Enhancer), an innovative three-stage framework for data synthesis and model training that advances the development of sophisticated code reasoning capabilities in large language models (LLMs). Building upon existing system-1 models, CRPE addresses the fundamental challenge of enhancing LLMs' analytical and logical processing in code generation tasks. Our framework presents a methodologically rigorous yet implementable approach to cultivating advanced code reasoning abilities in language models. Through the implementation of CRPE, we successfully develop an enhanced COT-Coder that demonstrates marked improvements in code generation tasks. Evaluation results on LiveCodeBench (20240701-20240901) demonstrate that our COT-Coder-7B-StepDPO, derived from Qwen2.5-Coder-7B-Base, with a pass@1 accuracy of 21.88, exceeds all models with similar or even larger sizes. Furthermore, our COT-Coder-32B-StepDPO, based on Qwen2.5-Coder-32B-Base, exhibits superior performance with a pass@1 accuracy of 35.08, outperforming GPT-4o on the benchmark. Overall, CRPE represents a comprehensive, open-source method that encompasses the complete pipeline from instruction data acquisition through expert code reasoning data synthesis, culminating in an autonomous reasoning enhancement mechanism.

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' analytical and logical processing in code generation

Developing advanced code reasoning abilities in language models

Improving pass@1 accuracy in code generation benchmarks
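
For readers unfamiliar with the metric, pass@1 is the k = 1 case of pass@k: the probability that at least one of k sampled solutions to a problem passes all of its tests, averaged over the benchmark. The sketch below uses the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. It is an illustration only and not the paper's evaluation harness; the function names and the (n, c) input layout are assumptions, and scaling to a 0-100 range is assumed because the reported scores (21.88, 35.08) read as percentages.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n samples contains no correct solution.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, scaled to 0-100 (assumed convention).

    Each entry of `results` is a hypothetical (n, c) pair for one problem.
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n per problem, so with a single sample per problem pass@1 is simply the percentage of problems solved on the first attempt.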

Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage framework for code reasoning enhancement

Advanced COT-Coder with improved code generation

Open-source pipeline from data to reasoning mechanism
