CRPE: Expanding The Reasoning Capability of Large Language Model for Code Generation

📅 2025-05-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Problem: Large language models (LLMs) exhibit limited logical reasoning and analytical capabilities in code generation tasks. Method: This paper proposes CRPE, a three-stage framework built upon Qwen2.5-Coder that integrates chain-of-thought (CoT) modeling, StepDPO reinforcement learning, and multi-stage synthetic data distillation. It introduces a novel, code-reasoning-oriented autonomous data synthesis and closed-loop enhancement mechanism, enabling an end-to-end open-source pipeline spanning instruction parsing, expert-level reasoning data construction, and iterative model capability refinement. Contribution/Results: Experiments show that COT-Coder-7B-StepDPO achieves a pass@1 score of 21.88 on LiveCodeBench, outperforming same-scale models, while COT-Coder-32B-StepDPO attains 35.08, surpassing GPT-4o. Both variants significantly advance code-level logical reasoning performance.
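The summary does not spell out the StepDPO objective; a minimal sketch, assuming the standard DPO preference loss applied per reasoning step (all log-probability values below are hypothetical, and the function name `dpo_loss` is ours, not the paper's):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one step-level preference pair.

    logp_w / logp_l         -- policy log-probs of the chosen / rejected step
    ref_logp_w / ref_logp_l -- same quantities under the frozen reference model
    beta                    -- temperature controlling deviation from the reference
    """
    # Implicit reward margin between chosen and rejected steps.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid: shrinks as the policy prefers the chosen
    # step more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With a zero margin the loss equals log 2; it decreases as the policy's preference for the chosen step grows.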

📝 Abstract
We introduce CRPE (Code Reasoning Process Enhancer), an innovative three-stage framework for data synthesis and model training that advances the development of sophisticated code reasoning capabilities in large language models (LLMs). Building upon existing system-1 models, CRPE addresses the fundamental challenge of enhancing LLMs' analytical and logical processing in code generation tasks. Our framework presents a methodologically rigorous yet implementable approach to cultivating advanced code reasoning abilities in language models. Through the implementation of CRPE, we successfully develop an enhanced COT-Coder that demonstrates marked improvements in code generation tasks. Evaluation results on LiveCodeBench (20240701-20240901) demonstrate that our COT-Coder-7B-StepDPO, derived from Qwen2.5-Coder-7B-Base, achieves a pass@1 accuracy of 21.88, exceeding all models of similar or even larger size. Furthermore, our COT-Coder-32B-StepDPO, based on Qwen2.5-Coder-32B-Base, exhibits superior performance with a pass@1 accuracy of 35.08, outperforming GPT-4o on the benchmark. Overall, CRPE represents a comprehensive, open-source method that encompasses the complete pipeline from instruction data acquisition through expert code reasoning data synthesis, culminating in an autonomous reasoning enhancement mechanism.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' analytical and logical processing in code generation
Developing advanced code reasoning abilities in language models
Improving pass@1 accuracy in code generation benchmarks
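The pass@1 figures quoted throughout are percentages under the standard unbiased pass@k estimator commonly used for code benchmarks (Chen et al., 2021); a minimal sketch of that estimator, not taken from this paper:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c pass all
    tests, is correct. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer incorrect samples than k: some draw must be correct.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For pass@1 with a single sample per problem this reduces to the plain fraction of problems solved, e.g. `pass_at_k(10, 1, 1)` gives 0.1.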
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage framework for code reasoning enhancement
Advanced COT-Coder with improved code generation
Open-source pipeline from data to reasoning mechanism
Ningxin Gui
School of Mathematics and Statistics, Wuhan University
Qianghuai Jia
Alibaba Group
Natural Language Processing, Machine Learning, Deep Learning, Knowledge Graph, Information Retrieval, Recommendation
Feijun Jiang
Alibaba International Digital Commerce
Yuling Jiao
School of Mathematics and Statistics, Wuhan University
Dechun Wang
Alibaba International Digital Commerce
Jerry Zhijian Yang
School of Mathematics and Statistics, Wuhan University