Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading

📅 2025-04-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) often exhibit a mismatch between their reasoning capabilities and the difficulty level of chain-of-thought (CoT) training data, hindering efficient supervised fine-tuning (SFT). Method: This paper proposes an LLM-adaptive question difficulty grading framework that dynamically aligns CoT data generation with the target model's capacity. It generates initial CoT samples using DeepSeek-R1 (671B), then applies model capability-aware difficulty estimation, hierarchical sampling, and task-specific data construction. Contribution/Results: To our knowledge, this is the first capability-driven CoT data generation paradigm. It substantially reduces data curation costs and improves SFT efficiency. Using only 2K CoT samples each for mathematics and coding, ZMath-32B and ZCode-32B surpass DeepSeek-Distill-32B on mathematical olympiad and code generation benchmarks, respectively, demonstrating both effectiveness and strong generalization across domains.

📝 Abstract
Recently, DeepSeek-R1 (671B) (DeepSeek-AI et al., 2025) has demonstrated excellent reasoning ability on complex tasks and has publicly shared its methodology, providing a potential source of high-quality chain-of-thought (CoT) data for stimulating the reasoning abilities of small-sized large language models (LLMs). To serve different LLMs, we seek an efficient method for generating high-quality CoT data with LLM-Adaptive question difficulty levels. First, we grade the difficulty of the questions according to the reasoning ability of the LLMs themselves and construct an LLM-Adaptive question database. Second, we sample the question database according to a distribution over difficulty levels and then use DeepSeek-R1 (671B) (DeepSeek-AI et al., 2025) to generate the corresponding high-quality CoT data with correct answers. Thanks to the construction of CoT data with LLM-Adaptive difficulty levels, we significantly reduce the cost of data generation and enhance the efficiency of model supervised fine-tuning (SFT). Finally, we validate the effectiveness and generalizability of the proposed method on complex mathematical competitions and code generation tasks. Notably, with only 2k high-quality mathematical CoT samples, our ZMath-32B surpasses DeepSeek-Distill-32B on math reasoning tasks. Similarly, with only 2k high-quality code CoT samples, our ZCode-32B surpasses DeepSeek-Distill-32B on code reasoning tasks.
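The two-step pipeline in the abstract (adaptive difficulty grading, then distribution-based sampling) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact procedure: the pass-rate thresholds, the three-level scale, and the target distribution are assumptions, and `pass_rates` stands in for actually querying the target LLM on each question. In the paper, the sampled questions would then be sent to DeepSeek-R1 (671B) to generate CoT traces with verified answers.

```python
import random
from collections import defaultdict

def grade_difficulty(pass_rate: float) -> str:
    """Map the target model's pass rate on a question to a difficulty level.
    Thresholds are illustrative assumptions, not the paper's values."""
    if pass_rate >= 0.8:
        return "easy"
    if pass_rate >= 0.4:
        return "medium"
    return "hard"

def build_adaptive_database(questions, pass_rates):
    """Bucket questions into an LLM-adaptive question database by difficulty."""
    db = defaultdict(list)
    for question, rate in zip(questions, pass_rates):
        db[grade_difficulty(rate)].append(question)
    return db

def sample_by_distribution(db, distribution, total, seed=0):
    """Draw questions per difficulty level to match a target distribution."""
    rng = random.Random(seed)
    sampled = []
    for level, frac in distribution.items():
        pool = db.get(level, [])
        k = min(len(pool), round(frac * total))
        sampled.extend(rng.sample(pool, k))
    return sampled

# Toy data: 100 questions with simulated per-question pass rates.
rng = random.Random(0)
questions = [f"q{i}" for i in range(100)]
pass_rates = [rng.random() for _ in questions]

db = build_adaptive_database(questions, pass_rates)
# Bias sampling toward harder questions (assumed distribution).
sampled = sample_by_distribution(db, {"easy": 0.2, "medium": 0.3, "hard": 0.5}, total=20)
```

The sampled set would then be fed to the teacher model for CoT generation, with incorrect traces filtered out before SFT.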
Problem

Research questions and friction points this paper is trying to address.

Generating high-quality CoT data for small LLMs
Adapting question difficulty to LLM reasoning ability
Reducing data generation cost for model fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-Adaptive question difficulty grading
High-quality CoT data generation
Efficient supervised fine-tuning enhancement
Qianjin Yu
Intelligent System Department, Zhongxing Telecom Equipment (ZTE), Changsha, Hunan, China
Keyu Wu
Institute for Infocomm Research, A*STAR, Singapore
deep learning, reinforcement learning, transfer learning, autonomous navigation
Zihan Chen
Intelligent System Department, Zhongxing Telecom Equipment (ZTE), Changsha, Hunan, China
Chushu Zhang
Intelligent System Department, Zhongxing Telecom Equipment (ZTE), Changsha, Hunan, China
Manlin Mei
Intelligent System Department, Zhongxing Telecom Equipment (ZTE), Changsha, Hunan, China
Lingjun Huang
Intelligent System Department, Zhongxing Telecom Equipment (ZTE), Changsha, Hunan, China
Fang Tan
Intelligent System Department, Zhongxing Telecom Equipment (ZTE), Changsha, Hunan, China
Yongsheng Du
Intelligent System Department, Zhongxing Telecom Equipment (ZTE), Changsha, Hunan, China
Kunlin Liu
Intelligent System Department, Zhongxing Telecom Equipment (ZTE), Changsha, Hunan, China
Yurui Zhu
University of Science and Technology of China