Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation

📅 2025-03-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study identifies a critical limitation: chain-of-thought (CoT) distillation exhibits severe performance degradation when transferred across heterogeneous models, challenging the universality assumption of existing distillation methods. To address this, we propose a structured parsing and optimization framework. First, we formally characterize the cross-model CoT distillation failure phenomenon. Second, we design a three-stage data augmentation paradigm—splitting, simplifying, and correcting—to enable interpretable reconstruction of reasoning paths. Third, we introduce a structured distillation technique incorporating data segmentation decoupling, redundancy-free filtering, and explicit modeling of intermediate erroneous states, tailored to teacher model outputs (e.g., Qwen-QwQ). Experiments on mathematical and complex reasoning benchmarks demonstrate that our method simultaneously improves student model accuracy and token efficiency, outperforming baseline distillation approaches by 12.7%–18.3%.

Technology Category

Application Category

📝 Abstract
Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning capabilities through long chain-of-thought (CoT) reasoning. The R1 distillation scheme has emerged as a promising approach for training cost-effective models with enhanced reasoning abilities. However, the underlying mechanisms driving its effectiveness remain unclear. This study examines the universality of distillation data and identifies key components that enable the efficient transfer of long-chain reasoning capabilities in LLM distillation. Our findings reveal that the effectiveness of long CoT reasoning distillation from teacher models like Qwen-QwQ degrades significantly on nonhomologous models, challenging the assumed universality of current distillation methods. To gain deeper insights into the structure and patterns of long CoT reasoning, we propose DLCoT (Deconstructing Long Chain-of-Thought), a distillation data enhancement framework. DLCoT consists of three key steps: (1) data segmentation to decompose complex long CoT structures, (2) simplification by eliminating unsolvable and redundant solutions, and (3) optimization of intermediate error states. Our approach significantly improves model performance and token efficiency, facilitating the development of high-performance LLMs.
Problem

Research questions and friction points this paper is trying to address.

Unclear mechanisms in R1 distillation for long CoT reasoning.
Effectiveness degradation of long CoT distillation on nonhomologous models.
Need for structured optimization in long CoT reasoning distillation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data segmentation for complex CoT structures
Simplification by removing redundant solutions
Optimization of intermediate error states
🔎 Similar Papers
No similar papers found.
Y
Yijia Luo
Alibaba Group
Y
Yulin Song
Alibaba Group, New York University
Xingyao Zhang
Xingyao Zhang
Microsoft
J
Jiaheng Liu
Alibaba Group
W
Weixun Wang
Alibaba Group
G
GengRu Chen
Alibaba Group
W
Wenbo Su
Alibaba Group
B
Bo Zheng
Alibaba Group