🤖 AI Summary
Small language models (SLMs) often underperform on complex reasoning tasks due to limited capacity for multi-step inference. Method: This work investigates the role of chain-of-thought (CoT) reasoning in white-box knowledge distillation (KD), distilling stepwise reasoning trajectories from larger models (Qwen and Llama2 families, using the CoT-Collection dataset) into smaller ones, so that students learn structured inference rather than merely mimicking final answers. Contribution/Results: Experiments on the BIG-Bench-Hard (BBH) benchmark, which is challenging for smaller LLMs, show that incorporating CoT data improves the average accuracy of white-box-distilled models on natural language reasoning and understanding tasks. The study supports CoT-guided distillation as a practical route to equipping lightweight models with stronger, more structured reasoning capability, narrowing the gap between model efficiency and inferential competence.
📝 Abstract
Chain-of-Thought (CoT) prompting is a widely used method to improve the reasoning capability of Large Language Models (LLMs). More recently, CoT has been leveraged in Knowledge Distillation (KD) to transfer reasoning capability from a larger LLM to a smaller one. This paper examines the role of CoT in distilling reasoning capability from larger to smaller LLMs using white-box KD, analysing its effectiveness in improving the performance of the distilled models on various natural language reasoning and understanding tasks. We conduct white-box KD experiments using LLMs from the Qwen and Llama2 families, employing CoT data from the CoT-Collection dataset. The distilled models are then evaluated on natural language reasoning and understanding tasks from the BIG-Bench-Hard (BBH) benchmark, which presents complex challenges for smaller LLMs. Experimental results demonstrate the role of CoT in improving white-box KD effectiveness, enabling the distilled models to achieve better average performance on natural language reasoning and understanding tasks from BBH.
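To make the white-box setting concrete, here is a minimal sketch of a token-level distillation objective: the student is trained to match the teacher's full output distribution (not just the final answer) at every position of a CoT-augmented sequence. This assumes a forward-KL loss over teacher/student logits with temperature scaling, a common white-box KD formulation; the paper's exact objective may differ, and the function names below are illustrative, not from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss_per_token(teacher_logits, student_logits, temperature=2.0):
    """Forward KL(teacher || student) for one token position's
    vocabulary distribution -- the white-box signal that logits-based
    KD uses, here applied at every step of the CoT rationale."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sequence_kd_loss(teacher_seq, student_seq, temperature=2.0):
    """Average token-level KL over a CoT-augmented target sequence
    (rationale tokens plus answer tokens)."""
    losses = [kd_loss_per_token(t, s, temperature)
              for t, s in zip(teacher_seq, student_seq)]
    return sum(losses) / len(losses)
```

The key contrast with black-box distillation is visible here: the student receives the teacher's distribution at every rationale token, so the CoT trajectory itself shapes the training signal rather than only the final answer.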