DART: Distilling Autoregressive Reasoning to Silent Thought

📅 2025-06-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high latency that autoregressive generation imposes on chain-of-thought (CoT) reasoning in large language models (LLMs), this paper proposes distilling explicit CoT into non-autoregressive, implicit reasoning states called Silent Thought (ST). Methodologically, the authors design a dual-pathway collaborative training framework and introduce a lightweight Reasoning Evolvement Module (REM). Through self-distillation and latent-state alignment, a small number of ST tokens dynamically evolve into rich, inference-laden implicit representations. Experiments demonstrate that ST achieves accuracy comparable to state-of-the-art CoT methods while significantly reducing inference latency and computational overhead, enabling deployment in low-latency scenarios. The core contribution is the first end-to-end self-distillation framework, DART, that transforms CoT into non-autoregressive implicit reasoning, establishing a novel paradigm for efficient LLM reasoning.

📝 Abstract
Chain-of-Thought (CoT) reasoning has significantly advanced Large Language Models (LLMs) in solving complex tasks. However, its autoregressive paradigm leads to significant computational overhead, hindering its deployment in latency-sensitive applications. To address this, we propose DART (Distilling Autoregressive Reasoning to Silent Thought), a self-distillation framework that enables LLMs to replace autoregressive CoT with non-autoregressive Silent Thought (ST). Specifically, DART introduces two training pathways: the CoT pathway for traditional reasoning and the ST pathway for generating answers directly from a few ST tokens. The ST pathway utilizes a lightweight Reasoning Evolvement Module (REM) to align its hidden states with the CoT pathway, enabling the ST tokens to evolve into informative embeddings. During inference, only the ST pathway is activated, leveraging evolving ST tokens to deliver the answer directly. Extensive experimental results demonstrate that DART achieves comparable reasoning performance to existing baselines while offering significant efficiency gains, serving as a feasible alternative for efficient reasoning.
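The dual-pathway alignment described in the abstract can be sketched roughly as follows. Everything concrete here is an illustrative assumption, not the paper's implementation: the dimensions are made up, a single linear map stands in for the REM, and mean-pooled hidden states with an MSE objective stand in for whatever latent-state alignment DART actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper)
d_model, n_st, n_cot = 16, 4, 12

# Stand-ins for hidden states produced by the two training pathways
st_hidden = rng.normal(size=(n_st, d_model))    # a few Silent Thought tokens
cot_hidden = rng.normal(size=(n_cot, d_model))  # full CoT reasoning trace

# A single linear map stands in for the lightweight REM that "evolves"
# ST tokens into richer representations (hypothetical simplification)
W_rem = rng.normal(scale=0.1, size=(d_model, d_model))
evolved_st = st_hidden @ W_rem

# Distillation signal: pull a pooled summary of the evolved ST states
# toward the CoT pathway's pooled hidden states (pooling choice assumed)
align_loss = float(np.mean((evolved_st.mean(axis=0) - cot_hidden.mean(axis=0)) ** 2))
```

Minimizing such an alignment term alongside the usual answer loss is what would let the ST pathway inherit the CoT pathway's reasoning signal while remaining non-autoregressive at inference time.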
Problem

Research questions and friction points this paper is trying to address.

Reducing computational overhead in autoregressive reasoning
Enabling non-autoregressive Silent Thought for efficiency
Maintaining reasoning performance while improving speed
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-distillation framework for non-autoregressive reasoning
Lightweight Reasoning Evolvement Module for alignment
Direct answer generation from Silent Thought tokens
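The efficiency claim follows from decoding structure: autoregressive CoT pays one sequential forward pass per generated reasoning token, whereas the ST pathway processes its few ST tokens in parallel and decodes only the answer. A back-of-envelope step count (token budgets are invented for illustration, not measurements from the paper):

```python
def sequential_decode_steps(reasoning_tokens: int, answer_tokens: int) -> int:
    """Count sequential forward passes: each autoregressively generated
    token requires one pass conditioned on the previous token."""
    return reasoning_tokens + answer_tokens

# Hypothetical budgets: 256 CoT tokens vs. zero sequentially decoded
# reasoning tokens for ST (its ST tokens are consumed in parallel)
cot_steps = sequential_decode_steps(reasoning_tokens=256, answer_tokens=8)
st_steps = sequential_decode_steps(reasoning_tokens=0, answer_tokens=8)

print(cot_steps, st_steps)  # 264 8
```

Under this toy model the latency gap scales with the length of the distilled-away reasoning trace, which is why the gains grow on tasks that normally require long CoT.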
Authors
Nan Jiang (Nanjing University; Tencent Inc.)
Ziming Wu (Hong Kong University of Science and Technology)
De-Chuan Zhan (Nanjing University, China; Machine Learning, Data Mining)
Fuming Lai (Tencent Inc.)
Shaobing Lian (Tencent Inc.)