🤖 AI Summary
To address the high inference latency that autoregressive generation imposes on chain-of-thought (CoT) reasoning in large language models (LLMs), this paper proposes the "Silent Thought" (ST) paradigm: distilling explicit CoT into non-autoregressive, implicit reasoning states. Methodologically, the authors design a dual-pathway collaborative training framework and introduce a lightweight Reasoning Evolvement Module (REM). Through self-distillation and latent-state alignment, a small number of ST tokens evolve into rich, reasoning-laden implicit representations. Experiments demonstrate that ST achieves accuracy comparable to existing CoT baselines while significantly reducing inference latency and computational overhead, enabling deployment in latency-sensitive scenarios. The core contribution is an end-to-end self-distillation framework, DART, that transforms autoregressive CoT into non-autoregressive implicit reasoning, establishing a new paradigm for efficient LLM reasoning.
📝 Abstract
Chain-of-Thought (CoT) reasoning has significantly advanced Large Language Models (LLMs) in solving complex tasks. However, its autoregressive paradigm incurs substantial computational overhead, hindering its deployment in latency-sensitive applications. To address this, we propose **DART** (**D**istilling **A**utoregressive **R**easoning to Silent **T**hought), a self-distillation framework that enables LLMs to replace autoregressive CoT with non-autoregressive Silent Thought (ST). Specifically, DART introduces two training pathways: the CoT pathway for traditional reasoning and the ST pathway for generating answers directly from a few ST tokens. The ST pathway utilizes a lightweight Reasoning Evolvement Module (REM) to align its hidden states with those of the CoT pathway, enabling the ST tokens to evolve into informative embeddings. During inference, only the ST pathway is activated, leveraging the evolved ST tokens to deliver the answer directly. Extensive experimental results demonstrate that DART achieves comparable reasoning performance to existing baselines while offering significant efficiency gains, serving as a feasible alternative for efficient reasoning.
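The dual-pathway training described above can be sketched as a combined objective: the two pathways' answer losses plus a latent-state alignment term that pulls the REM-evolved ST hidden states toward the CoT pathway's hidden states. This is a minimal toy illustration, not the paper's implementation: the actual REM architecture, loss weighting, and hidden-state extraction are not specified in the abstract, so `rem`, `mse`, and `align_weight` here are hypothetical stand-ins.

```python
def mse(a, b):
    """Mean squared error between two equal-length hidden-state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def rem(st_hidden, weight=0.9, bias=0.1):
    """Hypothetical stand-in for the lightweight Reasoning Evolvement
    Module: a simple affine map that 'evolves' an ST token's hidden
    state before it is aligned with the CoT pathway."""
    return [weight * x + bias for x in st_hidden]

def dart_loss(cot_hidden, st_hidden, loss_cot, loss_st, align_weight=1.0):
    """Toy combined objective for the dual-pathway framework.

    cot_hidden / st_hidden: hidden states from the CoT and ST pathways.
    loss_cot / loss_st: the two pathways' answer losses (scalars here).
    The alignment term distills the CoT pathway's latent states into
    the REM-evolved ST states.
    """
    align = mse(rem(st_hidden), cot_hidden)
    return loss_cot + loss_st + align_weight * align

# Toy numbers standing in for real hidden states and losses.
cot_h = [0.5, -0.2, 0.3]
st_h = [0.4, -0.1, 0.2]
print(round(dart_loss(cot_h, st_h, loss_cot=1.2, loss_st=0.8), 4))
```

At inference time only the ST pathway (and REM) would run, so the CoT pathway's cost is paid during training alone, which is where the latency savings come from.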