SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of explicit chain-of-thought reasoning and the limitations of latent reasoning approaches, which often suffer from rigid semantic alignment and insufficient interpretability. The authors propose an efficient and interpretable latent reasoning framework that compresses reasoning steps into explainable pause tokens within the hidden space through span-level semantic alignment and decoding constraints imposed by a frozen language model head. A novel cross-span alignment mechanism based on Sinkhorn optimal transport is introduced to overcome the limitations of conventional point-to-point alignment, ensuring that latent states can be directly decoded into meaningful keywords. Experimental results demonstrate that the method achieves an average accuracy gain of 2.3 points across multiple reasoning benchmarks while reducing generated tokens by 37.5%, and provides faithful, human-interpretable semantic explanations.
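The cross-span alignment described above rests on entropy-regularized optimal transport. A minimal, illustrative Sinkhorn sketch (toy shapes and a squared-Euclidean cost; the variable names and setup are assumptions, not the paper's actual code) could look like:

```python
import numpy as np

def sinkhorn(cost, n_iters=200, eps=1.0):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    Returns a soft transport plan P whose rows (pause tokens) are
    matched to weighted mixtures of columns (span hidden states),
    instead of a rigid one-to-one assignment.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform marginal over pause tokens
    b = np.full(m, 1.0 / m)          # uniform marginal over span states
    K = np.exp(-cost / eps)          # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):        # alternate scaling to match marginals
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Toy setup: 2 pause tokens vs. 4 span states in a 3-d hidden space.
rng = np.random.default_rng(0)
pause = rng.normal(size=(2, 3))
span = rng.normal(size=(4, 3))
# Cost = squared Euclidean distance between every pause/state pair.
cost = ((pause[:, None, :] - span[None, :, :]) ** 2).sum(-1)
P = sinkhorn(cost)                   # rows sum to 1/2, columns to 1/4
```

A training objective could then pull each pause token toward its P-weighted average of span states; the entropy regularization (controlled by `eps`) is what makes the matching soft rather than point-to-point.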

📝 Abstract
Explicit Chain-of-Thought improves the reasoning performance of large language models but often incurs high inference cost due to verbose token-level traces. While recent approaches reduce this overhead via concise prompting or step pruning, they largely truncate what the model says rather than internalize what the model thinks. Latent reasoning offers a promising alternative by performing computation in the hidden space, yet prior methods face two critical challenges. Many existing approaches rely on rigid point-to-point alignment, forcing a latent token to approximate the final representation of a reasoning step, which can be insufficient to capture the dense, variable-length semantics of an entire reasoning segment. Furthermore, these methods often suffer from a lack of interpretability: latent states are commonly produced by unconstrained optimization or embedding mixing, yielding vectors that are difficult to decode or audit under the pretrained language head. We propose SPOT, a flexible framework that compresses explicit CoT into compact latent pause tokens without enforcing a fixed response template. At the core of SPOT is Span-level Semantic Alignment, a Sinkhorn optimal-transport objective that softly matches each pause token to the semantics of an entire reasoning segment, overcoming the rigidity of step-end alignment. To further improve interpretability, SPOT introduces a Frozen-Head Decoding Constraint that keeps latent states directly decodable as token distributions under the frozen pretrained LM head, enabling readable keyword interpretations of latent thoughts. Experiments on reasoning benchmarks demonstrate that SPOT improves accuracy by 2.3 points on average while reducing generated tokens by 37.5% and provides faithful semantic interpretations of the latent reasoning process.
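The Frozen-Head Decoding Constraint mentioned in the abstract keeps latent states readable as token distributions under the pretrained LM head. A toy sketch of such a readout (a made-up 5-token vocabulary and identity-like head weights, purely illustrative and not the paper's implementation):

```python
import numpy as np

def decode_latent(latent, lm_head_weight, top_k=3):
    """Project a latent pause state through a frozen LM head and
    return the top-k token ids: a keyword-style reading of the
    latent thought."""
    logits = latent @ lm_head_weight.T       # (vocab,)
    probs = np.exp(logits - logits.max())    # softmax, order-preserving
    probs /= probs.sum()
    return np.argsort(probs)[::-1][:top_k]   # highest-probability tokens first

# Toy "frozen" head: 5-token vocabulary, 4-d hidden size.
W = np.eye(5, 4)                             # stand-in head weights
latent = np.array([0.0, 0.0, 3.0, 0.0])      # latent state aligned with token id 2
top = decode_latent(latent, W)               # token 2 ranks first
```

Because the head stays frozen, the latent states are constrained to live where the pretrained vocabulary projection is meaningful, which is what makes the keyword interpretations auditable.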
Problem

Research questions and friction points this paper is trying to address.

Latent Reasoning
Chain-of-Thought
Interpretability
Semantic Alignment
Inference Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Reasoning
Span-level Alignment
Optimal Transport
Interpretability
Efficient Inference
Yunlong Chu
School of New Media and Communication, Tianjin University, Tianjin, China
Minglai Shao
Tianjin University
Graph Mining · Deep Learning · Machine Learning
Yuhang Liu
The University of Adelaide
Representation Learning · LLMs · Latent Variable Models · Responsible AI
Bing Hao
School of New Media and Communication, Tianjin University, Tianjin, China
Yumeng Lin
School of New Media and Communication, Tianjin University, Tianjin, China
Jialu Wang
Independent Contributor, CA, USA
Ruijie Wang
School of Computer Science and Engineering, Beihang University, Beijing, China