🤖 AI Summary
This work addresses the high computational cost and length sensitivity of large language models that rely on explicit chain-of-thought reasoning. It formulates latent reasoning as a geometric path-approximation problem in the model's pretrained token-embedding space. A lightweight transition head predicts iterative direction updates for a continuous latent state, so that a compressed continuous trajectory approximates the discrete reasoning process, with textual chain-of-thought traces serving as anchor points. Notably, shorter generations emerge without any explicit length objective. Evaluated with Qwen3 models on mathematical reasoning benchmarks, the method substantially reduces generation steps while maintaining or even improving accuracy, thereby exposing a tradeoff among latent computation budget, output length, and accuracy.
📝 Abstract
Large language models solve complex problems by generating lengthy chains of explicit reasoning tokens. Although effective, this makes reasoning expensive, length-sensitive, and constrained to discrete natural language. Latent reasoning offers a continuous alternative, but determining useful structures for intermediate latent states remains an open challenge. In this paper, we formulate latent reasoning as a geometric path-approximation problem within the model's pretrained token-embedding space. We introduce Geometric Latent Reasoning (GLR), which uses a lightweight transition head to predict iterative direction updates in embedding space. Using textual chain-of-thought traces as anchors, GLR learns to approximate discrete reasoning trajectories while permitting continuous deviations from exact token embeddings. Evaluations on mathematical reasoning benchmarks using Qwen3 models reveal an emergent phenomenon: geometric latent reasoning induces substantially shorter generations without an explicit length objective. By replacing early explicit reasoning with continuous latent steps, models often reach correct answers using far fewer total generation steps. These findings suggest that continuous trajectories act as compact intermediate reasoning states, exposing a new tradeoff between latent computation budget, output length, and accuracy.
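The abstract does not specify the architecture or training loss, so the following is only a toy sketch of the general idea: a latent state is moved through embedding space by repeated direction updates, and the resulting path is scored against anchors derived from chain-of-thought token embeddings. Everything here is an assumption made for illustration and not the authors' implementation: the single linear map with `tanh` standing in for the transition head, the fixed `step_size`, the chunk-averaged anchors, the squared-distance loss, and all names (`transition_step`, `rollout`, `anchor_loss`).

```python
import numpy as np

def transition_step(z, W, step_size=1.0):
    """One hypothetical latent update: move z along a predicted unit direction."""
    d = np.tanh(W @ z)              # stand-in "transition head" (assumed form)
    norm = np.linalg.norm(d)
    if norm > 0:
        d = d / norm                # keep only the direction of the prediction
    return z + step_size * d

def rollout(z0, W, num_steps, step_size=1.0):
    """Iterate the update; returns the trajectory with shape (num_steps + 1, dim)."""
    path = [z0]
    for _ in range(num_steps):
        path.append(transition_step(path[-1], W, step_size))
    return np.stack(path)

def anchor_loss(path, anchors):
    """Mean squared distance between each post-start latent state and its anchor."""
    assert path.shape[0] - 1 == anchors.shape[0]
    return float(np.mean(np.sum((path[1:] - anchors) ** 2, axis=1)))

rng = np.random.default_rng(0)
dim, vocab, cot_len, num_steps = 8, 50, 12, 4

E = rng.normal(size=(vocab, dim))                # toy token-embedding table
cot_ids = rng.integers(0, vocab, size=cot_len)   # toy chain-of-thought token ids

# Assumed compression scheme: average consecutive chunks of token embeddings,
# so 12 discrete tokens become 4 continuous anchor points.
anchors = E[cot_ids].reshape(num_steps, cot_len // num_steps, dim).mean(axis=1)

W = rng.normal(scale=0.1, size=(dim, dim))       # untrained toy head weights
z0 = E[cot_ids[0]]                               # start from the first token embedding
path = rollout(z0, W, num_steps, step_size=0.5)
loss = anchor_loss(path, anchors)
```

In this toy setup, 4 latent steps stand in for 12 explicit tokens, which mirrors the compression the abstract describes. Training would adjust `W` to lower `loss`; the latent states are never snapped back to rows of `E`, which corresponds to permitting continuous deviations from exact token embeddings.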