🤖 AI Summary
This work addresses a weakness of existing latent reasoning methods: their intermediate states cannot be inspected or verified, so it is hard to tell whether they still preserve the critical constraints of the original query. To address this, the authors propose ReLAT (Reconstruction-Guided Latent Reasoning At Test Time), which adds a self-supervised reconstruction objective at inference time. It builds a differentiable closed loop, question → latent thought → question, and uses the original query as the supervisory signal: the reconstruction loss is optimized through the latent thought, refining the latent state before the final answer is generated. Evaluated on Qwen-family models across mathematical reasoning, knowledge-based question answering, and code generation, ReLAT outperforms single-model inference, text-based collaboration, open-loop latent collaboration, and alternative test-time training objectives. With Qwen3-8B it reaches 73.3% accuracy on AIME 2024, 16.6 percentage points above the strongest open-loop latent baseline (56.7%).
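The closed loop can be written compactly as follows. The notation is assumed for illustration and does not come from the paper: q is the query representation, E_θ the component that produces the latent thought z, D a decoder mapping z back to the query, ℓ a reconstruction loss, η a step size, and G the answer generator. Taking gradient steps on θ is also an assumption; the summary does not say which parameters are adapted or how the loss is defined.

```latex
\begin{aligned}
z &= E_\theta(q) && \text{(question} \to \text{latent thought)}\\
\hat{q} &= D(z) && \text{(latent thought} \to \text{question)}\\
\mathcal{L}_{\mathrm{rec}}(\theta) &= \ell\big(q,\; D(E_\theta(q))\big)\\
\nabla_\theta \mathcal{L}_{\mathrm{rec}} &= \frac{\partial \ell}{\partial \hat{q}}\,\frac{\partial D}{\partial z}\,\frac{\partial E_\theta}{\partial \theta} && \text{(gradient flows through } z\text{)}\\
\theta &\leftarrow \theta - \eta\,\nabla_\theta \mathcal{L}_{\mathrm{rec}} && \text{(a few test-time steps)}\\
a &= G\big(E_\theta(q)\big) && \text{(answer from the refined latent)}
\end{aligned}
```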
📝 Abstract
Recent work moves intermediate reasoning from natural-language traces into latent or cache-level representations to reduce token overhead and avoid a discrete communication bottleneck. However, this shift also removes a key advantage of textual reasoning: intermediate states are no longer inspectable, making it difficult to determine whether a latent state still preserves the constraints of the original query. As a result, latent reasoning typically operates in an open loop, where a latent state is produced and consumed without an input-anchored fidelity check. We propose ReLAT (Reconstruction-Guided Latent Reasoning At Test Time), a self-supervised test-time training method that closes this loop using the query itself as the reference. Our key observation is that if a latent state faithfully represents a query, the query should be recoverable from it; if the query cannot be recovered, the latent state has lost task-relevant information. ReLAT operationalizes this principle by constructing a differentiable Question → Latent Thought → Question cycle and optimizing a query reconstruction loss through the latent thought before answer generation. This anchors opaque latent computation to the problem specification it is supposed to represent. Across mathematical reasoning, knowledge QA, and code generation benchmarks on the Qwen family, ReLAT consistently improves over single-model inference, text-based collaboration, open-loop latent collaboration, and alternative test-time training objectives. On Qwen3-8B, ReLAT raises AIME 2024 accuracy from 56.7% (the strongest open-loop latent baseline) to 73.3%, a gain of 16.6 percentage points.
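The toy below is a rough sketch of the reconstruction-guided idea, not the authors' implementation. It assumes that the query is a small embedding vector, that the "latent thought" is a linear projection z = E q, that the reconstruction is q_hat = D z with D frozen, that the loss is squared error, and that only E is updated by plain gradient descent. The helper `refine_latent` and all numbers are hypothetical.

```python
# Toy sketch of a reconstruction-guided test-time update (NOT the authors' code).
# Assumptions: query = small embedding vector, latent thought z = E q (linear),
# reconstruction q_hat = D z with D frozen, squared-error loss, and only E is
# updated by plain gradient descent.


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def reconstruction_loss(q, q_hat):
    """Squared-error distance between the query and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(q, q_hat))


def refine_latent(q, enc, dec, lr=0.05, steps=200):
    """Hypothetical helper: close the loop q -> z -> q_hat and descend on the loss.

    Updates `enc` in place; the gradient reaches it only through the latent z.
    Returns the refined latent and the loss recorded before each step plus at the end.
    """
    losses = []
    for _ in range(steps):
        z = matvec(enc, q)        # question -> latent thought
        q_hat = matvec(dec, z)    # latent thought -> question
        losses.append(reconstruction_loss(q, q_hat))
        # Backpropagate: dL/dq_hat = 2 (q_hat - q), then dL/dz = D^T dL/dq_hat.
        g_qhat = [2 * (a - b) for a, b in zip(q_hat, q)]
        g_z = [sum(dec[i][j] * g_qhat[i] for i in range(len(dec)))
               for j in range(len(z))]
        # Chain rule through z = E q: dL/dE[j][k] = dL/dz[j] * q[k].
        for j in range(len(enc)):
            for k in range(len(q)):
                enc[j][k] -= lr * g_z[j] * q[k]
    z = matvec(enc, q)
    losses.append(reconstruction_loss(q, matvec(dec, z)))
    return z, losses


# Usage: a 3-dim query squeezed through a 2-dim latent (illustrative numbers only).
q = [1.0, -0.5, 0.25]
enc = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]     # 2x3 encoder, adapted at test time
dec = [[1.0, 0.2], [0.0, 1.0], [0.5, -0.3]]  # 3x2 decoder, frozen
z, losses = refine_latent(q, enc, dec)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")  # loss: 1.065 -> 0.144
```

The loss falls but plateaus above zero. A 2-dim latent cannot hold all of a 3-dim query, and that leftover reconstruction error is the kind of signal the key observation relies on: part of the query is no longer recoverable. The sketch updates encoder weights to mirror "test-time training". Optimizing z directly would be an equally plausible reading of the abstract.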