OneLatent: Single-Token Compression for Visual Latent Reasoning

📅 2026-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes an efficient compressed reasoning paradigm that addresses the substantial increase in output length and computational overhead associated with Chain-of-Thought (CoT) prompting. By compressing the intermediate steps of complex visual reasoning into a single latent token, the method leverages supervision signals derived from text-to-image rendering and from alignment with DeepSeek-OCR hidden states. Evaluated on ProntoQA and ProsQA, the approach achieves accuracies of 99.80% and 97.80%, respectively, while reducing average output length by 11× (up to 87.4×) with only a 2.21% accuracy drop. It further yields a 6.8× improvement in output token contribution (OTC), enabling highly efficient, auditable, and low-redundancy inference.

📝 Abstract
Chain-of-thought (CoT) prompting improves reasoning but often increases inference cost by one to two orders of magnitude. To address these challenges, we present **OneLatent**, a framework that compresses intermediate reasoning into a single latent token via supervision from rendered CoT images and DeepSeek-OCR hidden states. By rendering textual steps into images, we obtain a deterministic supervision signal that can be inspected and audited without requiring the model to output verbose textual rationales. Across benchmarks, OneLatent reduces average output length by 11× with only a 2.21% average accuracy drop relative to textual CoT, while improving output token contribution (OTC) by 6.8×. On long-chain logical reasoning, OneLatent reaches 99.80% on ProntoQA and 97.80% on ProsQA with one latent token, with compression up to 87.4×, supporting compression-constrained generalization.
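The core idea in the abstract — distilling a multi-step teacher trace into a single latent token by aligning it with hidden states — can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: the teacher states stand in for DeepSeek-OCR features of a rendered CoT image, mean-pooling is an assumed aggregation, and the MSE alignment objective and gradient-descent loop are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden size (illustrative)

# Stand-in for teacher hidden states: 12 reasoning steps of a
# rendered-CoT trace (values random; names hypothetical).
teacher_states = rng.normal(size=(12, d))

# Compress the multi-step trace into one supervision target
# (mean-pooling is an assumed aggregation).
target = teacher_states.mean(axis=0)

# Student side: a single learnable latent-token embedding,
# trained with an MSE alignment loss L = ||z - target||^2.
latent_token = np.zeros(d)
lr = 0.1
for _ in range(100):
    grad = 2.0 * (latent_token - target)  # dL/dz
    latent_token -= lr * grad

alignment_error = float(np.linalg.norm(latent_token - target))
print(f"alignment error after training: {alignment_error:.6f}")
```

In the full method, the latent token would additionally condition the answer decoder, so the alignment loss is combined with a task loss; the sketch isolates only the compression-by-alignment step.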
Problem

Research questions and friction points this paper is trying to address.

Chain-of-thought
reasoning compression
inference cost
latent token
multimodal reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

OneLatent
latent token compression
chain-of-thought reasoning
visual supervision
output token efficiency
Bo Lv (Tsinghua University)
Yasheng Sun (Institute of Science Tokyo)
Junjie Wang (Tsinghua University)
Haoxiang Shi (Waseda University)
Natural Language Processing
Dense Retrieval