R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning

📅 2025-09-26
🤖 AI Summary
Chain-of-thought (CoT) reasoning improves performance on complex tasks but suffers from high latency, substantial memory overhead, and error propagation along lengthy explicit reasoning chains. To address this, we propose R-Capsule, the first framework to apply the information bottleneck principle to reasoning compression: it compresses explicit CoT traces into a small set of latent "reasoning capsules" via a low-capacity bottleneck network, preserving the high-level planning structure. A dual-objective training scheme, which jointly optimizes main-task accuracy and a plan-reconstruction loss, explicitly enforces interpretability and structural fidelity in the latent space, thereby mitigating shortcut learning. Experiments across multiple complex reasoning benchmarks show that R-Capsule matches or exceeds CoT accuracy while reducing the visible token count by 62% on average, substantially improving inference speed and memory efficiency. The method thus achieves a favorable trade-off among accuracy, efficiency, and transparency.

📝 Abstract
Chain-of-Thought (CoT) prompting helps Large Language Models (LLMs) tackle complex reasoning by eliciting explicit step-by-step rationales. However, CoT's verbosity increases latency and memory usage and may propagate early errors across long chains. We propose the Reasoning Capsule (R-Capsule), a framework that aims to combine the efficiency of latent reasoning with the transparency of explicit CoT. The core idea is to compress the high-level plan into a small set of learned latent tokens (a Reasoning Capsule) while keeping execution steps lightweight or explicit. This hybrid approach is inspired by the Information Bottleneck (IB) principle, where we encourage the capsule to be approximately minimal yet sufficient for the task. Minimality is encouraged via a low-capacity bottleneck, which helps improve efficiency. Sufficiency is encouraged via a dual objective: a primary task loss for answer accuracy and an auxiliary plan-reconstruction loss that encourages the capsule to faithfully represent the original textual plan. The reconstruction objective helps ground the latent space, thereby improving interpretability and reducing the use of uninformative shortcuts. Our framework strikes a balance between efficiency, accuracy, and interpretability, thereby reducing the visible token footprint of reasoning while maintaining or improving accuracy on complex benchmarks. Our codes are available at: https://anonymous.4open.science/r/Reasoning-Capsule-7BE0
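The dual objective described in the abstract (a primary task loss plus an auxiliary plan-reconstruction loss over a low-capacity bottleneck) can be sketched in a few lines. This is a minimal illustrative sketch in PyTorch, not the paper's implementation; all module names, dimensions, and the loss weighting below are hypothetical.

```python
# Sketch of the dual-objective idea: a low-capacity bottleneck compresses a
# plan representation into a few latent "capsule" tokens, trained with a
# task loss plus a plan-reconstruction loss. All names here are hypothetical.
import torch
import torch.nn as nn

class ReasoningCapsuleSketch(nn.Module):
    """Toy bottleneck: compress a pooled plan embedding into k latent capsules."""
    def __init__(self, d_model=64, n_capsules=4, vocab_size=1000):
        super().__init__()
        self.n_capsules, self.d_model = n_capsules, d_model
        # Low-capacity bottleneck: project the plan into n_capsules latent tokens.
        self.encode = nn.Linear(d_model, n_capsules * d_model)
        # Auxiliary head that tries to reconstruct the textual plan tokens,
        # grounding the latent space (the "sufficiency" side of the IB trade-off).
        self.reconstruct = nn.Linear(d_model, vocab_size)

    def forward(self, plan_embedding):
        capsules = self.encode(plan_embedding).view(-1, self.n_capsules, self.d_model)
        return capsules, self.reconstruct(capsules)

model = ReasoningCapsuleSketch()
plan = torch.randn(2, 64)                     # pooled plan representation (batch of 2)
plan_tokens = torch.randint(0, 1000, (2, 4))  # target plan token ids to reconstruct
capsules, recon_logits = model(plan)

task_loss = capsules.pow(2).mean()            # stand-in for the answer-accuracy loss
recon_loss = nn.CrossEntropyLoss()(recon_logits.flatten(0, 1), plan_tokens.flatten())
loss = task_loss + 0.5 * recon_loss           # dual objective: task + plan reconstruction
loss.backward()
```

In this sketch, minimality comes from the small number of capsules (the bottleneck's capacity), while the reconstruction term pushes the capsules to remain faithful to the original textual plan.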
Problem

Research questions and friction points this paper is trying to address.

Compressing verbose reasoning chains to reduce latency
Maintaining accuracy while minimizing token footprint
Balancing efficiency with interpretability in LLM reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compresses high-level plans into learned latent tokens
Uses Information Bottleneck principle for minimal sufficient representation
Combines efficiency of latent reasoning with CoT transparency
Hongyu Shan
Tianjin University, Tianjin, China
Mingyang Song
Tencent Hunyuan Team, Shenzhen, China
Chang Dai
Peking University, Beijing, China
Di Liang
Han Chen
National Engineering Research Center of Educational Big Data and the Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan, China