🤖 AI Summary
To address the low inference efficiency of chain-of-thought (CoT) reasoning in large language models (LLMs) and the inability of existing latent-space methods to distinguish critical from auxiliary reasoning steps, this paper proposes an adaptive dual-path shortcut inference paradigm. We introduce two novel mechanisms: (1) dynamic-depth shortcut (DS), which applies deep reasoning only to critical tokens while enabling early exit for non-critical ones; and (2) step shortcut (SS), which enables cross-layer reuse of hidden states across decoding steps to support latent-space skip-step reasoning. Our method builds upon a lightweight Transformer adapter and integrates a two-stage self-distillation pipeline—natural-language CoT → continuous latent-space reasoning → adaptive shortcut path—along with hidden-state reuse. On GSM8K, it achieves accuracy comparable to standard CoT fine-tuning while accelerating inference by 20.3× and reducing average token generation by 92.31%.
📝 Abstract
Chain-of-thought (CoT) reasoning enables large language models (LLMs) to move beyond fast System-1 responses and engage in deliberative System-2 reasoning. However, this comes at the cost of significant inefficiency due to verbose intermediate output. Recent latent-space reasoning methods improve efficiency by operating on hidden states without decoding into language, yet they treat all steps uniformly, failing to distinguish critical deductions from auxiliary steps and resulting in suboptimal use of computational resources.

In this paper, we propose System-1.5 Reasoning, an adaptive reasoning framework that dynamically allocates computation across reasoning steps through shortcut paths in latent space. Specifically, System-1.5 Reasoning introduces two types of dynamic shortcuts. The model depth shortcut (DS) reasons adaptively along the vertical depth: non-critical tokens exit early through lightweight adapter branches, while critical tokens continue through deeper Transformer layers. The step shortcut (SS) reuses hidden states across decoding steps to skip trivial steps and reason horizontally in latent space.

Training System-1.5 Reasoning involves a two-stage self-distillation process: first distilling natural-language CoT into continuous latent-space thought, and then distilling full-path System-2 latent reasoning into adaptive shortcut paths (System-1.5 Reasoning).

Experiments on reasoning tasks demonstrate the effectiveness of our method. For example, on GSM8K, System-1.5 Reasoning achieves reasoning performance comparable to traditional CoT fine-tuning while accelerating inference by over 20x and reducing token generation by 92.31% on average.
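The two shortcut mechanisms described above can be illustrated with a minimal toy sketch. Everything here is an assumption for illustration: the single weight matrix per "layer", the sigmoid router, the `EXIT_THRESHOLD` and `sim_threshold` values, and the cosine-similarity test for "trivial" steps are stand-ins, not the paper's actual architecture or routing criterion. The sketch only shows the control flow: per-token early exit through an adapter branch (DS) and cross-step reuse of a cached hidden state (SS).

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_LAYERS, EXIT_THRESHOLD = 8, 4, 0.5  # hidden size, depth, router cutoff (all illustrative)

# Toy stand-ins: one weight matrix per "Transformer layer", a lightweight
# adapter branch for early exit, and a router that scores token criticality.
layers = [rng.normal(scale=0.1, size=(D, D)) for _ in range(N_LAYERS)]
adapter = rng.normal(scale=0.1, size=(D, D))
router_w = rng.normal(size=D)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def depth_shortcut_forward(h):
    """DS: run deeper layers only while the router deems the token critical."""
    for depth, W in enumerate(layers, start=1):
        h = np.tanh(h @ W) + h                     # residual toy layer
        score = 1 / (1 + np.exp(-(h @ router_w)))  # sigmoid criticality score
        if score < EXIT_THRESHOLD:                 # non-critical: exit via adapter
            return h @ adapter, depth
    return h @ adapter, N_LAYERS                   # critical token used full depth

def step_shortcut_decode(inputs, sim_threshold=0.99):
    """SS: reuse the cached hidden state when a step is near-identical to the last."""
    prev_x = prev_h = None
    outs, depths = [], []
    for x in inputs:
        if prev_x is not None and cosine(x, prev_x) > sim_threshold:
            outs.append(prev_h)                    # skip-step: reuse latent state, 0 layers
            depths.append(0)
        else:
            prev_h, d = depth_shortcut_forward(x)
            outs.append(prev_h)
            depths.append(d)
        prev_x = x
    return outs, depths
```

Under this sketch, the per-step `depths` make the adaptive compute allocation visible: 0 for steps skipped horizontally, a small number for tokens that exited early, and `N_LAYERS` for tokens that needed the full vertical path.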