Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers

πŸ“… 2026-02-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the limited ability of large language models to perform multi-step reasoning, which the authors attribute to the fixed computational depth of the Transformer. To overcome this, they propose the Turbo Connection mechanism, which adds dense backward residual connections from the higher-layer hidden states of each token to the lower layers of the following token. This design substantially extends the length of the implicit reasoning path beyond what conventional sparse state propagation allows. The method can be integrated into pre-trained models through fine-tuning, without retraining from scratch. Empirical evaluations on GSM8K, Parity, and multi-step arithmetic benchmarks show consistent accuracy improvements ranging from 0.9% to over 10%. Notably, Qwen-3-1.7B reaches 100% accuracy on the Parity task, up from 53.78%, with a negligible increase in generation latency.

πŸ“ Abstract
Complex problems, whether in math, logic, or planning, are solved by humans through a sequence of steps where the result of one step informs the next. In this work, we adopt the perspective that the reasoning power of Transformers is fundamentally limited by a fixed maximum number of steps along any latent path of computation. To address this, we introduce Turbo Connection (TurboConn), a novel architecture that overcomes the fixed-depth constraint by routing multiple residual connections from the higher-layer hidden states of each token $t$ to the lower layers of token $t+1$. Fine-tuning pre-trained LLMs with our method not only yields accuracy gains of 0.9% to over 10% on benchmarks like GSM8K, Parity, and multi-step arithmetic, but also demonstrates that the density of these backward connections is critical; our dense interaction significantly outperforms "sparse" alternatives that only pass a single hidden state or vector. Notably, TurboConn can be integrated into pre-trained LLMs to overcome task-specific plateaus: while a fine-tuned Qwen-3-1.7B achieves only 53.78% on Parity, adding our architectural modification enables the model to reach 100% accuracy, all without retraining the full model from scratch or resorting to sophisticated curriculum learning. Our results provide strong empirical evidence that the depth of the computational path is a key factor in reasoning ability, and they offer a new mechanism to enhance LLMs without significantly affecting generation latency.
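
To make the mechanism above concrete, the sketch below illustrates the dense backward-connection pattern in PyTorch: while processing token $t+1$, hidden states cached from every layer of token $t$ are projected and added to the inputs of all lower layers. The class names (`TurboConnStack`, `TurboBlock`), the per-pair linear projections (`back_proj`), and the simplified block (an MLP standing in for a full attention + feed-forward layer) are illustrative assumptions, not the paper's released implementation.

```python
# Toy sketch of dense backward residual connections: the hidden state that
# every layer j produced for token t is projected and added to the input of
# every lower layer i < j when processing token t+1.
import torch
import torch.nn as nn


class TurboBlock(nn.Module):
    """Stand-in for one Transformer layer (attention omitted for brevity)."""

    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        return x + self.ff(self.norm(x))  # ordinary within-token residual path


class TurboConnStack(nn.Module):
    def __init__(self, num_layers=4, d_model=64):
        super().__init__()
        self.num_layers = num_layers
        self.layers = nn.ModuleList([TurboBlock(d_model) for _ in range(num_layers)])
        # One projection per (higher source layer j, lower target layer i < j):
        # these carry token t's layer-j state down to layer i of token t+1.
        self.back_proj = nn.ModuleDict(
            {f"{j}->{i}": nn.Linear(d_model, d_model)
             for j in range(num_layers) for i in range(j)}
        )

    def forward(self, x_next, prev_states=None):
        """x_next: embedding of token t+1, shape (batch, d_model).
        prev_states: list of per-layer hidden states from token t, or None at t=0."""
        h, new_states = x_next, []
        for i, layer in enumerate(self.layers):
            if prev_states is not None:
                # Dense backward residuals from every higher layer of token t.
                for j in range(i + 1, self.num_layers):
                    h = h + self.back_proj[f"{j}->{i}"](prev_states[j])
            h = layer(h)
            new_states.append(h)
        return h, new_states


# Usage: step through two tokens; step 2 sees step 1's higher-layer states.
model = TurboConnStack()
x1, x2 = torch.randn(2, 64), torch.randn(2, 64)
_, states_t = model(x1, prev_states=None)
out, _ = model(x2, prev_states=states_t)
print(out.shape)  # torch.Size([2, 64])
```

The density matters here: the loop over all `j > i` is what distinguishes this pattern from a "sparse" variant that would pass only a single top-layer vector back to the next token.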
Problem

Research questions and friction points this paper is trying to address.

reasoning
computational depth
multi-step inference
Transformer limitations
latent computation path
Innovation

Methods, ideas, or system contributions that make the work stand out.

Turbo Connection
reasoning depth
residual connections
multi-step reasoning
LLM fine-tuning