🤖 AI Summary
This work proposes an efficient reasoning framework based on state transitions to address the high computational and memory costs of chain-of-thought (CoT) reasoning in large language models, which hinder practical deployment. Unlike CoT-compression methods, which conflict with test-time scaling, the approach models reasoning as a state-evolution process and incorporates a linear attention mechanism to reduce attention complexity, enabling each step to leverage prior reasoning states without explicitly attending to all historical tokens. Additionally, a state-based reasoning strategy is introduced to mitigate overthinking caused by noisy intermediate steps. Experimental results across multiple datasets and model scales demonstrate that the method significantly improves both reasoning efficiency and performance.
📝 Abstract
While Long Chain-of-Thought (CoT) reasoning significantly improves the performance of Large Language Models (LLMs) on complex reasoning tasks, the substantial computational and memory costs of generating long CoT sequences limit their efficiency and practicality. Existing studies usually enhance the reasoning efficiency of LLMs by compressing CoT sequences. However, this approach conflicts with test-time scaling, limiting the reasoning capacity of LLMs. In this paper, we propose an efficient reasoning framework that models the reasoning process of LLMs as a state-transition process. Specifically, we first apply a linear attention mechanism to estimate the LLM's reasoning state, which records the historical reasoning information from previous reasoning steps. Then, based on the query prompt and the reasoning state, the LLM can efficiently perform the current reasoning step and update the state. With linear attention, each token in the current reasoning step can directly retrieve relevant historical reasoning information from the reasoning state, without explicitly attending to tokens in previous reasoning steps. In this way, the computational complexity of attention is reduced from quadratic to linear, significantly improving the reasoning efficiency of LLMs. In addition, we propose a state-based reasoning strategy to mitigate the over-thinking issue caused by noisy reasoning steps. Extensive experiments across multiple datasets and model sizes demonstrate that our framework not only improves the reasoning efficiency of LLMs but also enhances their reasoning performance.
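To illustrate the core idea behind the abstract, the following is a minimal sketch of generic linear attention as a recurrent state update, not the paper's actual implementation: past key-value information is accumulated into a fixed-size state matrix, and each new token retrieves from that state in constant time per step instead of attending over all previous tokens. The feature map `phi` (a positive ReLU-based map here) and all variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative positive feature map; the actual choice is model-specific.
def phi(x):
    return np.maximum(x, 0.0) + 1e-6

def linear_attention_step(state, norm, q, k, v):
    """One recurrent step of linear attention.

    state: (d, d) running sum of outer(phi(k_i), v_i) over past tokens
    norm:  (d,)   running sum of phi(k_i), used for normalization
    q, k, v: (d,) query, key, value vectors for the current token

    Returns the attention output plus the updated state and norm.
    The cost per step is O(d^2), independent of sequence length.
    """
    state = state + np.outer(phi(k), v)          # fold the new token into the state
    norm = norm + phi(k)
    out = (phi(q) @ state) / (phi(q) @ norm + 1e-9)  # retrieve from state, no history scan
    return out, state, norm

# Processing a sequence: the state replaces the growing KV cache.
d, T = 8, 16
rng = np.random.default_rng(0)
qs, ks, vs = (rng.standard_normal((T, d)) for _ in range(3))
state, norm = np.zeros((d, d)), np.zeros(d)
for t in range(T):
    out, state, norm = linear_attention_step(state, norm, qs[t], ks[t], vs[t])
```

The recurrence produces, for each position, exactly the softmax-free attention output `phi(q_t)^T (sum_i phi(k_i) v_i^T) / phi(q_t)^T (sum_i phi(k_i))`, which is what lets a reasoning step consult the accumulated state rather than every historical token.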