ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large reasoning models (LRMs) often produce excessively long chains of thought for complex tasks, severely degrading inference efficiency. To address this, we propose a **dynamic conciseness-guidance mechanism operating during inference**: at each token-generation step, adaptive textual prompts—either manually crafted or trained on concise reasoning data—are injected, with prompt strength dynamically modulated based on real-time complexity estimation to constrain reasoning length. Unlike conventional “optimize-then-infer” paradigms, our approach enables online, generation-time conciseness intervention—a first in LRM inference. Evaluated on GSM8K using Qwen-3 4B, our method reduces average reasoning length by 65% while preserving near-original accuracy, yielding substantial gains in both efficiency and practical utility.

📝 Abstract
Recent advancements in large reasoning models (LRMs) such as DeepSeek-R1 and the OpenAI o1 series have achieved notable performance gains on complex reasoning tasks by scaling up the generation length of Chain-of-Thought (CoT). However, an emerging issue is their tendency to produce excessively verbose reasoning processes, leading to inefficiency. Existing work on improving efficiency mainly follows before-reasoning paradigms such as prompting-then-reasoning or fine-tuning-then-reasoning, but ignores the promising direction of directly encouraging the model to speak concisely by intervening during the generation of the reasoning itself. To fill this gap, we propose a framework dubbed ConciseHint, which continuously encourages the reasoning model to speak concisely by injecting a textual hint (manually designed or trained on concise data) during token generation of the reasoning process. Moreover, ConciseHint adapts to the complexity of the query by adjusting the hint intensity, which ensures it does not undermine model performance. Experiments on state-of-the-art LRMs, including the DeepSeek-R1 and Qwen-3 series, demonstrate that our method effectively produces concise reasoning processes while maintaining performance. For instance, we achieve a 65% reduction in reasoning length on the GSM8K benchmark with Qwen-3 4B with nearly no accuracy loss.
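The core idea of injecting a hint into the context mid-generation can be sketched as a toy decoding loop. This is a minimal illustration, not the paper's implementation: the hint text, the injection schedule, and `next_token` (a stand-in for a real LRM decoder step) are all hypothetical.

```python
HINT = "Be concise."  # illustrative hint text; the paper's actual hint may differ


def next_token(context: str) -> str:
    """Stand-in for one decoding step of a real reasoning model."""
    n = len(context.split())
    # Pretend the model closes its reasoning once the context grows long enough.
    return "</think>" if n >= 12 else f"tok{n}"


def generate_with_hints(prompt: str, interval: int = 4, max_tokens: int = 32):
    """Decode token by token, appending HINT to the *context* (not the
    visible output) every `interval` generated tokens."""
    context, output = prompt, []
    for step in range(max_tokens):
        if step > 0 and step % interval == 0:
            context += f" [{HINT}]"  # injected hint steers later tokens
        tok = next_token(context)
        output.append(tok)
        context += " " + tok
        if tok == "</think>":  # stop at the end-of-reasoning marker
            break
    return output
```

The key point the sketch captures is that the hint shapes subsequent token probabilities by living in the context window, while the emitted reasoning tokens themselves remain hint-free.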
Problem

Research questions and friction points this paper is trying to address.

Reducing verbose reasoning in large models
Intervening during generation for conciseness
Adapting hint intensity to query complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous concise hints during token generation
Adaptive hint intensity based on query complexity
Textual hints from manual design or concise data
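The adaptive-intensity idea can be illustrated with a small heuristic: harder queries receive hints less frequently, so the conciseness pressure does not hurt accuracy on complex problems. Everything here is an assumption for illustration; in particular, using prompt length as the complexity proxy and the `hint_interval` name are hypothetical, not the paper's estimator.

```python
def hint_interval(query: str, base: int = 4, max_interval: int = 64) -> int:
    """Return how many tokens to generate between hint injections.

    Longer (presumably harder) queries get a larger interval, i.e. a
    lower hint intensity. Word count is a crude stand-in for the real
    complexity estimate.
    """
    complexity = len(query.split())
    return min(base * max(1, complexity // 8), max_interval)
```

For example, a short arithmetic question would keep the base interval (frequent hints), while a long word problem would be hinted far less often, capped at `max_interval`.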