ThinkBrake: Mitigating Overthinking in Tool Reasoning

📅 2025-10-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Small reasoning models (SRMs) commonly suffer from "overthinking" in tool calling: even after reaching a correct tool-argument configuration, they continue reasoning and overwrite it with an erroneous final call. This paper proposes ThinkBrake, a training-free decoding strategy that introduces dynamic early termination into tool reasoning. ThinkBrake truncates redundant reasoning by monitoring, at sentence boundaries, the log-probability margin between the `</think>` delimiter and the current top token, and terminating the reasoning chain when that margin becomes small. Oracle-based replay analysis is used to localize overthinking instances precisely. Evaluation on the BFCL benchmark shows that ThinkBrake maintains or improves accuracy while reducing generated tokens by up to 25%; the oracle analysis further reveals up to 8.4 points of latent accuracy headroom, enhancing both inference efficiency and controllability.

📝 Abstract
Small reasoning models (SRMs) often overthink during tool use: they reach a correct tool-argument configuration, then continue reasoning and overwrite it with an incorrect final call. We diagnose overthinking via oracle rollouts that inject </think> at sentence boundaries. On the Berkeley Function Calling Leaderboard (BFCL), this oracle termination lifts average accuracy from 85.8% to 94.2% while reducing tokens by 80-94%, revealing substantial recoverable headroom and redundant reasoning. While prior work on concise reasoning has largely targeted mathematics, tool reasoning remains underexplored. We adapt various early-termination baselines to tool use and introduce ThinkBrake, a training-free decoding heuristic. ThinkBrake monitors the log-probability margin between </think> and the current top token at sentence boundaries and triggers termination when this margin becomes small. Across BFCL's single-turn, non-live, and live splits, ThinkBrake preserves or improves accuracy while reducing tokens by up to 25%, outperforming various baselines.
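The oracle diagnosis described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the reasoning trace is split at sentence boundaries, and each truncation is replayed with the </think> delimiter injected so the resulting tool call can be scored.

```python
import re

def oracle_truncations(reasoning: str) -> list:
    """Candidate early-termination points for an oracle rollout.

    Splits the reasoning trace at sentence boundaries and returns every
    prefix with the </think> delimiter injected; each candidate would be
    replayed through the model to see whether the tool call is already
    correct at that point (the oracle picks the best one).
    """
    sentences = re.split(r"(?<=[.!?])\s+", reasoning.strip())
    return [
        " ".join(sentences[:i]) + "\n</think>"
        for i in range(1, len(sentences) + 1)
    ]
```

In the paper's setup, replaying these candidates on BFCL is what exposes the 85.8% → 94.2% accuracy headroom.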
Problem

Research questions and friction points this paper is trying to address.

Mitigating overthinking in small reasoning models
Reducing redundant reasoning during tool use
Improving accuracy while decreasing token usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Oracle rollouts diagnose overthinking via sentence termination
ThinkBrake monitors log-probability margin for early termination
Training-free decoding heuristic reduces tokens while preserving accuracy
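The margin heuristic behind ThinkBrake can be illustrated with a minimal sketch. The function name and threshold value are assumptions for illustration, not taken from the paper:

```python
import math

def thinkbrake_should_stop(logprobs: dict, margin_threshold: float = 1.0) -> bool:
    """Decide whether to terminate reasoning at the current sentence boundary.

    logprobs: mapping from candidate next token to its log-probability.
    Returns True when the log-probability margin between the current top
    token and the </think> delimiter is small, i.e. the model is nearly
    ready to stop reasoning, so decoding injects </think> early.
    """
    close_lp = logprobs.get("</think>", -math.inf)
    top_lp = max(logprobs.values())
    # Small margin -> </think> is nearly the top choice -> brake.
    return (top_lp - close_lp) <= margin_threshold
```

Because the check only reads the logits already produced at each sentence boundary, it adds no training and negligible decoding overhead.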
Minjae Oh, Seoul National University
Sangjun Song, Korea University
Seungkyu Lee, Professor, Kyung Hee University (computer vision)
Sungmin Jo, Seoul National University
Yohan Jo, Seoul National University
Natural Language Processing · Agents · Computational Psychology · Reasoning