ThinkBrake: Mitigating Overthinking in Tool Reasoning

📅 2025-10-01

🤖 AI Summary

Small reasoning models (SRMs) often overthink during tool calling: after reaching a correct tool-argument configuration, they keep reasoning and overwrite it with an incorrect final call. The paper diagnoses this with oracle rollouts that inject `</think>` at sentence boundaries; on the BFCL benchmark, oracle termination raises average accuracy from 85.8% to 94.2% (8.4 percentage points) while cutting tokens by 80-94%. It then proposes ThinkBrake, a training-free decoding heuristic that, at each sentence boundary, monitors the log-probability margin between `</think>` and the current top token and terminates reasoning when that margin becomes small. Across BFCL's single-turn non-live and live splits, ThinkBrake preserves or improves accuracy while reducing generated tokens by up to 25%, outperforming adapted early-termination baselines.

📝 Abstract

Small reasoning models (SRMs) often overthink during tool use: they reach a correct tool-argument configuration, then continue reasoning and overwrite it with an incorrect final call. We diagnose overthinking via oracle rollouts that inject </think> at sentence boundaries. On the Berkeley Function Calling Leaderboard (BFCL), this oracle termination lifts average accuracy from 85.8% to 94.2% while reducing tokens by 80-94%, revealing substantial recoverable headroom and potentially redundant reasoning. While prior work on concise reasoning has largely targeted mathematics, tool reasoning remains underexplored. We adapt various early-termination baselines to tool use and introduce ThinkBrake, a training-free decoding heuristic. ThinkBrake monitors the log-probability margin between </think> and the current top token at sentence boundaries and triggers termination when this margin becomes small. Across BFCL's single-turn non-live and live splits, ThinkBrake preserves or improves accuracy while reducing tokens by up to 25%, outperforming various baselines.
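
The termination rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the punctuation-based boundary test, and the threshold value of 1.0 are all assumptions made for illustration. The only parts taken from the abstract are that the check runs at sentence boundaries and compares the log-probability of `</think>` with that of the current top token.

```python
import math

END_THINK = "</think>"
# Assumed boundary rule: a sentence ends when the last token ends in one of these.
SENTENCE_END = (".", "!", "?", "\n")


def at_sentence_boundary(last_token: str) -> bool:
    """Heuristic (an assumption): the last emitted token closes a sentence."""
    return last_token.rstrip(" ").endswith(SENTENCE_END)


def margin_to_end_think(next_token_logprobs: dict[str, float]) -> float:
    """Log-probability of the top candidate minus that of </think>.

    The margin is 0 when </think> is itself the top token and grows as
    </think> becomes less likely relative to the top token.
    """
    top = max(next_token_logprobs.values())
    end = next_token_logprobs.get(END_THINK, -math.inf)
    return top - end


def should_brake(last_token: str,
                 next_token_logprobs: dict[str, float],
                 threshold: float = 1.0) -> bool:
    """Stop reasoning if we are at a sentence boundary and the margin is small.

    `threshold` is a hypothetical hyperparameter, not a value from the paper.
    """
    if not at_sentence_boundary(last_token):
        return False
    return margin_to_end_think(next_token_logprobs) <= threshold
```

In a decoding loop, a caller would evaluate `should_brake` after each generated token and, when it returns `True`, emit `</think>` in place of the sampled token so the model proceeds directly to its final tool call.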

Problem

Research questions and friction points this paper is trying to address.

Mitigating overthinking in small reasoning models

Reducing redundant reasoning during tool use

Improving accuracy while decreasing token usage

Innovation

Methods, ideas, or system contributions that make the work stand out.

Oracle rollouts diagnose overthinking by injecting </think> at sentence boundaries

ThinkBrake monitors the log-probability margin between </think> and the current top token to trigger early termination

Training-free decoding heuristic reduces tokens while preserving accuracy
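
The oracle diagnosis listed above can be sketched as follows. This is a hedged illustration only: the regex boundary rule, the choice to return the earliest correct prefix, and the callables `answer_after` and `is_correct` are hypothetical stand-ins for a model call and a benchmark checker, not details confirmed by the paper.

```python
import re
from typing import Callable, Optional

END_THINK = "</think>"


def sentence_prefixes(reasoning: str) -> list[str]:
    """Return every prefix of `reasoning` that ends at a sentence boundary.

    Assumed boundary rule: '.', '!' or '?' followed by whitespace or end of text.
    """
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", reasoning)]
    return [reasoning[:e] for e in ends]


def oracle_stop(reasoning: str,
                answer_after: Callable[[str], str],
                is_correct: Callable[[str], bool]) -> Optional[str]:
    """Find the shortest reasoning prefix that already yields a correct call.

    For each sentence-boundary prefix, inject </think>, ask the (hypothetical)
    model wrapper `answer_after` for the final tool call, and check it with
    the (hypothetical) grader `is_correct`. Returns None if no prefix works.
    """
    for prefix in sentence_prefixes(reasoning):
        answer = answer_after(prefix + END_THINK)
        if is_correct(answer):
            return prefix
    return None
```

Comparing the length of the returned prefix with the full trace gives a per-example estimate of how much reasoning was redundant, and an example counts as overthinking when the full trace is wrong but some prefix is right.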
