🤖 AI Summary
Small reasoning models (SRMs) commonly “overthink” during tool calling: after reaching a correct tool-argument configuration, they keep reasoning and overwrite it with an incorrect final call. This paper proposes ThinkBrake, a training-free decoding strategy that adds dynamic early termination to tool reasoning. At each sentence boundary, ThinkBrake measures the log-probability margin between the `</think>` token and the current top-ranked token, and ends the reasoning phase when that margin becomes small. Overthinking is diagnosed with oracle rollouts that inject `</think>` at sentence boundaries. On the Berkeley Function Calling Leaderboard (BFCL), this oracle termination raises average accuracy from 85.8% to 94.2%, a recoverable headroom of 8.4 percentage points, while cutting tokens by 80-94%. Across BFCL's single-turn non-live and live splits, ThinkBrake maintains or improves accuracy while reducing generated tokens by up to 25%, outperforming the early-termination baselines adapted to tool use.
📝 Abstract
Small reasoning models (SRMs) often overthink during tool use: they reach a correct tool-argument configuration, then continue reasoning and overwrite it with an incorrect final call. We diagnose overthinking via oracle rollouts that inject `</think>` at sentence boundaries. On the Berkeley Function Calling Leaderboard (BFCL), this oracle termination lifts average accuracy from 85.8% to 94.2% while reducing tokens by 80-94%, revealing substantial recoverable headroom and potentially redundant reasoning. While prior work on concise reasoning has largely targeted mathematics, tool reasoning remains underexplored. We adapt various early-termination baselines to tool use and introduce ThinkBrake, a training-free decoding heuristic. ThinkBrake monitors the log-probability margin between `</think>` and the current top token at sentence boundaries and triggers termination when this margin becomes small. Across BFCL's single-turn non-live and live splits, ThinkBrake preserves or improves accuracy while reducing tokens by up to 25%, outperforming various baselines.
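
The termination rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `threshold` value of 0.25, and the use of raw next-token logits are assumptions made for illustration, and the abstract does not say how the threshold is chosen or how sentence boundaries are detected.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def should_brake(logits, end_think_id, at_sentence_boundary, threshold=0.25):
    """Return True if reasoning should stop at this decoding step.

    logits: next-token logits over the vocabulary.
    end_think_id: vocabulary index of the </think> token (hypothetical id).
    at_sentence_boundary: whether the text generated so far ends a sentence.
    threshold: hypothetical margin cutoff in nats; not taken from the paper.
    """
    # The check is only made at sentence boundaries.
    if not at_sentence_boundary:
        return False
    logp = log_softmax(logits)
    # Margin between the current top token and </think>; it is zero when
    # </think> is itself the top token and grows as </think> becomes unlikely.
    margin = max(logp) - logp[end_think_id]
    return margin <= threshold
```

In a decoding loop, a `True` result would mean emitting `</think>` in place of the sampled token so that the model moves on to its final tool call. A smaller threshold stops less often and stays closer to standard decoding.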