HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing tool-augmented large language model agents: they rely on step-wise atomic tool invocations, which creates an execution-granularity mismatch, exposes intermediate states in the reasoning trace, and uses context inefficiently. The authors propose HyperTool, an executable MCP-style tool interface in which the model submits a code block that calls existing tools, so that a multi-step deterministic subroutine collapses into a single model-visible call and low-level dataflow stays out of the trace. To train models to use the interface, the authors synthesize HyperTool-format trajectories and verify them in real MCP environments. On MCP-Universe, HyperTool raises average accuracy from 15.69% to 35.29% on Qwen3-32B and from 9.93% to 33.33% on Qwen3-8B, surpassing GPT-OSS and Kimi-k2.5 on average accuracy.
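To put the reported numbers in perspective, the short Python sketch below computes the absolute gain (in percentage points) and the relative improvement factor for each model. It uses only the four accuracy figures quoted in the summary; the dictionary layout and variable names are illustrative choices, not anything from the paper.

```python
# Average accuracy on MCP-Universe as quoted in the summary, in percent:
# (baseline, with HyperTool). The data layout here is an illustrative choice.
reported = {
    "Qwen3-32B": (15.69, 35.29),
    "Qwen3-8B": (9.93, 33.33),
}

gains = {}
for model, (baseline, with_hypertool) in reported.items():
    absolute = round(with_hypertool - baseline, 2)  # percentage points gained
    relative = round(with_hypertool / baseline, 2)  # multiplicative factor
    gains[model] = (absolute, relative)
    print(f"{model}: +{absolute:.2f} points, {relative:.2f}x the baseline")
```

By this arithmetic the smaller model gains more in both absolute and relative terms, which leaves the two models within about two points of each other once HyperTool is applied.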

📝 Abstract

Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch: locally deterministic tool workflows are unfolded into repeated model-visible decisions, consuming context and forcing the model to manage low-level dataflow in the trace. We introduce HyperTool, a unified executable MCP-style tool interface that changes the model-visible unit of tool execution. A model invokes HyperTool with a code block that can call existing tools through their original schemas, manipulate returned values, and pass intermediate results locally, folding deterministic tool subroutines into a single outer call. To train models to use this interface, we synthesize HyperTool-format trajectories from cross-tool compositional tasks and verify them in real MCP environments. On MCP-Universe, HyperTool improves average accuracy from 15.69% to 35.29% on Qwen3-32B and from 9.93% to 33.33% on Qwen3-8B, and surpasses GPT-OSS and Kimi-k2.5 on average accuracy, showing that HyperTool can substantially improve multi-step tool use.
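The contrast between step-wise calls and a folded code block can be made concrete with a toy sketch. The Python below is not the authors' implementation: the tools `geocode` and `forecast`, the helper `call_tool`, the convention of leaving the answer in a variable named `result`, and the trace format are all hypothetical stand-ins, and the bare `exec` is a placeholder for whatever sandboxed executor a real system would use.

```python
from typing import Any, Callable, Dict, List


# Hypothetical toy tools standing in for real MCP tools (not from the paper).
def geocode(city: str) -> Dict[str, float]:
    table = {"Paris": {"lat": 48.86, "lon": 2.35}}
    return table[city]


def forecast(lat: float, lon: float) -> Dict[str, Any]:
    return {"lat": lat, "lon": lon, "temp_c": 21}


TOOLS: Dict[str, Callable[..., Any]] = {"geocode": geocode, "forecast": forecast}


def stepwise(trace: List[str]) -> Any:
    """Step-wise baseline: every call and every observation enters the trace."""
    loc = TOOLS["geocode"](city="Paris")
    trace.append("call geocode(city='Paris')")
    trace.append(f"observe {loc}")
    weather = TOOLS["forecast"](lat=loc["lat"], lon=loc["lon"])
    trace.append(f"call forecast(lat={loc['lat']}, lon={loc['lon']})")
    trace.append(f"observe {weather}")
    return weather["temp_c"]


def hypertool(code: str, trace: List[str]) -> Any:
    """Folded execution: run the block locally and expose only its result."""

    def call_tool(name: str, **kwargs: Any) -> Any:
        # Inner calls go through each tool's own keyword arguments.
        return TOOLS[name](**kwargs)

    scope: Dict[str, Any] = {"call_tool": call_tool}
    exec(code, scope)  # assumption: a real system would sandbox this step
    result = scope["result"]  # assumed convention: block stores its answer here
    trace.append("call hypertool(<code block>)")
    trace.append(f"observe {result}")
    return result


# The code block a model might emit as the single outer call.
BLOCK = """
loc = call_tool("geocode", city="Paris")
weather = call_tool("forecast", lat=loc["lat"], lon=loc["lon"])
result = weather["temp_c"]
"""

step_trace: List[str] = []
fold_trace: List[str] = []
step_answer = stepwise(step_trace)
fold_answer = hypertool(BLOCK, fold_trace)
print(len(step_trace), len(fold_trace), step_answer == fold_answer)
```

In this toy setup both paths return the same answer, but the intermediate coordinates appear only in the step-wise trace. The folded trace holds one call and one observation regardless of how many inner tool calls the block makes.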

Problem

Research questions and friction points this paper is trying to address.

tool-augmented agents

execution-granularity mismatch

atomic tool calls

multi-step tool use

deterministic tool workflows

Innovation

Methods, ideas, or system contributions that make the work stand out.

HyperTool

tool-augmented agents

execution-granularity mismatch