🤖 AI Summary
Existing large language models (LLMs) show weak planning, poor fault tolerance, and limited knowledge reuse in complex tool-augmented reasoning, and they rely heavily on manual prompting and static inference. To address these limitations, we propose *code-skeleton-driven tool learning*: a novel paradigm that frames tool invocation as the generation of structured, annotated Python function skeletons, enabling multi-step reasoning, dynamic execution, and result caching. We introduce the first traceback-based automatic error localization and adaptive retry mechanism, coupled with a reusable execution result repository. This establishes a closed-loop workflow (*generate → execute → diagnose → optimize*) that enhances both task completion rates and execution robustness across diverse tool-integration benchmarks. Our approach demonstrates the effectiveness and generalizability of code-centric modeling for advancing LLMs’ tool utilization capabilities.
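To make the idea of an annotated function skeleton concrete, here is a minimal sketch of what one might look like before and after implementation. It is an illustration only, not the paper's actual prompt or output format: the query, the `get_weather` tool, its canned data, and the `warmest_city` function are all hypothetical.

```python
import ast
from typing import Dict, List

# Hypothetical skeleton for the query "Which of these cities is warmest?".
# The comments break the task into steps; the body is left unimplemented.
SKELETON = '''
def warmest_city(cities):
    """Return the warmest city among `cities`."""
    # Step 1: query the weather tool for every city.
    # Step 2: pick the city with the highest temperature.
    raise NotImplementedError
'''

# The skeleton is already valid Python, so it can be checked before filling it in.
ast.parse(SKELETON)


def get_weather(city: str) -> Dict[str, float]:
    """Hypothetical stand-in tool; a real tool would call an external API."""
    canned = {"Paris": 18.0, "Oslo": 7.5}
    return {"temp_c": canned[city]}


def warmest_city(cities: List[str]) -> str:
    """Return the warmest city among `cities`."""
    # Step 1: query the weather tool for every city.
    temps = {city: get_weather(city)["temp_c"] for city in cities}
    # Step 2: pick the city with the highest temperature.
    return max(temps, key=temps.get)
```

In this sketch the step comments from the skeleton carry over unchanged into the implementation, which is one plausible way a plan written as comments can guide the generated code.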
📝 Abstract
Tool learning has emerged as a crucial capability for large language models (LLMs) to solve complex real-world tasks through interaction with external tools. Existing approaches face significant challenges, including reliance on hand-crafted prompts, difficulty in multi-step planning, and a lack of precise error diagnosis and reflection mechanisms. We propose ToolCoder, a novel framework that reformulates tool learning as a code generation task. Inspired by software engineering principles, ToolCoder transforms natural language queries into structured Python function scaffolds and systematically breaks tasks down with descriptive comments, enabling LLMs to leverage coding paradigms for complex reasoning and planning. It then generates and executes function implementations to obtain final responses. Additionally, ToolCoder stores successfully executed functions in a repository to promote code reuse and uses error tracebacks for systematic debugging, improving both execution efficiency and robustness. Experiments demonstrate that ToolCoder achieves superior task completion accuracy and execution reliability compared to existing approaches, establishing the effectiveness of code-centric approaches in tool learning.
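The execute, diagnose, and reuse steps described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not ToolCoder's implementation: `generate` is a hypothetical stand-in for an LLM call, the repository is a plain dictionary keyed by query, and each generated program is assumed to define a zero-argument `solve()` function.

```python
import traceback
from typing import Any, Callable, Dict, Optional

# Hypothetical generator signature: (query, previous traceback or None) -> source code.
Generator = Callable[[str, Optional[str]], str]


def run_source(source: str) -> Any:
    """Execute generated source and call its solve() entry point (an assumed convention)."""
    namespace: Dict[str, Any] = {}
    exec(source, namespace)
    return namespace["solve"]()


def solve_with_retry(query: str, generate: Generator,
                     repository: Dict[str, str], max_attempts: int = 3) -> Any:
    # Reuse: a previously successful implementation skips generation entirely.
    if query in repository:
        return run_source(repository[query])

    feedback: Optional[str] = None
    for _ in range(max_attempts):
        source = generate(query, feedback)
        try:
            result = run_source(source)
        except Exception:
            # Diagnose: keep the full traceback as feedback for the next attempt.
            feedback = traceback.format_exc()
            continue
        # Store only code that executed successfully.
        repository[query] = source
        return result
    raise RuntimeError(f"no working implementation after {max_attempts} attempts:\n{feedback}")


def fake_generate(query: str, feedback: Optional[str]) -> str:
    """Toy generator: the first draft has a bug, the retry (given a traceback) fixes it."""
    if feedback is None:
        return "def solve():\n    return 10 / 0\n"
    return "def solve():\n    return 10 / 2\n"
```

With the toy generator, the first attempt raises `ZeroDivisionError`, the traceback is passed back, and the second attempt succeeds and is stored. A later call with the same query runs the stored code without calling the generator. A real system would also need to sandbox `exec`, which this sketch omits.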