🤖 AI Summary
Real-world APIs suffer from limited availability, narrow domain coverage, and poor stability (for example, API key requirements and rate limits), which hinders large-scale training and reproducible evaluation of AI agents. To address this, we propose SynthTools, a scalable framework for generating synthetic tool ecosystems, comprising three components: (1) Tool Generation, which automatically creates tools across diverse domains; (2) Tool Simulation, which emulates realistic tool behavior with 94% accuracy; and (3) Tool Audit, which checks the correctness and consistency of the simulation with 99% accuracy. The framework produces toolsets that span twice as many domains, with twice as many tools per domain, as prior work, and it supports downstream tasks that even state-of-the-art models struggle to complete. This provides a reliable, reproducible infrastructure for large-scale training and stable evaluation of tool-use agents.
📝 Abstract
AI agents increasingly rely on external tools to solve complex, long-horizon tasks. Advancing such agents requires reproducible evaluation and large-scale training in controllable, diverse, and realistic tool-use environments. However, real-world APIs are limited in availability, domain coverage, and stability, often requiring access keys and imposing rate limits, which render them impractical for stable evaluation or scalable training. To address these challenges, we introduce SynthTools, a flexible and scalable framework for generating synthetic tool ecosystems. Our framework consists of three core components: Tool Generation for automatic and scalable creation of diverse tools, Tool Simulation to emulate realistic tool behaviors, and Tool Audit to ensure correctness and consistency of tool simulation. To illustrate its scalability, we show that SynthTools can readily produce toolsets that span twice as many domains and twice as many tools per domain as prior work. Furthermore, the tool simulation and tool audit components demonstrate strong reliability, achieving $94\%$ and $99\%$ accuracy respectively. Finally, we construct downstream tasks from the generated tools that even state-of-the-art models struggle to complete. By enabling scalable, diverse, and reliable tool ecosystems, SynthTools provides a practical path toward large-scale training and stable evaluation of tool-use agents. Our code is available at https://github.com/namkoong-lab/SynthTools.