🤖 AI Summary
Problem: Large language model (LLM)-based tool agents struggle to identify the correct APIs and invoke them in the proper order as tasks grow more complex. Method: The authors construct In-N-Out, the first expert-annotated API graph dataset, built from two real-world API benchmarks and their documentation. The graph explicitly models inter-API dependencies and parameter relationships, and is used to support multi-tool queries that require compositional API calls. Contribution/Results: Using the expert-annotated graphs significantly improves both tool retrieval and multi-tool query generation, nearly doubling the performance of LLMs that rely on documentation alone. Moreover, graphs generated by models fine-tuned on In-N-Out close 90% of that gap, which indicates that the dataset helps models learn to comprehend API documentation and parameter relationships.
📝 Abstract
Tool agents, LLM-based systems that interact with external APIs, offer a way to execute real-world tasks. However, as tasks become increasingly complex, these agents struggle to identify and call the correct APIs in the proper order. To tackle this problem, we investigate converting API documentation into a structured API graph that captures API dependencies, and using that graph to answer multi-tool queries that require compositional API calls. To support this, we introduce In-N-Out, the first expert-annotated dataset of API graphs built from two real-world API benchmarks and their documentation. Using In-N-Out significantly improves performance on both tool retrieval and multi-tool query generation, nearly doubling the performance of LLMs that use documentation alone. Moreover, graphs generated by models fine-tuned on In-N-Out close 90% of this gap, showing that our dataset helps models learn to comprehend API documentation and parameter relationships. Our findings highlight the promise of using explicit API graphs for tool agents and the utility of In-N-Out as a resource. We will release the dataset and code publicly.
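To make the idea of an API graph concrete, the following is a minimal sketch, not the In-N-Out format or the authors' method. It assumes a graph whose edges link an output field of one API to an input parameter of another, and it derives a valid call order with a standard topological sort. The API names, parameter names, and helper functions (`build_dependencies`, `call_order`) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

# Hypothetical parameter-level edges:
# (producer API, output field) -> (consumer API, input parameter).
EDGES = [
    (("search_city", "city_id"), ("get_weather", "city_id")),
    (("search_city", "city_id"), ("list_hotels", "city_id")),
    (("list_hotels", "hotel_id"), ("book_hotel", "hotel_id")),
]


def build_dependencies(edges):
    """Map each API to the set of APIs whose outputs it consumes."""
    deps = defaultdict(set)
    for (producer, _out_field), (consumer, _in_param) in edges:
        deps[consumer].add(producer)
        deps.setdefault(producer, set())  # producers with no inputs still appear
    return dict(deps)


def call_order(edges, targets):
    """Return an invocation order covering the targets and their prerequisites."""
    deps = build_dependencies(edges)

    # Collect the targets plus everything they transitively depend on.
    needed, stack = set(), list(targets)
    while stack:
        api = stack.pop()
        if api not in needed:
            needed.add(api)
            stack.extend(deps.get(api, ()))

    # Topologically sort the induced subgraph so producers come first.
    subgraph = {api: deps.get(api, set()) & needed for api in needed}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    print(call_order(EDGES, ["book_hotel"]))
```

In this sketch, asking for `book_hotel` pulls in `list_hotels` and `search_city` because their outputs feed the parameters it needs, while the unrelated `get_weather` is left out. A real system would also have to retrieve candidate APIs and fill in parameter values, which this example does not attempt.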