🤖 AI Summary
To address low tool retrieval accuracy on complex, multi-step user queries, this paper proposes a knowledge graph (KG)-based tool retrieval framework. Unlike conventional approaches that rely solely on semantic matching between queries and tool descriptions, our method constructs a KG encoding tool semantics and functional dependencies, and introduces a retrieval algorithm built on ensembles of 1-hop ego tool graphs to explicitly model both direct and indirect relationships among tools. Furthermore, we incorporate a semantic-lexical hybrid re-ranking mechanism and structured reasoning over ego-graph ensembles to improve tool composition recommendation for multi-step tasks. Evaluated on an internal synthetic dataset, our approach achieves 91.85% on the micro-average Complete Recall metric, compared with 89.26% for the strongest non-KG baseline. This supports the benefit of structured relational modeling for improving tool retrieval performance in AI agent systems.
📝 Abstract
Effective tool retrieval is essential for AI agents to select from a vast array of tools when identifying and planning actions in the context of complex user queries. Despite its central role in planning, this aspect remains underexplored in the literature. Traditional approaches rely primarily on similarities between user queries and tool descriptions, which significantly limits retrieval accuracy, especially when handling multi-step user requests. To address these limitations, we propose a Knowledge Graph (KG)-based tool retrieval framework that captures the semantic relationships between tools and their functional dependencies. Our retrieval algorithm leverages ensembles of 1-hop ego tool graphs to model direct and indirect connections between tools, enabling more comprehensive and contextual tool selection for multi-step tasks. We evaluate our approach on a synthetically generated internal dataset across six defined user classes, extending previous work on coherent dialogue synthesis and tool retrieval benchmarks. Results demonstrate that our tool graph-based method achieves 91.85% tool coverage on the micro-average Complete Recall metric, compared to 89.26% for re-ranked semantic-lexical hybrid retrieval, the strongest non-KG baseline in our experiments. These findings support our hypothesis that the structural information in the KG provides complementary signals to pure similarity matching, particularly for queries requiring sequential tool composition.
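The idea of retrieving over ensembles of 1-hop ego tool graphs can be illustrated with a minimal sketch. This is not the authors' implementation: the Jaccard token overlap standing in for a similarity model, the `neighbor_weight` discount, the additive score aggregation, and all tool names and function names below are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def lexical_score(query: str, description: str) -> float:
    """Toy stand-in for a similarity model: Jaccard overlap of lowercase tokens."""
    q, d = set(query.lower().split()), set(description.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def ego_graph(graph: dict, center: str) -> set:
    """1-hop ego graph: the center tool plus its direct neighbors."""
    return {center} | set(graph.get(center, ()))


def retrieve(query, descriptions, graph, k_seeds=2, neighbor_weight=0.5, top_n=3):
    # Step 1: score every tool against the query and keep the top seeds.
    sims = {tool: lexical_score(query, desc) for tool, desc in descriptions.items()}
    seeds = sorted(sims, key=lambda t: (-sims[t], t))[:k_seeds]

    # Step 2: build one ego graph per seed and combine them. A tool's score is
    # the sum of contributions from every ego graph it appears in; neighbors
    # inherit a discounted share of their seed's similarity (an assumption).
    scores = defaultdict(float)
    for seed in seeds:
        for tool in ego_graph(graph, seed):
            weight = 1.0 if tool == seed else neighbor_weight
            scores[tool] += weight * sims[seed]

    # Step 3: rank by combined score, breaking ties by tool name.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical tool catalog and dependency edges.
descriptions = {
    "search_flights": "search flights between two cities",
    "book_flight": "book a selected flight",
    "process_payment": "charge a card for an order",
    "get_weather": "weather forecast for a city",
}
graph = {
    "search_flights": {"book_flight"},
    "book_flight": {"search_flights", "process_payment"},
    "process_payment": {"book_flight"},
    "get_weather": set(),
}

result = retrieve("book a flight and pay", descriptions, graph)
print(result)  # ['book_flight', 'process_payment', 'search_flights']
```

In this toy run, `search_flights` and `process_payment` share almost no wording with the query, yet both are returned because they sit in the ego graph of `book_flight`. That is the kind of complementary structural signal the abstract attributes to the KG for multi-step requests.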