🤖 AI Summary
Large language models often fail in multi-step tool usage due to insufficient or inadequately activated knowledge. This work systematically investigates the role of knowledge in tool utilization and proposes a comprehensive framework encompassing knowledge acquisition, activation, and internalization. Specifically, it enhances knowledge acquisition through instance-level experiential knowledge modeling, improves knowledge activation via parallel sampling to broaden reasoning pathways, and facilitates knowledge internalization through reinforcement learning. Experimental results demonstrate that the proposed approach significantly outperforms strong baselines on BFCL-V3 and AppWorld benchmarks, with consistent performance gains across varying model scales, thereby validating the effectiveness and generalizability of the knowledge-enhanced strategy.
📝 Abstract
Large language models (LLMs) rely on tool use to act as autonomous agents, yet often fail in multi-step execution due to insufficient tool-related knowledge and ineffective knowledge activation. We therefore present a systematic study of how knowledge influences tool-use performance, covering the stages of knowledge acquisition, activation, and internalization. In the knowledge acquisition stage, we collect and evaluate various forms of experiential knowledge; our analysis shows that simple instance-level knowledge already provides strong and reliable gains, while abstract intent-level knowledge offers limited benefits. At inference time, for knowledge activation, we find that prompting the LLM to expand the depth of reasoning yields diminishing returns, whereas expanding the width of reasoning through parallel sampling with aggregation more effectively activates latent experiential knowledge. At training time, for knowledge internalization, post-training with knowledge-augmented data further improves performance, with reinforcement learning outperforming supervised fine-tuning. Based on these insights, we propose Knowledge-Augmented Tool Execution (KATE), a framework that integrates experiential knowledge with reasoning-width-expanded inference and knowledge-aware training. Experiments on BFCL-V3 and AppWorld demonstrate consistent and substantial improvements over strong baselines across model scales. Our code is available at https://github.com/hypasd-art/KATE.
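To make "expanding the width of reasoning by parallel sampling with aggregation" concrete, here is a minimal sketch, not the KATE implementation. The abstract does not say how candidates are aggregated, so majority voting is an assumption here, and `sample_and_aggregate`, `toy_generate`, and the example tool calls are hypothetical names introduced only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def sample_and_aggregate(
    generate: Callable[[str, int], str], prompt: str, width: int = 5
) -> str:
    """Draw `width` candidate tool calls in parallel and return the most frequent.

    Assumption: aggregation is a plain majority vote over identical strings.
    """
    with ThreadPoolExecutor(max_workers=width) as pool:
        candidates: List[str] = list(
            pool.map(lambda seed: generate(prompt, seed), range(width))
        )
    # Ties resolve to the candidate that was seen first.
    return Counter(candidates).most_common(1)[0][0]


def toy_generate(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for one sampled LLM response (no real model call)."""
    if seed % 3 == 0:
        return "get_forecast(city='Paris')"
    return "get_weather(city='Paris')"


if __name__ == "__main__":
    print(sample_and_aggregate(toy_generate, "What is the weather in Paris?"))
```

In practice `generate` would wrap a sampled model call whose prompt includes the retrieved instance-level knowledge, and the vote could be replaced by whatever aggregation the paper actually uses.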