🤖 AI Summary
To address the limited cross-domain generalization, high operational costs, and low success rates of large language models (LLMs) on complex tasks, particularly on benchmarks like GAIA, this paper proposes Knowledge Graph of Thoughts (KGoT), a novel AI assistant architecture. KGoT introduces a dynamic, knowledge-graph-driven reasoning paradigm that tightly integrates LLM inference with the real-time construction and iterative refinement of structured knowledge graphs. It orchestrates multiple tools, including web crawlers, Python execution, and mathematical solvers, to automatically extract, validate, and enhance task-specific knowledge. This design improves reasoning interpretability, cumulative knowledge retention, and model efficiency. Experiments demonstrate substantial gains: on the GAIA benchmark, KGoT achieves a 29% higher task success rate than Hugging Face Agents with GPT-4o mini, while reducing cost by more than 36x compared to GPT-4o. It further improves success rates by 36% with Qwen2.5-32B and 37.5% with DeepSeek-R1-70B, underscoring its effectiveness across diverse model scales.
📝 Abstract
Large Language Models (LLMs) are revolutionizing the development of AI assistants capable of performing diverse tasks across domains. However, current state-of-the-art LLM-driven agents face significant challenges, including high operational costs and limited success rates on complex benchmarks like GAIA. To address these issues, we propose the Knowledge Graph of Thoughts (KGoT), an innovative AI assistant architecture that integrates LLM reasoning with dynamically constructed knowledge graphs (KGs). KGoT extracts and structures task-relevant knowledge into a dynamic KG representation, iteratively enhanced through external tools such as math solvers, web crawlers, and Python scripts. This structured representation enables low-cost models to solve complex tasks effectively. For example, KGoT achieves a 29% improvement in task success rates on the GAIA benchmark compared to Hugging Face Agents with GPT-4o mini, while reducing costs by over 36x compared to GPT-4o. Improvements for recent reasoning models are similar, e.g., 36% and 37.5% for Qwen2.5-32B and DeepSeek-R1-70B, respectively. KGoT offers a scalable, affordable, and high-performing solution for AI assistants.
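To make the abstract's core loop concrete, here is a minimal sketch, assuming a toy setup: the task-relevant KG is a set of (subject, relation, object) triples, a stub `controller` function stands in for the LLM that decides whether the graph suffices or which tool to invoke, and `python_tool` stands in for KGoT's Python-execution tool. All names and the control flow are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a KGoT-style reason-and-enhance loop (not the paper's code).

def python_tool(expr):
    # Stand-in for a Python-execution tool: evaluate an arithmetic expression.
    return eval(expr, {"__builtins__": {}})

def controller(kg, task):
    # Stub for the LLM controller: if the KG already holds an answer triple
    # for this task, answer from the graph; otherwise request a tool call.
    for subj, rel, obj in kg:
        if subj == task and rel == "answer":
            return ("answer", obj)
    return ("call_tool", python_tool, task)

def kgot_loop(task, max_iters=5):
    kg = set()  # dynamically constructed knowledge graph
    for _ in range(max_iters):
        action = controller(kg, task)
        if action[0] == "answer":
            return action[1]
        _, tool, arg = action
        # Enhance the KG with the tool's result, then iterate.
        kg.add((task, "answer", tool(arg)))
    return None

print(kgot_loop("2 + 3 * 4"))  # → 14
```

In the real architecture the controller is an LLM choosing among several tools (web crawler, math solver, Python scripts), and the KG accumulates many intermediate triples rather than a single answer edge; the point of the sketch is only the iterate-until-the-graph-suffices control flow.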