KG-Reasoner: A Reinforced Model for End-to-End Multi-Hop Knowledge Graph Reasoning

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of pipeline-based approaches to complex multi-hop knowledge graph reasoning with large language models, namely limited flexibility, fragmented reasoning, and loss of intermediate information. It introduces KG-Reasoner, an end-to-end framework that folds multi-step graph reasoning into a single "thinking" phase of a reasoning LLM. Trained with reinforcement learning, the model explores reasoning paths dynamically and backtracks when necessary instead of following a fixed pipeline. Across eight multi-hop, knowledge-intensive reasoning benchmarks, it achieves performance competitive with or superior to state-of-the-art methods.

📝 Abstract

Large Language Models (LLMs) exhibit strong abilities in natural language understanding and generation, yet they struggle with knowledge-intensive reasoning. Structured Knowledge Graphs (KGs) provide an effective form of external knowledge representation and have been widely used to enhance performance in classical Knowledge Base Question Answering (KBQA) tasks. However, performing precise multi-hop reasoning over KGs for complex queries remains highly challenging. Most existing approaches decompose the reasoning process into a sequence of isolated steps executed through a fixed pipeline. While effective to some extent, such designs constrain reasoning flexibility and fragment the overall decision process, often leading to incoherence and the loss of critical intermediate information from earlier steps. In this paper, we introduce KG-Reasoner, an end-to-end framework that integrates multi-step reasoning into a unified "thinking" phase of a Reasoning LLM. Through Reinforcement Learning (RL), the LLM is trained to internalize the KG traversal process, enabling it to dynamically explore reasoning paths and backtrack when necessary. Experiments on eight multi-hop and knowledge-intensive reasoning benchmarks demonstrate that KG-Reasoner achieves competitive or superior performance compared to state-of-the-art methods. Code is available at https://github.com/Wangshuaiia/KG-Reasoner.
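To make "explore reasoning paths and backtrack" concrete, the following is a minimal sketch of multi-hop traversal with backtracking over a toy graph. It is not the KG-Reasoner implementation: the abstract describes an LLM trained with RL to carry out traversal inside its thinking phase, whereas this sketch substitutes a plain depth-first search driven by a given relation path. The graph contents, the `multi_hop` function, and the trace format are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Toy knowledge graph: head entity -> list of (relation, tail entity).
# Every entity and relation below is invented for illustration.
KG: Dict[str, List[Tuple[str, str]]] = {
    "Alice": [("works_at", "AcmeLabs"), ("works_at", "Initech")],
    "AcmeLabs": [("located_in", "Springfield")],
    "Initech": [],  # dead end: no outgoing edges
    "Springfield": [("part_of", "Freedonia")],
}


def multi_hop(
    start: str,
    relations: List[str],
    kg: Dict[str, List[Tuple[str, str]]] = KG,
) -> Tuple[List[str], List[str]]:
    """Follow `relations` hop by hop from `start`, backtracking on dead ends.

    Returns (answers, trace), where trace records each step taken and
    each backtrack, so earlier intermediate steps are not lost.
    """
    answers: List[str] = []
    trace: List[str] = []

    def explore(entity: str, hop: int) -> bool:
        # All hops consumed: the current entity is an answer.
        if hop == len(relations):
            answers.append(entity)
            return True
        found = False
        for rel, tail in kg.get(entity, []):
            if rel != relations[hop]:
                continue
            trace.append(f"hop {hop + 1}: {entity} -[{rel}]-> {tail}")
            if explore(tail, hop + 1):
                found = True
            else:
                # This branch cannot complete the path: record the backtrack.
                trace.append(f"backtrack from {tail}")
        return found

    explore(start, 0)
    return answers, trace
```

Calling `multi_hop("Alice", ["works_at", "located_in", "part_of"])` walks one branch to completion and abandons the other at the dead end, recording both in the trace. In the framework the abstract describes, the choice of which edge to follow and when to back off would come from the trained model rather than from a fixed relation path.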

Problem

Research questions and friction points this paper is trying to address.

multi-hop reasoning

Knowledge Graphs

Knowledge Base Question Answering

reasoning flexibility

intermediate information loss

Innovation

Methods, ideas, or system contributions that make the work stand out.

end-to-end reasoning

reinforcement learning

knowledge graph