🤖 AI Summary
Existing RAG methods retrieve isolated text chunks by semantic similarity, ignoring the factual relationships between chunks, which hurts retrieval coherence and leaves hallucination unaddressed. This paper proposes KG²RAG, a RAG framework that integrates knowledge graphs (KGs) into the retrieval-augmented generation pipeline: it first retrieves seed chunks via semantic search, then performs KG-guided chunk expansion along the fact-level relations connecting those chunks, and finally applies KG-based chunk organization to deliver the retrieved knowledge as well-structured paragraphs. By explicitly modeling inter-chunk knowledge dependencies, KG²RAG moves beyond the conventional paradigm of isolated chunk retrieval. Evaluated on HotpotQA and its variants, KG²RAG improves both response quality (F1 +4.2%) and retrieval quality (Recall@5 +7.1%), indicating that KG-guided retrieval enhances the diversity and coherence of retrieved results and helps mitigate hallucination.
📝 Abstract
Retrieval-augmented generation (RAG) has emerged as a promising technology for addressing hallucination issues in the responses generated by large language models (LLMs). Existing studies on RAG primarily apply semantic-based approaches to retrieve isolated relevant chunks, ignoring the intrinsic relationships among them. In this paper, we propose a novel Knowledge Graph-Guided Retrieval Augmented Generation (KG$^2$RAG) framework that utilizes knowledge graphs (KGs) to provide fact-level relationships between chunks, improving the diversity and coherence of the retrieved results. Specifically, after performing a semantic-based retrieval to provide seed chunks, KG$^2$RAG employs a KG-guided chunk expansion process and a KG-based chunk organization process to deliver relevant and important knowledge in well-organized paragraphs. Extensive experiments conducted on the HotpotQA dataset and its variants demonstrate the advantages of KG$^2$RAG compared to existing RAG-based approaches, in terms of both response quality and retrieval quality.
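The three-stage pipeline the abstract describes (semantic seed retrieval, KG-guided chunk expansion, KG-based chunk organization) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy corpus, the entity-overlap KG, the word-overlap retriever, and the seed-first ordering are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Toy corpus: chunk id -> text (illustrative, not from the paper).
CHUNKS: Dict[str, str] = {
    "c1": "Marie Curie won the Nobel Prize in Physics in 1903.",
    "c2": "The Nobel Prize in Physics is awarded in Stockholm.",
    "c3": "Marie Curie discovered polonium and radium.",
    "c4": "Stockholm is the capital of Sweden.",
}

# A tiny stand-in KG: each chunk is linked to the entities it mentions, so
# chunks sharing an entity are connected at the fact level.
CHUNK_ENTITIES: Dict[str, Set[str]] = {
    "c1": {"Marie Curie", "Nobel Prize"},
    "c2": {"Nobel Prize", "Stockholm"},
    "c3": {"Marie Curie", "polonium", "radium"},
    "c4": {"Stockholm", "Sweden"},
}

def semantic_retrieve(query: str, k: int = 1) -> List[str]:
    """Stage 1: stand-in for embedding search; ranks chunks by word overlap."""
    q = set(query.lower().split())
    ranked = sorted(CHUNKS, key=lambda c: -len(q & set(CHUNKS[c].lower().split())))
    return ranked[:k]

def kg_expand(seeds: List[str]) -> List[str]:
    """Stage 2: KG-guided expansion; add chunks sharing an entity with a seed."""
    seed_entities = set().union(*(CHUNK_ENTITIES[c] for c in seeds))
    return [c for c in CHUNKS if CHUNK_ENTITIES[c] & seed_entities]

def organize(expanded: List[str], seeds: List[str]) -> str:
    """Stage 3: KG-based organization; seeds first, then expanded chunks,
    joined into a single well-ordered context paragraph."""
    ordered = seeds + [c for c in expanded if c not in seeds]
    return " ".join(CHUNKS[c] for c in ordered)

seeds = semantic_retrieve("Where did Marie Curie win the Nobel Prize?")
context = organize(kg_expand(seeds), seeds)
print(context)
```

Note how expansion recovers `c2` (the Stockholm fact needed for a multi-hop answer) even though it shares few words with the query, while the unrelated `c4` is excluded because it shares no entity with the seed chunk; this is the diversity-and-coherence gain the abstract attributes to fact-level chunk relationships.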