🤖 AI Summary
To address a limitation of retrieval-augmented generation (RAG), where insufficient contextual information degrades retrieval performance, this paper proposes KG-CQR, a knowledge graph (KG)-enhanced framework for contextual query retrieval. At the query level, it incorporates structured relational representations from a corpus-centric KG, automatically extracting and completing query-relevant subgraphs to generate structure-aware, context-enriched query representations. KG-CQR is model-agnostic, requires no fine-tuning, and is lightweight and broadly applicable. Its key innovation is the explicit integration of KG relational modeling into the contextual query retrieval (CQR) pipeline to capture multi-hop semantic dependencies. Experiments on RAGBench and MultiHop-RAG demonstrate consistent improvements: +4–6% in mean average precision (mAP) and +2–3% in Recall@25 over state-of-the-art baselines, with particular benefit on complex, multi-hop question answering tasks.
📝 Abstract
The integration of knowledge graphs (KGs) with large language models (LLMs) offers significant potential to improve the retrieval phase of retrieval-augmented generation (RAG) systems. In this study, we propose KG-CQR, a novel framework for Contextual Query Retrieval (CQR) that enhances the retrieval phase by enriching the contextual representation of complex input queries using a corpus-centric KG. Unlike existing methods that primarily address corpus-level context loss, KG-CQR focuses on query enrichment through structured relation representations, extracting and completing relevant KG subgraphs to generate semantically rich query contexts. Comprising subgraph extraction, completion, and contextual generation modules, KG-CQR operates as a model-agnostic pipeline, ensuring scalability across LLMs of varying sizes without additional training. Experimental results on the RAGBench and MultiHop-RAG datasets demonstrate KG-CQR's superior performance, achieving a 4–6% improvement in mAP and a 2–3% improvement in Recall@25 over strong baseline models. Furthermore, evaluations on challenging RAG tasks such as multi-hop question answering show that incorporating KG-CQR consistently improves retrieval effectiveness over existing baselines.
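The three-module pipeline named in the abstract (subgraph extraction, completion, and contextual generation) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy triple-store KG, the function names, and the string-template verbalization standing in for the LLM-based contextual generation the paper actually uses.

```python
# Hedged sketch of a KG-CQR-style pipeline. The toy KG, function names, and
# string-based context generation are illustrative assumptions, not the
# paper's implementation (which uses an LLM over a corpus-centric KG).
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

def extract_subgraph(query: str, kg: List[Triple]) -> List[Triple]:
    """Extraction: keep triples whose head or tail entity appears in the query."""
    q = query.lower()
    return [t for t in kg if t[0].lower() in q or t[2].lower() in q]

def complete_subgraph(sub: List[Triple], kg: List[Triple], hops: int = 1) -> List[Triple]:
    """Completion: expand the subgraph with multi-hop neighbors of its entities."""
    result = list(sub)
    for _ in range(hops):
        entities = {e for h, _, t in result for e in (h, t)}
        for triple in kg:
            if triple not in result and (triple[0] in entities or triple[2] in entities):
                result.append(triple)
    return result

def generate_context(query: str, sub: List[Triple]) -> str:
    """Contextual generation: verbalize triples into an enriched query context
    (a template stands in for the paper's LLM-based generation)."""
    facts = "; ".join(f"{h} {r} {t}" for h, r, t in sub)
    return f"{query} [context: {facts}]"

# Toy corpus-centric KG.
kg = [
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "EU"),
    ("Berlin", "capital_of", "Germany"),
]

query = "Which union does the country whose capital is Paris belong to?"
sub = complete_subgraph(extract_subgraph(query, kg), kg, hops=1)
print(generate_context(query, sub))
```

The one-hop completion step is what lets the enriched query surface the multi-hop fact (France is in the EU) that never appears in the query text itself, which is the intuition behind the reported gains on multi-hop QA.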