CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs

📅 2025-01-25
🤖 AI Summary
Existing retrieval-augmented generation (RAG) methods for scientific question answering model complex citation relationships poorly and retrieve imprecisely, which limits answer quality. To address this, the paper proposes a citation-graph-enhanced RAG framework with three contributions: (1) a contextualized citation graph representation that explicitly captures both semantic and structural dependencies among scholarly documents; (2) Lexical-Semantic Graph Retrieval (LeSeGR), a unified retrieval method that integrates sparse and dense signals with graph neural network encoding; and (3) end-to-end, context-aware alignment of the retrieved graph-structured information with large language model (LLM) generation. On multi-domain scientific QA benchmarks, the approach significantly outperforms RAG baselines built on state-of-the-art retrievers in both retrieval accuracy and answer quality, which points to the value of graph-structured retrieval for complex academic reasoning.

📝 Abstract
Research question answering requires accurate retrieval and contextual understanding of scientific literature. However, current Retrieval-Augmented Generation (RAG) methods often struggle to balance complex document relationships with precise information retrieval. In this paper, we introduce Contextualized Graph Retrieval-Augmented Generation (CG-RAG), a novel framework that integrates sparse and dense retrieval signals within graph structures to enhance retrieval efficiency and subsequently improve generation quality for research question answering. First, we propose a contextual graph representation for citation graphs, effectively capturing both explicit and implicit connections within and across documents. Next, we introduce Lexical-Semantic Graph Retrieval (LeSeGR), which seamlessly integrates sparse and dense retrieval signals with graph encoding. It bridges the gap between lexical precision and semantic understanding in citation graph retrieval, demonstrating generalizability to existing graph retrieval and hybrid retrieval methods. Finally, we present a context-aware generation strategy that utilizes the retrieved graph-structured information to generate precise and contextually enriched responses using large language models (LLMs). Extensive experiments on research question answering benchmarks across multiple domains demonstrate that our CG-RAG framework significantly outperforms RAG methods combined with various state-of-the-art retrieval approaches, delivering superior retrieval accuracy and generation quality.
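
The idea of combining sparse and dense retrieval signals over a citation graph can be illustrated with a small sketch. This is not the paper's LeSeGR implementation: the term-overlap lexical score, the toy two-dimensional embeddings, the one-hop neighbor averaging, and the names `hybrid_graph_retrieve`, `alpha`, and `beta` are all assumptions made for illustration only.

```python
import math


def lexical_score(query, text):
    """Toy sparse signal: fraction of query terms that occur in the text."""
    q_terms = query.lower().split()
    t_terms = set(text.lower().split())
    if not q_terms:
        return 0.0
    return sum(1 for t in q_terms if t in t_terms) / len(q_terms)


def cosine(u, v):
    """Dense signal: cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_graph_retrieve(query, query_vec, chunks, edges,
                          alpha=0.5, beta=0.3, k=2):
    """Rank chunks by a hybrid score smoothed over citation edges.

    chunks: dict mapping chunk id -> (text, embedding)
    edges:  iterable of (id_a, id_b) citation links, treated as undirected
    alpha:  weight of the lexical score versus the dense score (hypothetical)
    beta:   weight of the neighbors' mean score (hypothetical)
    """
    # Step 1: per-chunk hybrid score from sparse and dense signals.
    base = {
        cid: alpha * lexical_score(query, text)
        + (1 - alpha) * cosine(query_vec, vec)
        for cid, (text, vec) in chunks.items()
    }

    # Step 2: build an undirected adjacency map from the citation links.
    neighbors = {cid: set() for cid in chunks}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Step 3: one round of neighbor averaging, a crude stand-in for
    # graph encoding. Isolated chunks keep their own score unchanged.
    final = {}
    for cid, score in base.items():
        nbrs = neighbors[cid]
        if nbrs:
            nbr_mean = sum(base[n] for n in nbrs) / len(nbrs)
            final[cid] = (1 - beta) * score + beta * nbr_mean
        else:
            final[cid] = score

    # Sort by descending score, breaking ties by id for determinism.
    ranked = sorted(final, key=lambda c: (-final[c], c))
    return ranked[:k], final


# Toy data: three chunks, one citation link between p1 and p3.
chunks = {
    "p1": ("graph retrieval for citation networks", [1.0, 0.0]),
    "p2": ("dense retrieval with transformers", [0.0, 1.0]),
    "p3": ("citation graph encoding", [1.0, 0.0]),
}
edges = [("p1", "p3")]

top, scores = hybrid_graph_retrieve(
    "citation graph retrieval", [1.0, 0.0], chunks, edges
)
print(top)  # ['p1', 'p3']
```

In this toy setup the linked chunks p1 and p3 reinforce each other, while p2 matches only one query term and has no citation neighbors, so it ranks last. A learned graph encoder would replace the fixed averaging step.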
Problem

Research questions and friction points this paper is trying to address.

Complex Document Relationships
Information Retrieval
Answer Accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

CG-RAG
LeSeGR
Large Language Model
Yuntong Hu
Emory University
Graph Deep Learning, Generative AI, Data Mining
Zhihan Lei
Emory University, Atlanta, GA, USA
Zhongjie Dai
Tongji University, Shanghai, China
Allen Zhang
Georgia Institute of Technology, Atlanta, GA, USA
Abhinav Angirekula
University of Illinois, Urbana-Champaign, Urbana, IL, USA
Zheng Zhang
Emory University, Atlanta, GA, USA
Liang Zhao
Emory University, Atlanta, GA, USA