Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems

Date: 2025-07-03
AI Summary
To address the high computational cost, graph retrieval latency, and scalability bottlenecks of GraphRAG systems on enterprise-scale unstructured text, this paper proposes a lightweight graph-augmented generation framework whose graph construction does not rely on LLMs. Methodologically, it introduces a dependency-parsing-based knowledge graph construction pipeline that uses industrial-grade NLP tools for efficient entity and relation extraction, and it combines hybrid query node identification with one-hop traversal for low-latency, high-recall subgraph retrieval. The key innovation lies in removing the LLM from graph construction and keeping retrieval lightweight while preserving multi-hop reasoning capability. Evaluated on two SAP datasets focused on legacy code migration, the framework improves on conventional RAG by up to 15% on LLM-as-Judge scores and 4.35% on RAGAS metrics, and its dependency-based graphs attain 94% of the performance of LLM-generated knowledge graphs (61.87% vs. 65.83%) at significantly lower construction cost.

Abstract
We propose a scalable and cost-efficient framework for deploying Graph-based Retrieval Augmented Generation (GraphRAG) in enterprise environments. While GraphRAG has shown promise for multi-hop reasoning and structured retrieval, its adoption has been limited by the high computational cost of constructing knowledge graphs using large language models (LLMs) and the latency of graph-based retrieval. To address these challenges, we introduce two core innovations: (1) a dependency-based knowledge graph construction pipeline that leverages industrial-grade NLP libraries to extract entities and relations from unstructured text, completely eliminating reliance on LLMs; and (2) a lightweight graph retrieval strategy that combines hybrid query node identification with efficient one-hop traversal for high-recall, low-latency subgraph extraction. We evaluate our framework on two SAP datasets focused on legacy code migration and demonstrate strong empirical performance. Our system achieves up to 15% and 4.35% improvements over traditional RAG baselines based on LLM-as-Judge and RAGAS metrics, respectively. Moreover, our dependency-based construction approach attains 94% of the performance of LLM-generated knowledge graphs (61.87% vs. 65.83%) while significantly reducing cost and improving scalability. These results validate the feasibility of deploying GraphRAG systems in real-world, large-scale enterprise applications without incurring prohibitive resource requirements, paving the way for practical, explainable, and domain-adaptable retrieval-augmented reasoning.
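
To make the first innovation concrete, here is a minimal sketch of how subject-verb-object triples can be read off a dependency parse without any LLM call. It is an illustration under stated assumptions, not the paper's pipeline: the `Token` shape, the `phrase` and `extract_triples` helpers, the extraction rules, and the example sentence are all hypothetical, and the parse is written by hand so the block runs with the standard library alone. In practice a dependency parser from an NLP library would supply the tokens, and the labels (`nsubj`, `dobj`, `compound`, `amod`) follow common dependency-label conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One token of a dependency parse (hypothetical, parser-agnostic shape)."""
    idx: int    # position in the sentence
    text: str   # surface form
    dep: str    # dependency label relative to the head
    head: int   # index of the head token (the root points to itself)


def phrase(tokens, head):
    """Join a noun with its compound/adjectival modifiers, in sentence order."""
    parts = [
        t for t in tokens
        if t.idx == head.idx
        or (t.head == head.idx and t.dep in ("compound", "amod"))
    ]
    return " ".join(t.text for t in sorted(parts, key=lambda t: t.idx))


def extract_triples(tokens):
    """Return (subject, relation, object) triples for subject-verb-object patterns."""
    triples = []
    for verb in tokens:
        # Direct dependents of this token, excluding the root's self-loop.
        children = [t for t in tokens if t.head == verb.idx and t.idx != verb.idx]
        subjects = [t for t in children if t.dep in ("nsubj", "nsubjpass")]
        objects = [t for t in children if t.dep in ("dobj", "obj")]
        for subj in subjects:
            for obj in objects:
                triples.append((phrase(tokens, subj), verb.text, phrase(tokens, obj)))
    return triples


# Hand-written parse of "The migration tool replaces deprecated statements"
# (illustrative sentence, not taken from the paper's datasets).
PARSE = [
    Token(0, "The", "det", 2),
    Token(1, "migration", "compound", 2),
    Token(2, "tool", "nsubj", 3),
    Token(3, "replaces", "ROOT", 3),
    Token(4, "deprecated", "amod", 5),
    Token(5, "statements", "dobj", 3),
]

triples = extract_triples(PARSE)
print(triples)
```

Each triple becomes one edge of the knowledge graph, with the subject and object phrases as nodes. Because the rules only walk the parse tree, the cost per sentence is that of the parser rather than of an LLM call.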
Problem

Research questions and friction points this paper is trying to address.

High computational cost of LLM-based knowledge graph construction
Latency issues in graph-based retrieval systems
Scalability challenges in enterprise GraphRAG deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dependency-based knowledge graph construction without LLMs
Lightweight graph retrieval with hybrid query node identification
Efficient one-hop traversal for subgraph extraction
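
The retrieval side of the items above can be sketched as follows. This is a toy illustration under loud assumptions: the summary does not specify how the hybrid node identification is scored, so "hybrid" is taken here to mean an exact substring match with a fuzzy fallback, and `difflib.SequenceMatcher` stands in for whatever similarity measure the authors actually use. The function names, the 0.8 threshold, the triples, and the query are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Toy triples (hypothetical data, not from the paper's datasets).
TRIPLES = [
    ("migration tool", "replaces", "deprecated statements"),
    ("deprecated statements", "appear in", "legacy reports"),
    ("legacy reports", "run on", "old platform"),
]


def build_graph(triples):
    """Index every triple under both of its endpoint nodes."""
    adjacency = defaultdict(list)
    for subj, rel, obj in triples:
        adjacency[subj].append((subj, rel, obj))
        adjacency[obj].append((subj, rel, obj))
    return adjacency


def fuzzy_score(node, words):
    """Best similarity between the node and any query window of the same word count."""
    k = len(node.split())
    windows = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return max(
        (SequenceMatcher(None, node.lower(), w).ratio() for w in windows),
        default=0.0,
    )


def find_query_nodes(query, nodes, threshold=0.8):
    """Assumed hybrid matching: exact substring hit, else fuzzy match above threshold."""
    q = query.lower()
    words = q.split()
    seeds = []
    for node in nodes:
        if node.lower() in q or fuzzy_score(node, words) >= threshold:
            seeds.append(node)
    return seeds


def one_hop_subgraph(adjacency, seeds):
    """Collect each triple incident to a seed node exactly once, in first-seen order."""
    seen, subgraph = set(), []
    for seed in seeds:
        for triple in adjacency.get(seed, []):
            if triple not in seen:
                seen.add(triple)
                subgraph.append(triple)
    return subgraph


QUERY = "Which deprecated statement does the migration tool replace?"
graph = build_graph(TRIPLES)
seeds = find_query_nodes(QUERY, graph)
subgraph = one_hop_subgraph(graph, seeds)
print(seeds)
print(subgraph)
```

In this toy run, "migration tool" matches exactly, "deprecated statements" matches only through the fuzzy fallback (the query uses the singular form), and the triple about "old platform" is left out because neither of its endpoints is a seed. Restricting traversal to one hop keeps the retrieved subgraph proportional to the degree of the seed nodes, which is the source of the low latency the summary describes.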
Authors
Congmin Min
SAP, Palo Alto, California, USA
Rhea Mathew
SAP, Palo Alto, California, USA
Joyce Pan
SAP, Palo Alto, California, USA
Sahil Bansal
Senior Data Scientist at SAP
Machine Learning, Natural Language Processing, Information Retrieval, Knowledge Graphs
Abbas Keshavarzi
SAP, Palo Alto, California, USA
Amar Viswanathan Kannan
SAP, Palo Alto, California, USA