Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems

📅 2025-07-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the high computational cost, retrieval latency, and scalability bottlenecks of GraphRAG on enterprise-scale unstructured text, this paper proposes a lightweight graph-augmented generation framework whose graph construction does not rely on LLMs. It builds the knowledge graph with a dependency-parsing-based pipeline that uses industrial-grade NLP libraries for entity and relation extraction, and it retrieves subgraphs through hybrid query node identification followed by one-hop traversal, targeting low latency and high recall. The key innovation is removing the LLM from graph construction while keeping retrieval lightweight. Evaluated on two SAP datasets focused on legacy code migration, the framework improves over traditional RAG baselines by up to 15% on LLM-as-Judge and up to 4.35% on RAGAS metrics, and its dependency-based graphs attain 94% of the performance of LLM-generated knowledge graphs (61.87% vs. 65.83%) at significantly lower construction cost.

📝 Abstract

We propose a scalable and cost-efficient framework for deploying Graph-based Retrieval-Augmented Generation (GraphRAG) in enterprise environments. While GraphRAG has shown promise for multi-hop reasoning and structured retrieval, its adoption has been limited by the high computational cost of constructing knowledge graphs using large language models (LLMs) and the latency of graph-based retrieval. To address these challenges, we introduce two core innovations: (1) a dependency-based knowledge graph construction pipeline that leverages industrial-grade NLP libraries to extract entities and relations from unstructured text, completely eliminating reliance on LLMs; and (2) a lightweight graph retrieval strategy that combines hybrid query node identification with efficient one-hop traversal for high-recall, low-latency subgraph extraction. We evaluate our framework on two SAP datasets focused on legacy code migration and demonstrate strong empirical performance. Our system achieves up to 15% and 4.35% improvements over traditional RAG baselines based on LLM-as-Judge and RAGAS metrics, respectively. Moreover, our dependency-based construction approach attains 94% of the performance of LLM-generated knowledge graphs (61.87% vs. 65.83%) while significantly reducing cost and improving scalability. These results validate the feasibility of deploying GraphRAG systems in real-world, large-scale enterprise applications without incurring prohibitive resource requirements, paving the way for practical, explainable, and domain-adaptable retrieval-augmented reasoning.
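To make the first innovation concrete, here is a minimal sketch of how a dependency parse can yield knowledge graph triples without an LLM. It is an illustration only, not the paper's pipeline: the `Token` class, the `extract_triples` and `phrase` helpers, the single subject-verb-object rule, and the example sentence are all hypothetical, and the parse is written by hand where a real system would obtain it from an NLP library.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # position in the sentence (assumed equal to the list index)
    text: str   # surface form
    dep: str    # dependency label, Universal-Dependencies style
    head: int   # index of the syntactic head; the root points to itself


def phrase(tokens, head):
    """Join a head noun with its compound/adjectival modifiers, in sentence order."""
    parts = [
        t for t in tokens
        if t.idx == head.idx or (t.head == head.idx and t.dep in ("compound", "amod"))
    ]
    return " ".join(t.text for t in sorted(parts, key=lambda t: t.idx))


def extract_triples(tokens):
    """Return (subject, relation, object) triples using one illustrative rule:
    a nominal subject and a direct object that share the same governing verb."""
    triples = []
    for tok in tokens:
        if tok.dep != "nsubj":
            continue
        verb = tokens[tok.head]
        for other in tokens:
            if other.head == verb.idx and other.dep in ("dobj", "obj"):
                triples.append((phrase(tokens, tok), verb.text, phrase(tokens, other)))
    return triples


# Hand-written parse of the hypothetical sentence
# "The migration tool converts legacy ABAP code".
SENTENCE = [
    Token(0, "The", "det", 2),
    Token(1, "migration", "compound", 2),
    Token(2, "tool", "nsubj", 3),
    Token(3, "converts", "ROOT", 3),
    Token(4, "legacy", "amod", 6),
    Token(5, "ABAP", "compound", 6),
    Token(6, "code", "dobj", 3),
]

print(extract_triples(SENTENCE))
```

Because the rule only inspects labels and head indices, its cost grows with the number of tokens rather than with LLM calls, which is the intuition behind the cost savings the abstract reports. A production pipeline would need many more rules (passives, prepositional objects, coreference) than this sketch covers.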

Problem

Research questions and friction points this paper is trying to address.

High computational cost of LLM-based knowledge graph construction

Latency issues in graph-based retrieval systems

Scalability challenges in enterprise GraphRAG deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dependency-based knowledge graph construction without LLMs

Lightweight graph retrieval with hybrid query node identification

Efficient one-hop traversal for subgraph extraction
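The retrieval side can be sketched in the same spirit. The example below assumes that "hybrid" query node identification means combining exact entity matches with a fuzzy similarity match; this page does not spell out the paper's actual method, so the token-overlap (Jaccard) score stands in for whatever similarity measure the authors use. The function names, the threshold, and the example triples are hypothetical.

```python
import re
from collections import defaultdict


def build_graph(triples):
    """Index each triple under both endpoints so one-hop lookup is a dict access."""
    adjacency = defaultdict(list)
    for subj, rel, obj in triples:
        adjacency[subj].append((subj, rel, obj))
        adjacency[obj].append((subj, rel, obj))
    return adjacency


def words(text):
    """Lowercased word set with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity; a stand-in for an embedding-based score."""
    sa, sb = words(a), words(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def find_query_nodes(query, nodes, threshold=0.3):
    """Hybrid matching: exact substring hits plus fuzzy token-overlap hits."""
    q = query.lower()
    exact = {n for n in nodes if n.lower() in q}
    fuzzy = {n for n in nodes if jaccard(n, query) >= threshold}
    return exact | fuzzy


def retrieve_subgraph(query, adjacency):
    """Collect every edge touching a matched node (one hop), without duplicates."""
    seeds = find_query_nodes(query, list(adjacency))
    edges = []
    for node in sorted(seeds):
        for edge in adjacency[node]:
            if edge not in edges:
                edges.append(edge)
    return edges


# Hypothetical triples for illustration only.
TRIPLES = [
    ("migration tool", "converts", "legacy ABAP code"),
    ("legacy ABAP code", "calls", "deprecated function module"),
    ("deprecated function module", "replaced_by", "released API"),
]
GRAPH = build_graph(TRIPLES)

print(retrieve_subgraph("How is legacy ABAP code migrated?", GRAPH))
print(retrieve_subgraph("Which tool handles migration?", GRAPH))
```

The first query matches "legacy ABAP code" exactly and returns its two incident edges; the third triple is two hops away and is left out. The second query has no exact match, but the fuzzy score still picks up "migration tool". Restricting traversal to one hop bounds the work per query by the degree of the matched nodes, which is one plausible reading of why the approach is low-latency.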
