Core-based Hierarchies for Efficient GraphRAG

📅 2026-03-05

🤖 AI Summary

Existing GraphRAG systems rely on Leiden clustering, whose community partitions are not reproducible: modularity optimization admits many near-optimal solutions, particularly on sparse knowledge graphs, and this instability undermines global reasoning. This work introduces k-core decomposition into GraphRAG for the first time, yielding a deterministic, density-aware hierarchical community structure. A lightweight heuristic then builds connected communities of bounded size on top of that hierarchy. Combined with a token-budget-aware sampling strategy, the approach keeps broad semantic coverage while reducing large language model invocation costs. Experiments on real-world datasets (financial earnings transcripts, news articles, and podcasts) show that the method improves answer comprehensiveness and diversity while lowering token consumption.

📝 Abstract

Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge. However, existing vector-based methods often fail on global sensemaking tasks that require reasoning across many documents. GraphRAG addresses this by organizing documents into a knowledge graph with hierarchical communities that can be recursively summarized. Current GraphRAG approaches rely on Leiden clustering for community detection, but we prove that on sparse knowledge graphs, where average degree is constant and most nodes have low degree, modularity optimization admits exponentially many near-optimal partitions, making Leiden-based communities inherently non-reproducible. To address this, we propose replacing Leiden with k-core decomposition, which yields a deterministic, density-aware hierarchy in linear time. We introduce a set of lightweight heuristics that leverage the k-core hierarchy to construct size-bounded, connectivity-preserving communities for retrieval and summarization, along with a token-budget-aware sampling strategy that reduces LLM costs. We evaluate our methods on real-world datasets including financial earnings transcripts, news articles, and podcasts, using three LLMs for answer generation and five independent LLM judges for head-to-head evaluation. Across datasets and models, our approach consistently improves answer comprehensiveness and diversity while reducing token usage, demonstrating that k-core-based GraphRAG is an effective and efficient framework for global sensemaking.
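To make the abstract's claim of a deterministic, linear-time hierarchy concrete, the following is a minimal sketch of k-core decomposition by degree peeling. It is not the paper's implementation: the function names `core_numbers` and `core_hierarchy` and the toy edge list are assumptions made for illustration, and the paper's size-bounding heuristics and sampling strategy are not reproduced here.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as (u, v) pairs.

    Standard peeling: repeatedly remove a node of minimum remaining degree.
    Core numbers are uniquely defined, so the result does not depend on the
    order in which ties are broken.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)

    # buckets[d] holds the not-yet-removed nodes whose remaining degree is d.
    buckets = [set() for _ in range(max_deg + 1)]
    for n, d in degree.items():
        buckets[d].add(n)

    core = {}
    k = 0  # current core level; never decreases
    d = 0  # pointer to the smallest possibly non-empty bucket
    for _ in range(len(degree)):
        while not buckets[d]:
            d += 1
        k = max(k, d)
        v = buckets[d].pop()
        core[v] = k
        for w in adj[v]:
            if w not in core:
                dw = degree[w]
                buckets[dw].discard(w)
                degree[w] = dw - 1
                buckets[dw - 1].add(w)
        # Removing v lowers the minimum degree by at most one.
        d = max(d - 1, 0)
    return core


def core_hierarchy(core):
    """Return {k: sorted nodes with core number >= k}; levels are nested."""
    top = max(core.values(), default=0)
    return {
        k: sorted(n for n, c in core.items() if c >= k)
        for k in range(1, top + 1)
    }


# Toy graph (hypothetical): a triangle a-b-c with a pendant node d on c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
core = core_numbers(edges)
levels = core_hierarchy(core)
```

On the toy graph, the triangle forms the 2-core and the pendant node belongs only to the 1-core, so each level of `levels` is a subset of the one before it. Because the bucket pointer moves back by at most one step per removal, the peeling loop does work proportional to the number of nodes plus edges.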

Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation

GraphRAG

community detection

knowledge graph

global sensemaking

Innovation

Methods, ideas, or system contributions that make the work stand out.

k-core decomposition

GraphRAG

hierarchical community detection