AI Summary
This study addresses a critical structural privacy vulnerability in Graph RAG systems, wherein the exposure of structured knowledge enables adversaries to reconstruct the underlying knowledge graph. The work presents the first systematic disclosure of this risk and introduces an adaptive black-box graph reconstruction framework. This framework integrates Depth-Wise Heuristic Search for recursive extraction of entity attributes and Breadth-Wise Diffusion Search for cross-relational inference of topological structure. Evaluated in both general and medical domains, the proposed method successfully recovers over 90% of the original knowledge graphs with high fidelity, accurately reconstructing sensitive entities, their relationships, and structural dependencies, thereby circumventing existing defense mechanisms.
Abstract
Retrieval-Augmented Generation (RAG) enhances LLMs by grounding generation in query-relevant external evidence. Beyond unstructured text corpora, Graph RAG integrates knowledge graphs into the retrieval pipeline, enabling LLMs to access entities, relations, and multi-hop dependencies encoded in structured knowledge. However, the same structured knowledge that empowers Graph RAG also creates a new privacy attack surface. We demonstrate that Graph RAG systems can be turned into structural oracles: through adaptive black-box interactions, an adversary can elicit sufficient relational evidence to reconstruct substantial portions of the hidden knowledge graph. We propose a structure-oriented reconstruction framework that recovers targeted graphs from both local and global perspectives. Specifically, Depth-Wise Heuristic Search extracts fine-grained node attributes by recursively expanding entity-centered evidence, while Breadth-Wise Diffusion Search infers graph topology by propagating across relation-induced neighborhoods. Experiments on generic and healthcare scenarios demonstrate that our method can recover over 90% of the original knowledge graph from representative Graph RAG systems, revealing sensitive entities, relations, and structural dependencies with high fidelity. Existing guardrails provide limited defense against our attack, highlighting the inherent difficulty of safeguarding structural privacy in Graph RAG pipelines.
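To make the notion of a "structural oracle" concrete, the following toy sketch shows how repeated entity-centered queries can expand outward across neighborhoods until a hidden graph is recovered, and how a recovery rate can be computed. It is an illustration only and not the paper's Depth-Wise Heuristic Search or Breadth-Wise Diffusion Search: the triples, the `mock_oracle` function, and the `reconstruct` helper are all hypothetical, and the oracle here returns clean triples from an in-memory set, whereas a real Graph RAG system returns natural-language answers that would first have to be parsed.

```python
from collections import deque

# Hypothetical hidden knowledge graph as (head, relation, tail) triples.
# Toy data invented for illustration; not taken from the paper.
HIDDEN_TRIPLES = {
    ("alice", "treated_by", "dr_lee"),
    ("dr_lee", "works_at", "city_clinic"),
    ("alice", "diagnosed_with", "asthma"),
    ("asthma", "treated_with", "inhaler"),
}


def mock_oracle(entity):
    """Stand-in for one black-box query: return every triple touching `entity`."""
    return {t for t in HIDDEN_TRIPLES if entity in (t[0], t[2])}


def reconstruct(seed, oracle, max_queries=100):
    """Breadth-first expansion from `seed`, one oracle query per entity."""
    recovered = set()
    visited = {seed}
    frontier = deque([seed])
    queries = 0
    while frontier and queries < max_queries:
        entity = frontier.popleft()
        queries += 1
        for head, relation, tail in oracle(entity):
            recovered.add((head, relation, tail))
            # Newly seen endpoints become future query targets.
            for neighbor in (head, tail):
                if neighbor not in visited:
                    visited.add(neighbor)
                    frontier.append(neighbor)
    return recovered


recovered = reconstruct("alice", mock_oracle)
# Fraction of hidden triples recovered (an edge-level recovery rate).
coverage = len(recovered & HIDDEN_TRIPLES) / len(HIDDEN_TRIPLES)
print(f"recovered {len(recovered)} triples, coverage = {coverage:.0%}")
```

The `max_queries` budget in this sketch stands in for the limited number of interactions an adversary might have; lowering it shows how coverage depends on the number of queries issued.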