GraphSteal: Structural Knowledge Stealing from Graph RAG via Traversal Reconstruction

2026-05-27

AI Summary
This study addresses a structural privacy vulnerability in Graph RAG systems: the structured knowledge they expose lets an adversary reconstruct the underlying knowledge graph. The work presents the first systematic disclosure of this risk and introduces an adaptive black-box graph reconstruction framework. The framework combines Depth-Wise Heuristic Search, which recursively extracts entity attributes, with Breadth-Wise Diffusion Search, which infers topological structure across relations. Evaluated in both general and medical domains, the method recovers over 90% of the original knowledge graph with high fidelity, reconstructing sensitive entities, their relationships, and structural dependencies. Existing guardrails provide only limited defense against the attack.
Abstract
Retrieval-Augmented Generation (RAG) enhances LLMs by grounding generation in query-relevant external evidence. Beyond unstructured text corpora, Graph RAG integrates knowledge graphs into the retrieval pipeline, enabling LLMs to access entities, relations, and multi-hop dependencies encoded in structured knowledge. However, the same structured knowledge that empowers Graph RAG also creates a new privacy attack surface. We demonstrate that Graph RAG systems can be turned into structural oracles: through adaptive black-box interactions, an adversary can elicit sufficient relational evidence to reconstruct substantial portions of the hidden knowledge graph. We propose a structure-oriented reconstruction framework that recovers targeted graphs from both local and global perspectives. Specifically, Depth-Wise Heuristic Search extracts fine-grained node attributes by recursively expanding entity-centered evidence, while Breadth-Wise Diffusion Search infers graph topology by propagating across relation-induced neighborhoods. Experiments on generic and healthcare scenarios demonstrate that our method can recover over 90% of the original knowledge graph from representative Graph RAG systems, revealing sensitive entities, relations, and structural dependencies with high fidelity. Existing guardrails provide limited defense against our attack, highlighting the inherent difficulty of safeguarding structural privacy in Graph RAG pipelines.
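To make the idea of a "structural oracle" concrete, the following is a minimal toy sketch, not the paper's algorithm. The abstract does not specify how Depth-Wise Heuristic Search or Breadth-Wise Diffusion Search work internally, so this example swaps in plain breadth-first expansion over an in-memory stand-in. The names `HIDDEN_TRIPLES`, `mock_oracle`, `reconstruct`, and `triple_recall` are all hypothetical, and the recall metric is an assumed way to express "fraction of the graph recovered", which the abstract does not define.

```python
from collections import deque

# Hypothetical hidden knowledge graph, stored as (head, relation, tail) triples.
HIDDEN_TRIPLES = {
    ("alice", "treated_by", "dr_bob"),
    ("dr_bob", "works_at", "clinic_x"),
    ("alice", "diagnosed_with", "asthma"),
    ("asthma", "treated_with", "inhaler"),
}


def mock_oracle(entity):
    """Stand-in for a Graph RAG response: triples that mention the entity.

    Assumption: a real system returns free text that would first have to be
    parsed into triples; this toy skips that step entirely.
    """
    return {t for t in HIDDEN_TRIPLES if entity in (t[0], t[2])}


def reconstruct(seed, oracle, max_queries=100):
    """Breadth-first expansion from a seed entity under a query budget."""
    recovered = set()
    visited = {seed}
    frontier = deque([seed])
    queries = 0
    while frontier and queries < max_queries:
        entity = frontier.popleft()
        queries += 1
        for head, rel, tail in oracle(entity):
            recovered.add((head, rel, tail))
            # Newly seen entities become future query targets.
            for neighbor in (head, tail):
                if neighbor not in visited:
                    visited.add(neighbor)
                    frontier.append(neighbor)
    return recovered


def triple_recall(recovered, truth):
    """Fraction of ground-truth triples present in the recovered set."""
    return len(recovered & truth) / len(truth) if truth else 0.0


recovered = reconstruct("alice", mock_oracle)
recall = triple_recall(recovered, HIDDEN_TRIPLES)
```

Lowering `max_queries` in this toy shows how the recovered fraction depends on the query budget, which is one reason a defender might rate-limit or monitor entity-by-entity probing.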
Problem

Research questions and friction points this paper is trying to address.

Graph RAG
knowledge graph
privacy attack
structural reconstruction
knowledge stealing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph RAG
knowledge graph reconstruction
structural privacy attack
black-box traversal
relational inference