🤖 AI Summary
Learning-based approaches to subgraph matching in graph data management generally lack theoretical guarantees of exactness. To address this, the paper proposes GNN-AE, described as the first verifiably exact GNN-based embedding framework. Its core innovation is a set of concepts (*anchors*, *anchor graphs*, and *anchor paths*) that reformulate exact subgraph matching as a search for anchor structures in the embedding space. The paper designs a GNN embedding mechanism with theoretically proven completeness (i.e., zero false negatives) and combines it with a matching growth algorithm and a cost-model-based depth-first search (DFS) query plan. Experiments on six real-world and three synthetic datasets show that GNN-AE achieves 100% recall, significantly outperforming state-of-the-art approximate methods, while delivering up to 8.2× speedup in end-to-end matching latency. To the authors' knowledge, this is the first approach to achieve both rigorous theoretical correctness and practical efficiency for exact subgraph matching.
📝 Abstract
Subgraph matching is a classic problem in graph data management with a variety of real-world applications, such as discovering structures in biological or chemical networks, finding communities in social network analysis, and explaining neural networks. Several recent works apply deep-learning-based techniques to subgraph matching queries. However, most of these works obtain only approximate results, without theoretical guarantees of accuracy. In this paper, we propose a novel and effective graph neural network (GNN)-based anchor embedding framework (GNN-AE), which allows exact subgraph matching. Unlike GNN-based approximate approaches, which produce inexact results, we introduce a series of anchor-related concepts (anchor, anchor graph, anchor path, etc.) for subgraph matching and carefully devise an anchor (graph) embedding technique based on GNN models. Using the anchor graph and anchor path embeddings, we transform the subgraph matching problem into a search problem in the embedding space. With the proposed anchor matching mechanism, GNN-AE guarantees that subgraph matching has no false dismissals. We design an efficient matching growth algorithm that retrieves the locations of all exact matches in parallel. We also propose a cost-model-based DFS query plan to enhance the parallel matching growth algorithm. Through extensive experiments on 6 real-world and 3 synthetic datasets, we confirm the effectiveness and efficiency of GNN-AE for exact subgraph matching.
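The abstract does not spell out how the anchor embeddings are built, so the following is only a minimal sketch of the general filter-then-grow pattern it describes, not the GNN-AE algorithm itself. As a stand-in for the learned embedding, it assumes a hand-written signature (a vertex label plus the multiset of neighbour labels) whose containment test is a necessary condition for a match, so the filter causes no false dismissals. The names `signature`, `may_match`, and `subgraph_matches`, the sequential search, and the candidate-count ordering are all illustrative assumptions.

```python
from collections import Counter


def signature(adj, labels, v):
    """Stand-in for an embedding: v's label and the multiset of neighbour labels."""
    return labels[v], Counter(labels[u] for u in adj[v])


def may_match(q_sig, g_sig):
    """Necessary condition for a query vertex to map onto a data vertex.

    Labels must agree and every neighbour label required by the query vertex
    must be available at the data vertex. A true match always passes, so the
    filter never dismisses a correct candidate.
    """
    q_label, q_nbrs = q_sig
    g_label, g_nbrs = g_sig
    return q_label == g_label and all(g_nbrs[l] >= c for l, c in q_nbrs.items())


def subgraph_matches(q_adj, q_labels, g_adj, g_labels):
    """Return every exact (non-induced) match of the query in the data graph."""
    q_sigs = {u: signature(q_adj, q_labels, u) for u in q_adj}
    g_sigs = {v: signature(g_adj, g_labels, v) for v in g_adj}
    candidates = {
        u: [v for v in g_adj if may_match(q_sigs[u], g_sigs[v])] for u in q_adj
    }
    # Crude cost heuristic (an assumption): expand the most selective vertex first.
    order = sorted(q_adj, key=lambda u: len(candidates[u]))
    results = []

    def grow(i, mapping):
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        for v in candidates[u]:
            if v in mapping.values():
                continue  # keep the mapping injective
            # Verify every query edge to an already mapped vertex.
            if all(mapping[w] in g_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                grow(i + 1, mapping)
                del mapping[u]

    grow(0, {})
    return results


# Data graph: triangle 0-1-2 labelled A, B, C, plus vertex 3 (B) attached to 0.
g_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
g_labels = {0: "A", 1: "B", 2: "C", 3: "B"}
# Query graph: a single A-B edge.
q_adj = {0: {1}, 1: {0}}
q_labels = {0: "A", 1: "B"}

matches = subgraph_matches(q_adj, q_labels, g_adj, g_labels)
```

Here `matches` holds the two embeddings of the A-B edge, mapping the query's B vertex to data vertex 1 or 3. The design point the sketch shares with the abstract is that filtering only prunes candidates that cannot match, and the growth step verifies every remaining one, so the result set is exact.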