🤖 AI Summary
This work addresses the scalability bottleneck of subgraph pattern detection in large graphs, which stems from the task's close relation to subgraph isomorphism, an NP-complete problem. It adapts the DETR paradigm from object detection to this task by formulating subgraph detection as a set prediction problem. A graph neural network encodes the target graph, a fixed set of learnable query embeddings is decoded by a Transformer decoder, and all pattern instances are predicted jointly and end-to-end, with training driven by a bipartite matching loss. The framework supports both exact and approximate pattern matching, overcoming the restriction of traditional combinatorial approaches to exact structural matches. Experiments show that the method detects diverse patterns of up to 50 nodes in graphs with up to 1,000 nodes, and achieves an AP₁₀₀ of 91.2 on functional group detection in the ChEMBL molecular dataset.
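The scaling limit of combinatorial methods can be made concrete with a naive exact baseline. The sketch below is not GraphDETR and not an optimized matcher such as VF2. It is an illustrative backtracking enumerator, assuming undirected graphs stored as adjacency sets and non-induced matching (every pattern edge must map to a target edge). It reports each occurrence as the set of target nodes it covers, one plausible form of exact label that a set-prediction model could be trained against (an assumption, not a detail from the paper). The helper names `build_graph` and `find_occurrences` are hypothetical, and the search tree can grow exponentially with pattern size.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def build_graph(edges: List[Tuple[int, int]]) -> Graph:
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def find_occurrences(pattern: Graph, target: Graph) -> Set[FrozenSet[int]]:
    """Node sets of all non-induced embeddings of `pattern` in `target` (exhaustive backtracking)."""
    # Map high-degree pattern nodes first so that infeasible branches fail early.
    order = sorted(pattern, key=lambda p: -len(pattern[p]))
    found: Set[FrozenSet[int]] = set()
    mapping: Dict[int, int] = {}  # pattern node -> target node

    def extend(i: int) -> None:
        if i == len(order):
            # Automorphic mappings collapse to the same node set.
            found.add(frozenset(mapping.values()))
            return
        p = order[i]
        used = set(mapping.values())
        for t in target:
            if t in used or len(target[t]) < len(pattern[p]):
                continue  # injectivity and degree pruning
            # Every already-mapped neighbour of p must map to a neighbour of t.
            if all(mapping[q] in target[t] for q in pattern[p] if q in mapping):
                mapping[p] = t
                extend(i + 1)
                del mapping[p]

    extend(0)
    return found


triangle = build_graph([(0, 1), (1, 2), (2, 0)])
target = build_graph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
print(sorted(sorted(s) for s in find_occurrences(triangle, target)))  # → [[0, 1, 2], [0, 2, 3]]
```

Each pattern edge is checked exactly once, when its second endpoint is placed, so the result is exact. The cost is the potentially exponential number of partial mappings explored, which is why such enumerators are usually limited to small patterns or moderately sized graphs.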
📝 Abstract
Subgraph detection seeks to identify whether and where instances of query patterns occur within a larger graph. This problem is fundamental across scientific domains and is closely related to subgraph isomorphism, which is NP-complete, limiting combinatorial approaches to small patterns or moderately sized graphs. We introduce GraphDETR, a deep learning framework that formulates subgraph detection as a set prediction problem, analogous to DETR in object detection. GraphDETR encodes the target graph with a graph neural network and employs a fixed set of learnable query vectors, decoded by a transformer decoder, to predict all pattern occurrences jointly in a single forward pass. This joint prediction is enabled by training the model end-to-end with a bipartite matching loss. Unlike traditional combinatorial methods, which only solve exact structural matching, GraphDETR naturally extends to approximate matching, enabling detection beyond exact pattern correspondence. Empirically, we show that GraphDETR can detect diverse patterns, such as molecular structures, cycles, cliques, and fuzzy patterns of up to 50 nodes, in target graphs with up to 1000 nodes. We further evaluate on molecular functional group detection over the ChEMBL dataset, where GraphDETR predicts the complete set of functional groups per molecule, achieving $\text{AP}_{100} = 91.2$.
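The bipartite matching step behind set prediction can be sketched in plain Python. The representation is an assumption for illustration, not taken from the paper: each query outputs an existence probability and a per-node membership probability over the target graph's nodes, and each ground-truth occurrence is a set of node indices. The matching cost (negative existence probability plus one minus soft Dice overlap) and the binary cross-entropy loss are hypothetical choices loosely modeled on DETR. The assignment is found by exhaustive search over injective query assignments, which reaches the same optimum as the Hungarian algorithm but only for a handful of queries; a practical version would use a dedicated solver such as SciPy's `linear_sum_assignment`.

```python
import itertools
import math
from typing import List, Sequence, Set, Tuple

# Hypothetical query output: (existence probability, per-node membership probabilities).
Prediction = Tuple[float, List[float]]


def soft_dice(probs: Sequence[float], target: Set[int]) -> float:
    """Soft Dice overlap between predicted memberships and a ground-truth node set."""
    inter = sum(p for i, p in enumerate(probs) if i in target)
    return 2.0 * inter / (sum(probs) + len(target) + 1e-9)


def match_cost(pred: Prediction, gt: Set[int]) -> float:
    """Lower is better: a confident query whose node mask overlaps the instance."""
    exist, probs = pred
    return -exist + (1.0 - soft_dice(probs, gt))


def bipartite_match(preds: List[Prediction], gts: List[Set[int]]) -> List[Tuple[int, int]]:
    """Minimum-cost (query_idx, gt_idx) pairs by exhaustive search.

    Assumes len(preds) >= len(gts); only feasible for a handful of queries.
    """
    best_cost, best_pairs = math.inf, []
    for perm in itertools.permutations(range(len(preds)), len(gts)):
        # perm[g] is the query assigned to ground-truth instance g
        cost = sum(match_cost(preds[q], gts[g]) for g, q in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, [(q, g) for g, q in enumerate(perm)]
    return best_pairs


def bce(p: float, y: float) -> float:
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def set_loss(preds: List[Prediction], gts: List[Set[int]]) -> float:
    """Mean loss after matching; unmatched queries are trained toward 'no instance'."""
    matched = dict(bipartite_match(preds, gts))  # query_idx -> gt_idx
    total = 0.0
    for q, (exist, probs) in enumerate(preds):
        if q in matched:
            gt = gts[matched[q]]
            total += bce(exist, 1.0)
            total += sum(bce(p, 1.0 if i in gt else 0.0) for i, p in enumerate(probs)) / len(probs)
        else:
            total += bce(exist, 0.0)
    return total / len(preds)


gts = [{0, 1, 2}, {3, 4}]
preds = [
    (0.9, [0.05, 0.05, 0.05, 0.9, 0.9]),  # resembles instance {3, 4}
    (0.1, [0.5] * 5),                     # uninformative query
    (0.8, [0.9, 0.9, 0.9, 0.1, 0.1]),     # resembles instance {0, 1, 2}
]
print(bipartite_match(preds, gts))  # → [(2, 0), (0, 1)]
```

Because the loss is computed only after the optimal one-to-one assignment, the order of queries does not matter, and pushing unmatched queries toward "no instance" lets a fixed number of queries represent a variable number of occurrences.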