🤖 AI Summary
Existing graph Transformers (GTs) exhibit poor generalization under distributional shifts. To address this, we propose the first framework that deeply integrates graph invariant learning with the Transformer architecture to enhance out-of-distribution (OOD) generalization. Our method introduces three key innovations: (1) an entropy-guided invariant subgraph disentangler that explicitly isolates predictive subgraph structures; (2) an evolutionary subgraph positional and structural encoder (PSE) that dynamically models geometric and topological invariances across subgraphs; and (3) an invariant risk minimization (IRM)-based representation learning module that enforces causal invariance constraints on both attention mechanisms and feature mappings. We provide theoretical analysis establishing a generalization bound under distributional shift. Extensive experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods, validating the model's strong robustness and transferability to unseen graph structures.
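Innovation (3) builds on the invariant risk minimization principle. As a concrete illustration only (not the paper's implementation), below is a minimal PyTorch sketch of the standard IRMv1 gradient penalty (Arjovsky et al., 2019), the kind of constraint such a module could apply per training environment; the function `invariant_risk` and its signature are hypothetical names introduced for this sketch.

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits, labels):
    """IRMv1 penalty: squared gradient of the per-environment risk with
    respect to a fixed dummy classifier scale. A small penalty indicates
    the representation admits a classifier that is simultaneously
    (near-)optimal across environments."""
    scale = torch.ones(1, requires_grad=True, device=logits.device)
    loss = F.cross_entropy(logits * scale, labels)
    grad = torch.autograd.grad(loss, scale, create_graph=True)[0]
    return (grad ** 2).sum()

def invariant_risk(per_env_logits, per_env_labels, lam=1.0):
    """Hypothetical training objective: ERM term plus the IRM penalty,
    summed over training environments (e.g., groups of graphs drawn
    from different structural distributions)."""
    erm, pen = 0.0, 0.0
    for logits, labels in zip(per_env_logits, per_env_labels):
        erm = erm + F.cross_entropy(logits, labels)
        pen = pen + irm_penalty(logits, labels)
    return erm + lam * pen
```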
📝 Abstract
Graph Transformers (GTs) have demonstrated great effectiveness across various graph analytical tasks. However, existing GTs assume that training and testing graph data originate from the same distribution and fail to generalize under distribution shifts. Graph invariant learning, which aims to capture invariant relationships between graph structural patterns and labels under distribution shifts, is a promising solution, but how to design attention mechanisms and positional and structural encodings (PSEs) based on graph invariant learning principles remains challenging. To address these challenges, we introduce the Graph Out-Of-Distribution generalized Transformer (GOODFormer), which learns generalized graph representations by capturing invariant relationships between predictive graph structures and labels through the joint optimization of three modules. Specifically, we first develop a GT-based entropy-guided invariant subgraph disentangler to separate invariant and variant subgraphs while preserving the sharpness of the attention function. Next, we design an evolving subgraph positional and structural encoder to effectively and efficiently capture the encoding information of dynamically changing subgraphs during training. Finally, we propose an invariant learning module that utilizes subgraph node representations and encodings to derive graph representations that generalize to unseen graphs. We also provide theoretical justifications for our method. Extensive experiments on benchmark datasets demonstrate the superiority of our method over state-of-the-art baselines under distribution shifts.
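To make the disentangling idea concrete, here is a hedged PyTorch sketch of an entropy-guided soft split of node representations into invariant and variant parts. The class name `EntropyGuidedDisentangler`, the per-node sigmoid scorer, and the binary-entropy regularizer are illustrative assumptions for this sketch, not GOODFormer's actual architecture.

```python
import torch

class EntropyGuidedDisentangler(torch.nn.Module):
    """Hypothetical sketch: a scorer assigns each node a probability of
    belonging to the invariant subgraph, and a binary-entropy term keeps
    the resulting soft partition sharp, mirroring the paper's goal of
    preserving the sharpness of the attention function."""

    def __init__(self, dim: int):
        super().__init__()
        self.scorer = torch.nn.Linear(dim, 1)  # per-node invariance score

    def forward(self, node_repr: torch.Tensor):
        # node_repr: (num_nodes, dim), e.g. the output of a GT layer
        p = torch.sigmoid(self.scorer(node_repr))   # (num_nodes, 1)
        inv_repr = p * node_repr                    # invariant part
        var_repr = (1.0 - p) * node_repr            # variant part
        # Binary entropy of the soft mask; minimizing it pushes p toward 0/1.
        eps = 1e-8
        entropy = -(p * (p + eps).log()
                    + (1 - p) * (1 - p + eps).log()).mean()
        return inv_repr, var_repr, entropy
```

Minimizing the entropy term drives the soft mask toward a near-binary partition, which is one way a model could keep downstream attention over the invariant subgraph sharp while still training end to end.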