AI Summary
In graph out-of-distribution (OOD) generalization, invariant representations are often compromised by spurious correlations. To address this, we propose Redundancy-guided Invariant Graph learning (RIG), a multi-level optimization framework guided by redundant information. RIG is the first to introduce Partial Information Decomposition (PID) into graph representation learning: it explicitly disentangles causal subgraphs from spurious subgraphs via information-theoretic lower-bound estimation and alternating optimization, while maximizing task-relevant redundant information to improve robustness across distributions. This addresses a fundamental limitation of conventional mutual-information-based measures, which struggle to decouple causal and non-causal components on structured graph data. Experiments on multiple synthetic and real-world graph benchmarks show that RIG improves OOD generalization and remains effective under diverse distribution shifts, including topological, node-attribute, and edge-distribution variations.
Abstract
Learning invariant graph representations for out-of-distribution (OOD) generalization remains challenging because the learned representations often retain spurious components. To address this challenge, this work introduces a new tool from information theory called Partial Information Decomposition (PID) that goes beyond classical information-theoretic measures. We identify limitations in existing approaches to invariant representation learning that rely solely on classical information-theoretic measures, motivating a precise focus on the redundant information about the target $Y$, obtained via PID, that is shared between spurious subgraphs $G_s$ and invariant subgraphs $G_c$. Next, we propose a new multi-level optimization framework, Redundancy-guided Invariant Graph learning (RIG), that maximizes redundant information while isolating spurious and causal subgraphs, enabling OOD generalization under diverse distribution shifts. Our approach alternates between estimating a lower bound on the redundant information (which itself requires an optimization) and maximizing that bound along with additional objectives. Experiments on both synthetic and real-world graph datasets demonstrate the generalization capabilities of the proposed RIG framework.
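
To make the limitation of classical measures concrete, the following is a minimal sketch on toy discrete variables rather than graphs. It assumes the standard two-source PID identities, $I(Y; G_c, G_s) = R + U_c + U_s + S$ and $I(Y; G_c) = R + U_c$ (likewise for $G_s$), where $R$, $U$, and $S$ denote redundant, unique, and synergistic information. It uses the minimum-mutual-information (MMI) redundancy $R = \min\{I(Y; G_c), I(Y; G_s)\}$ purely as an illustrative stand-in: the abstract does not specify RIG's redundancy measure or its lower-bound estimator, and the names `mutual_information` and `pid_mmi` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint, src_idx):
    """Return I(Y; selected sources) in bits.

    `joint` maps (a, b, y) -> probability; `src_idx` selects which
    source positions to use (0 for A, 1 for B).
    """
    p_sy = defaultdict(float)
    p_s = defaultdict(float)
    p_y = defaultdict(float)
    for (a, b, y), p in joint.items():
        s = tuple((a, b)[i] for i in src_idx)
        p_sy[(s, y)] += p
        p_s[s] += p
        p_y[y] += p
    return sum(
        p * log2(p / (p_s[s] * p_y[y]))
        for (s, y), p in p_sy.items()
        if p > 0
    )


def pid_mmi(joint):
    """Two-source PID using MMI redundancy (an assumption, not RIG's estimator)."""
    i_a = mutual_information(joint, (0,))
    i_b = mutual_information(joint, (1,))
    i_ab = mutual_information(joint, (0, 1))
    red = min(i_a, i_b)
    return {
        "joint_mi": i_ab,
        "redundant": red,
        "unique_a": i_a - red,
        "unique_b": i_b - red,
        "synergistic": i_ab - i_a - i_b + red,
    }


# Case 1: both sources are copies of a fair bit Y (purely redundant).
copy_dist = {(y, y, y): 0.5 for y in (0, 1)}
# Case 2: Y = A XOR B with independent fair bits (purely synergistic).
xor_dist = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

for name, dist in (("copy", copy_dist), ("xor", xor_dist)):
    print(name, {k: round(v, 3) for k, v in pid_mmi(dist).items()})
```

Both toy distributions carry exactly 1 bit of joint mutual information about $Y$, yet under this decomposition the copy case is entirely redundant and the XOR case is entirely synergistic. The joint mutual information alone cannot tell the two apart, which is the kind of distinction PID is meant to expose.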