🤖 AI Summary
State-of-the-art link prediction (LP) models achieve strong performance on standard i.i.d. benchmarks, yet the underlying i.i.d. assumption rarely holds in practice: new links often emerge from subgraph structures whose distribution differs significantly from that of the training data. Method: We formally define *link-level distribution shift* and construct it via a controllable non-i.i.d. data-partitioning strategy grounded in graph structural properties (e.g., degree distribution, clustering coefficient, path length). We further design a structure-driven evaluation framework for LP generalization that integrates invariant learning and structural regularization. Contribution/Results: We uncover a counterintuitive phenomenon: mainstream LP models (GNNs, MLPs, KG embeddings) generalize substantially worse under distribution shift than simple heuristic baselines; on average, the shift degrades SOTA model AUC by 12.7%. We release LPStructGen, the first dedicated benchmark for LP generalization, together with a unified experimental framework.
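For context, the "simple heuristic baselines" referred to above are typically parameter-free structural scores such as Common Neighbors, which ranks a candidate link by how many neighbors its endpoints share. A minimal sketch (illustrative only; the function and edge list below are assumptions, not the paper's code):

```python
def build_adj(edges):
    """Build an adjacency-set map from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbors_score(adj, u, v):
    """Common Neighbors heuristic: score candidate link (u, v) by the
    number of neighbors u and v share. No training involved, which is
    partly why such baselines can be robust under distribution shift."""
    return len(adj[u] & adj[v])

# Toy graph: nodes 0 and 3 are not linked but share neighbors 1 and 2.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
adj = build_adj(edges)
score = common_neighbors_score(adj, 0, 3)  # -> 2
```

Because the score depends only on local structure, it requires no fitting to the training distribution, unlike the learned LP models evaluated in the paper.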
📝 Abstract
Recently, multiple models proposed for link prediction (LP) have demonstrated impressive results on benchmark datasets. However, these popular benchmarks typically assume that samples are drawn from the same distribution (i.e., IID samples). In real-world settings, this assumption is often violated, since uncontrolled factors can cause train and test samples to come from different distributions. To tackle the distribution shift problem, recent work focuses on creating datasets that feature distribution shifts and on designing generalization methods that perform well on them. However, those studies only consider shifts that affect *node-* and *graph-level* tasks, ignoring link-level tasks. Furthermore, relatively few LP generalization methods exist. To bridge this gap, we introduce a set of LP-specific data splits that exploit structural properties to induce a controlled distribution shift. We verify the shift's effect empirically by evaluating different SOTA LP methods, and we subsequently couple these methods with generalization techniques. Interestingly, LP-specific methods frequently generalize poorly relative to heuristics or basic GNN methods. Finally, this work provides analysis to uncover insights for enhancing LP generalization. Our code is available at: [https://github.com/revolins/LPStructGen](https://github.com/revolins/LPStructGen)
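As a concrete illustration of how a structural property can induce a controlled shift between splits, the sketch below scores each edge by the summed degree of its endpoints, sorts, and cuts the ranking into train/valid/test, so test edges come from structurally denser regions than train edges. This is a hedged toy example: the function name, split ratios, and degree-sum score are assumptions for illustration, not the paper's exact partitioning procedure.

```python
from collections import Counter

def structural_split(edges, ratios=(0.6, 0.2, 0.2)):
    """Partition an edge list by a structural score to induce a shift.

    Each edge is scored by the sum of its endpoints' degrees; low-score
    edges go to train and high-score edges to test, so the test split is
    drawn from a different structural distribution than train.
    Illustrative sketch only -- not the paper's exact strategy.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Stable sort: ties keep their original order.
    ranked = sorted(edges, key=lambda e: deg[e[0]] + deg[e[1]])
    n_train = int(ratios[0] * len(ranked))
    n_valid = int(ratios[1] * len(ranked))
    return (ranked[:n_train],
            ranked[n_train:n_train + n_valid],
            ranked[n_train + n_valid:])

# Toy graph: a hub (node 0) attached to a short path.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6)]
train, valid, test = structural_split(edges)
# test ends up holding the hub-adjacent (high-degree) edges
```

Sweeping which end of the ranking feeds the test split (or swapping in clustering coefficient or path-based scores) yields the kind of controllable shift severity the splits above are designed to provide.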