Link Prediction or Perdition: the Seeds of Instability in Knowledge Graph Embeddings

📅 2026-06-02

🤖 AI Summary

This study addresses the high sensitivity of knowledge graph embedding models to random seeds in link prediction, which leads to unstable triple-level predictions and inconsistent embedding space structures, an issue that current evaluation protocols usually overlook. The authors systematically quantify how individual stochastic factors (initialization, triple ordering, negative sampling, dropout, and hardware variations) affect model stability. Through controlled ablation experiments with mainstream models such as TransE and RotatE, evaluated via MRR and Hits@K metrics, they show that each factor alone induces instability of comparable magnitude and that, for a given model, configurations with higher MRR are not guaranteed to be more stable. Voting, a known remediation mechanism, improves prediction consistency only to a limited extent. These findings expose limitations of existing benchmarking practices and raise concerns about the reliability of current approaches for real-world knowledge graph completion tasks.
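The factor-by-factor ablation described above can be pictured with a minimal sketch in which every stochastic factor draws from its own random stream, so that one factor can be varied while the others stay fixed. Everything here is an assumption made for illustration: the names `FACTORS`, `make_rngs`, and `toy_run` are hypothetical, the "run" is a toy stand-in for training, and this is not the authors' experimental code. Hardware nondeterminism is left out because it cannot be controlled through a seed.

```python
import random

# Seedable factors named in the summary (hardware is not seedable, so it is omitted).
FACTORS = ("init", "ordering", "negative_sampling", "dropout")


def make_rngs(base_seed, varied_factor, run_id):
    """Build one independent RNG per factor.

    Only `varied_factor` is seeded with `run_id`; every other factor keeps
    `base_seed`, so its random stream is identical across runs.
    (If run_id == base_seed, the varied factor also repeats.)
    """
    rngs = {}
    for factor in FACTORS:
        seed = run_id if factor == varied_factor else base_seed
        # Mixing the factor name into the seed keeps the streams distinct.
        rngs[factor] = random.Random(f"{factor}:{seed}")
    return rngs


def toy_run(rngs, triples):
    """Toy stand-in for a training run: one random draw per factor."""
    order = list(triples)
    rngs["ordering"].shuffle(order)
    return {
        "init": rngs["init"].gauss(0.0, 1.0),
        "ordering": order,
        "negative_sampling": rngs["negative_sampling"].randrange(1000),
        "dropout": rngs["dropout"].random() < 0.5,
    }


triples = [("h%d" % i, "r", "t%d" % i) for i in range(20)]
run_a = toy_run(make_rngs(0, "ordering", 1), triples)
run_b = toy_run(make_rngs(0, "ordering", 2), triples)
print(run_a["init"] == run_b["init"])          # fixed factor: same draw in both runs
print(run_a["ordering"] == run_b["ordering"])  # varied factor: usually a different order
```

In a real pipeline the same idea would mean passing separate generators to the initializer, the data loader, the negative sampler, and the dropout layers, rather than setting one global seed.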

📝 Abstract

Knowledge graph embedding models (KGEMs) constitute the main link prediction approach to complete knowledge graphs. Standard evaluation protocols emphasize rank-based metrics such as MRR or Hits@$K$, but usually overlook the influence of random seeds on result stability. Moreover, these metrics conceal potential instabilities in individual predictions and in the organization of embedding spaces. In this work, we conduct a systematic stability analysis of multiple KGEMs across several datasets. We find that high-performance models actually produce divergent predictions at the triple level and highly variable embedding spaces. By isolating stochastic factors (i.e., initialization, triple ordering, negative sampling, dropout, hardware), we show that each independently induces instability of comparable magnitude. Furthermore, for a given model, hyperparameter configurations with better MRR are not guaranteed to be more stable. Finally, voting, albeit a known remediation mechanism, only provides a limited enhancement of stability. These findings highlight critical limitations of current benchmarking protocols, and raise concerns about the reliability of KGEMs for knowledge graph completion.
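To make concrete how aggregate rank metrics can hide triple-level divergence, here is a small sketch using the standard definitions of MRR and Hits@$K$ on made-up ranks. The rank lists are hypothetical, and the Jaccard overlap of the "hit" sets is only one illustrative way to compare two runs; it is an assumption, not necessarily the stability measure used in the paper.

```python
def mrr(ranks):
    """Mean reciprocal rank of the true entity over all test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def hit_set(ranks, k):
    """Indices of the test triples that count as a hit at k."""
    return {i for i, r in enumerate(ranks) if r <= k}


def jaccard(a, b):
    """Jaccard overlap of two sets (defined as 1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical ranks for the same four test triples under two seeds.
ranks_seed_1 = [1, 2, 4, 10]
ranks_seed_2 = [10, 4, 2, 1]

for name, ranks in (("seed 1", ranks_seed_1), ("seed 2", ranks_seed_2)):
    print(name, "MRR:", round(mrr(ranks), 4), "Hits@3:", hits_at_k(ranks, 3))

overlap = jaccard(hit_set(ranks_seed_1, 3), hit_set(ranks_seed_2, 3))
print("Overlap of Hits@3 triples:", overlap)
```

Because the second list is a permutation of the first, both seeds report the same MRR and the same Hits@3, yet the two runs agree on none of the triples they get right at $K = 3$. This is the kind of disagreement that a single averaged score cannot reveal.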

Problem

Research questions and friction points this paper is trying to address.

Link Prediction

Knowledge Graph Embeddings

Stability

Random Seeds

Evaluation Protocols

Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Graph Embeddings

Link Prediction

Stability Analysis