Towards Better Benchmark Datasets for Inductive Knowledge Graph Completion

📅 2024-06-14
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a pervasive personalized PageRank (PPR) shortcut in current inductive knowledge graph completion (KGC) benchmarks: PPR—relying solely on graph structure—achieves performance close to state-of-the-art (SOTA) models, severely compromising evaluation validity. We conduct the first systematic diagnosis and pinpoint the root cause: standard data splits fail to decouple structural proximity from relational semantics. To address this, we propose a structure-debiased data construction paradigm that enforces constraints on subgraph connectivity and relation distribution to achieve structural–semantic disentanglement. Using this paradigm, we construct a new benchmark that substantially suppresses the PPR baseline—reducing its mean reciprocal rank (MRR) by 35.2% on average—while yielding more faithful model rankings (e.g., RGCN, CompGCN) aligned with their relational modeling capacity. Our benchmark advances inductive KGC evaluation toward standardization and reliability.

📝 Abstract
Knowledge Graph Completion (KGC) attempts to predict missing facts in a Knowledge Graph (KG). Recently, there's been an increased focus on designing KGC methods that can excel in the inductive setting, where a portion or all of the entities and relations seen in inference are unobserved during training. Numerous benchmark datasets have been proposed for inductive KGC, all of which are subsets of existing KGs used for transductive KGC. However, we find that the current procedure for constructing inductive KGC datasets inadvertently creates a shortcut that can be exploited even while disregarding the relational information. Specifically, we observe that the Personalized PageRank (PPR) score can achieve strong or near SOTA performance on most inductive datasets. In this paper, we study the root cause of this problem. Using these insights, we propose an alternative strategy for constructing inductive KGC datasets that helps mitigate the PPR shortcut. We then benchmark multiple popular methods using the newly constructed datasets and analyze their performance. The new benchmark datasets help promote a better understanding of the capabilities and challenges of inductive KGC by removing any shortcuts that obfuscate performance.
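
To make the shortcut concrete, the following is a minimal sketch of a PPR-style scorer that ranks candidate tail entities while ignoring relation types. It is an illustration under stated assumptions, not the paper's implementation: the function names (`personalized_pagerank`, `rank_tails`), the teleport probability `alpha=0.15`, the fixed iteration count, and the choice to treat the graph as undirected are all hypothetical.

```python
from collections import defaultdict


def personalized_pagerank(triples, source, alpha=0.15, iters=100):
    """Approximate PPR scores from `source` by power iteration.

    `triples` is a list of (head, relation, tail) tuples. The relation is
    deliberately ignored, so only graph structure influences the scores.
    Assumption: edges are treated as undirected.
    """
    neighbors = defaultdict(set)
    for head, _relation, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    if source not in neighbors:
        raise ValueError(f"unknown source entity: {source!r}")

    nodes = list(neighbors)
    scores = {node: 0.0 for node in nodes}
    scores[source] = 1.0  # all probability mass starts at the source

    for _ in range(iters):
        updated = {node: 0.0 for node in nodes}
        for node in nodes:
            # Spread the non-teleporting mass evenly over the neighbors.
            share = (1.0 - alpha) * scores[node] / len(neighbors[node])
            for neighbor in neighbors[node]:
                updated[neighbor] += share
        updated[source] += alpha  # teleport back to the source
        scores = updated
    return scores


def rank_tails(triples, head, candidates, alpha=0.15):
    """Order candidate tails for a query (head, ?, ?) by descending PPR score."""
    scores = personalized_pagerank(triples, head, alpha=alpha)
    return sorted(candidates, key=lambda c: scores.get(c, 0.0), reverse=True)


# Toy path graph a - b - c - d; the relation labels play no role in the ranking.
toy_triples = [("a", "r1", "b"), ("b", "r2", "c"), ("c", "r1", "d")]
ranking = rank_tails(toy_triples, "a", ["d", "c", "b"])
```

On the toy graph, candidates closer to the query head receive higher scores regardless of which relation is being queried, which is the kind of structure-only signal the abstract describes.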

Problem

Research questions and friction points this paper is trying to address.

Current inductive KGC datasets contain exploitable shortcuts
PPR scores artificially inflate performance metrics
Need better benchmark datasets for true inductive KGC evaluation
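
For reference, ranking metrics such as the mean reciprocal rank (MRR) cited in the summary can be computed as in the sketch below. The function name and input layout are assumptions made for illustration, and the paper's exact evaluation protocol (for example, candidate filtering or tie handling) is not reproduced here.

```python
def mean_reciprocal_rank(ranked_lists, true_answers):
    """Average of 1 / rank of the correct answer over all queries.

    `ranked_lists[i]` is the candidate list for query i, best first, and
    `true_answers[i]` is the correct entity, assumed to appear in that list.
    """
    total = 0.0
    for ranking, answer in zip(ranked_lists, true_answers):
        rank = ranking.index(answer) + 1  # ranks are 1-based
        total += 1.0 / rank
    return total / len(true_answers)


# Two toy queries: the correct answer is ranked 1st and 2nd respectively.
example_mrr = mean_reciprocal_rank([["b", "c"], ["x", "y", "z"]], ["b", "y"])
```

A structure-only scorer that places correct answers near the top of these lists will obtain a high MRR even though it never uses the relation, which is why such a metric alone cannot reveal the shortcut.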

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes PPR shortcut in inductive KGC datasets
Proposes new dataset construction strategy
Benchmarks methods on new datasets