Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment

📅 2023-07-05
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
To address confirmation bias induced by noisy pseudo-labels in knowledge graph entity alignment, this paper proposes UPL-EA, a Unified Pseudo-Labeling framework for Entity Alignment. Methodologically, UPL-EA is the first to systematically distinguish and jointly mitigate Type I (false positive) and Type II (false negative) pseudo-label errors. It achieves this through two key components: (i) a theoretically grounded, one-to-one pseudo-label generation criterion based on cross-graph optimal transport; and (ii) a convergence-guaranteed inter-iteration pseudo-label calibration mechanism. Integrated with graph neural networks and cross-KG embedding alignment, UPL-EA significantly improves alignment accuracy under low-resource seed settings. Empirical results show a 32% reduction in pseudo-label error rate and consistent superiority over state-of-the-art methods. Crucially, UPL-EA provides formal theoretical guarantees on the convergence of its iterative pseudo-label refinement process.
📝 Abstract
Entity alignment (EA) aims at identifying equivalent entity pairs across different knowledge graphs (KGs) that refer to the same real-world identity. To systematically combat confirmation bias for pseudo-labeling-based entity alignment, we propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA) that explicitly eliminates pseudo-labeling errors to boost the accuracy of entity alignment. UPL-EA consists of two complementary components: (1) The Optimal Transport (OT)-based pseudo-labeling uses discrete OT modeling as an effective means to enable more accurate determination of entity correspondences across two KGs and to mitigate the adverse impact of erroneous matches. A simple but highly effective criterion is further devised to derive pseudo-labeled entity pairs that satisfy one-to-one correspondences at each iteration. (2) The cross-iteration pseudo-label calibration operates across multiple consecutive iterations to further improve the pseudo-labeling precision rate by reducing the local pseudo-label selection variability with a theoretical guarantee. The two components are respectively designed to eliminate Type I and Type II pseudo-labeling errors identified through our analyse. The calibrated pseudo-labels are thereafter used to augment prior alignment seeds to reinforce subsequent model training for alignment inference. The effectiveness of UPL-EA in eliminating pseudo-labeling errors is both theoretically supported and experimentally validated. The experimental results show that our approach achieves competitive performance with limited prior alignment seeds.
Problem

Research questions and friction points this paper is trying to address.

Combat confirmation bias in entity alignment
Enhance pseudo-labeling accuracy across knowledge graphs
Reduce Type I and II pseudo-labeling errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Pseudo-Labeling for Entity Alignment
Optimal Transport-based pseudo-labeling
Cross-iteration pseudo-label calibration
Q
Qijie Ding
Discipline of Business Analytics, The University of Sydney, Sydney, NSW, Australia
J
Jie Yin
Discipline of Business Analytics, The University of Sydney, Sydney, NSW, Australia
Daokun Zhang
Daokun Zhang
University of Nottingham Ningbo China
Graph LearningData MiningMachine Learning
J
Junbin Gao
Discipline of Business Analytics, The University of Sydney, Sydney, NSW, Australia