🤖 AI Summary
To address confirmation bias induced by noisy pseudo-labels in knowledge graph entity alignment, this paper proposes UPL-EA, a Unified Pseudo-Labeling framework for Entity Alignment. Methodologically, UPL-EA is the first to systematically distinguish and jointly mitigate Type I (false positive) and Type II (false negative) pseudo-label errors. It achieves this through two key components: (i) a theoretically grounded, one-to-one pseudo-label generation criterion based on cross-graph optimal transport; and (ii) a convergence-guaranteed inter-iteration pseudo-label calibration mechanism. Integrated with graph neural networks and cross-KG embedding alignment, UPL-EA significantly improves alignment accuracy under low-resource seed settings. Empirical results show a 32% reduction in the pseudo-label error rate and consistent superiority over state-of-the-art methods. Crucially, UPL-EA provides formal theoretical guarantees on the convergence of its iterative pseudo-label refinement process.
📝 Abstract
Entity alignment (EA) aims to identify entity pairs across different knowledge graphs (KGs) that refer to the same real-world identity. To systematically combat confirmation bias in pseudo-labeling-based entity alignment, we propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA) that explicitly eliminates pseudo-labeling errors to boost alignment accuracy. UPL-EA consists of two complementary components: (1) Optimal Transport (OT)-based pseudo-labeling uses discrete OT modeling as an effective means to determine entity correspondences across two KGs more accurately and to mitigate the adverse impact of erroneous matches. A simple yet highly effective criterion is further devised to derive pseudo-labeled entity pairs that satisfy one-to-one correspondences at each iteration. (2) Cross-iteration pseudo-label calibration operates across multiple consecutive iterations to further improve pseudo-labeling precision by reducing local variability in pseudo-label selection, with a theoretical guarantee. The two components are respectively designed to eliminate Type I and Type II pseudo-labeling errors identified through our analysis. The calibrated pseudo-labels are thereafter used to augment prior alignment seeds and reinforce subsequent model training for alignment inference. The effectiveness of UPL-EA in eliminating pseudo-labeling errors is both theoretically supported and experimentally validated. The experimental results show that our approach achieves competitive performance with limited prior alignment seeds.
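To make the two components concrete, below is a minimal sketch of (1) one-to-one pseudo-label generation via a discrete assignment over cross-KG embedding distances, and (2) cross-iteration calibration by intersecting pseudo-label sets from consecutive iterations. This is an illustrative reading of the abstract, not the paper's exact formulation: it uses the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) as a stand-in for the paper's discrete OT solver, and the cost threshold and set-intersection rule are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_pseudo_labels(src_emb, tgt_emb, threshold=0.5):
    """Sketch of OT-based pseudo-labeling: one-to-one entity matches
    between two KGs, keeping only sufficiently confident pairs."""
    # Cost matrix: cosine distance between cross-KG entity embeddings.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    cost = 1.0 - src @ tgt.T
    # Hungarian assignment (stand-in for discrete OT) enforces
    # one-to-one correspondences by construction.
    rows, cols = linear_sum_assignment(cost)
    # Illustrative confidence criterion: discard high-cost matches,
    # which would otherwise become Type I (false positive) errors.
    return {(i, j) for i, j in zip(rows, cols) if cost[i, j] < threshold}


def calibrate(labels_prev, labels_curr):
    """Sketch of cross-iteration calibration: retain only pairs selected
    in consecutive iterations, reducing local selection variability."""
    return labels_prev & labels_curr
```

The calibrated set would then augment the prior alignment seeds for the next round of model training, as the abstract describes.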