🤖 AI Summary
This work addresses a limitation of conventional knowledge graph completion (KGC) evaluation: it relies predominantly on uniform rank-based metrics and does not accommodate the differing assessment needs of users. To bridge this gap, the authors propose PROBE-Web, an interactive, goal-oriented evaluation system that supports flexible, fine-grained model analysis by adjusting two perspectives: predictive sharpness and robustness to popularity bias. The system integrates four core functionalities: conventional evaluation, perspective-aware evaluation, explainable case studies, and evaluation landscape exploration. Through a graphical interface, users can evaluate and compare multiple KGC models and analyze their strengths and weaknesses. PROBE-Web is intended to surface performance differences among KGC models that uniform metrics can hide, making model assessment more transparent and better aligned with user objectives.
📝 Abstract
Knowledge graph completion (KGC) models are commonly evaluated using rank-based metrics such as MRR and Hits@K, even though different users often require different evaluation perspectives. In this demo, we present PROBE-Web, an interactive system for probing diverse evaluation landscapes for KGC models. PROBE-Web enables users to flexibly evaluate KGC models by adjusting two critical perspectives: (P1) predictive sharpness and (P2) popularity-bias robustness. Through a user-friendly GUI, users can easily evaluate multiple KGC models and analyze their strengths and weaknesses. PROBE-Web provides four key functionalities: (1) a conventional evaluation toolkit, (2) flexible perspective-aware evaluation, (3) explainable case studies, and (4) evaluation landscape exploration. We believe that PROBE-Web can help users better understand KGC models in ways that align with their objectives.
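
For readers unfamiliar with the rank-based metrics named above, the following is a minimal sketch of their standard textbook definitions. It is not PROBE-Web's implementation. The function names and the sample ranks are hypothetical and chosen only for illustration. Each rank is assumed to be the 1-based position of the correct entity in a model's sorted prediction list for one test query.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test queries whose correct entity is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test queries.
example_ranks = [1, 2, 4, 10]

mrr = mean_reciprocal_rank(example_ranks)
hits_3 = hits_at_k(example_ranks, 3)
hits_10 = hits_at_k(example_ranks, 10)
print(f"MRR={mrr:.4f}  Hits@3={hits_3:.2f}  Hits@10={hits_10:.2f}")
```

Both metrics reduce each query to a single rank and average uniformly over queries. They therefore do not reflect how confident the model was in its prediction or how popular the target entity is, which is the kind of gap the two perspectives (P1) and (P2) are meant to address.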