🤖 AI Summary
While graph models exhibit similar aggregate performance, their node-level behaviors differ significantly—yet conventional metrics (e.g., accuracy) obscure fine-grained failure modes.
Method: We propose NodePro, the first instance-level node profiling framework for graph data, which jointly integrates data-centric signals (feature similarity, label uncertainty, structural ambiguity) and model-centric signals (prediction confidence, training consistency) to quantify per-node reliability and attribute behavioral patterns. NodePro requires no retraining and generalizes to unseen nodes.
Contribution/Results: NodePro uncovers systematic inter-model differences masked by aggregate metrics, effectively distinguishing models with comparable overall accuracy but divergent failure modes. Empirical evaluation demonstrates its efficacy in precisely identifying anomalous and semantically conflicting nodes in knowledge graphs—enabling fine-grained model diagnosis and trustworthy graph learning.
📝 Abstract
Graph machine learning models often achieve similar overall performance yet behave differently at the node level, failing on different subsets of nodes with varying reliability. Standard evaluation metrics such as accuracy obscure these fine grained differences, making it difficult to diagnose when and where models fail. We introduce NodePro, a node profiling framework that enables fine-grained diagnosis of model behavior by assigning interpretable profile scores to individual nodes. These scores combine data-centric signals, such as feature dissimilarity, label uncertainty, and structural ambiguity, with model-centric measures of prediction confidence and consistency during training. By aligning model behavior with these profiles, NodePro reveals systematic differences between models, even when aggregate metrics are indistinguishable. We show that node profiles generalize to unseen nodes, supporting prediction reliability without ground-truth labels. Finally, we demonstrate the utility of NodePro in identifying semantically inconsistent or corrupted nodes in a structured knowledge graph, illustrating its effectiveness in real-world settings.