🤖 AI Summary
Graph Neural Networks (GNNs) achieve strong predictive performance but suffer from poor confidence calibration—particularly problematic in high-stakes applications like fraud detection, where unreliable uncertainty estimates hinder trustworthy decision-making. Existing calibration methods exhibit weak generalization, yielding unstable or even degraded calibration across node subgroups differing in degree, class, or local graph structure.
Method: We propose an adaptive cross-subgroup calibration framework featuring (i) an adversarial mechanism to automatically identify miscalibrated subgroups without prior assumptions, and (ii) a differentiable Group Expected Calibration Error (Group ECE) loss that dynamically optimizes confidence estimation per subgroup.
Contribution/Results: Evaluated on multiple real-world graph datasets, our method significantly improves both global calibration and subgroup-specific calibration across feature-, topology-, and connectivity-based dimensions. It consistently outperforms state-of-the-art baselines, demonstrating superior reliability and practicality for safety-critical graph learning applications.
📝 Abstract
Despite their impressive predictive performance, GNNs often exhibit poor confidence calibration, i.e., their predicted confidence scores do not accurately reflect the true likelihood of correctness. This issue raises concerns about their reliability in high-stakes domains such as fraud detection and risk assessment, where well-calibrated predictions are essential for decision-making. To ensure trustworthy predictions, several GNN calibration methods have been proposed. Though they can improve global calibration, our experiments reveal that they often fail to generalize across different node groups, leading to inaccurate confidence in node groups with different degree levels, classes, and local structures. In certain cases, they even degrade calibration compared to the original uncalibrated GNN. To address this challenge, we propose a novel AdvCali framework that adaptively enhances calibration across different node groups. Our method leverages adversarial training to automatically identify miscalibrated node groups and applies a differentiable Group Expected Calibration Error (ECE) loss term to refine confidence estimation within these groups. This allows the model to dynamically adjust its calibration strategy without relying on dataset-specific prior knowledge about miscalibrated subgroups. Extensive experiments on real-world datasets demonstrate that our approach not only improves global calibration but also significantly enhances calibration within groups defined by feature similarity, topology, and connectivity, outperforming previous methods and confirming its effectiveness in practical scenarios.
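Standard ECE is non-differentiable because of its hard binning; a common way to obtain a differentiable surrogate, as the Group ECE loss here requires, is to replace hard bin assignments with soft (kernel-based) ones and average the result over node subgroups. The PyTorch sketch below illustrates this idea only; it is not the authors' AdvCali implementation, and the function names (`soft_ece`, `group_ece`), the bin count, and the kernel temperature are illustrative assumptions.

```python
import torch

def soft_ece(conf: torch.Tensor, correct: torch.Tensor,
             n_bins: int = 10, temp: float = 100.0) -> torch.Tensor:
    """Differentiable ECE surrogate via soft binning (illustrative sketch).

    conf:    predicted confidence per node, in [0, 1]
    correct: 1.0 if the prediction was correct, else 0.0
    """
    # Bin centers at 0.05, 0.15, ..., 0.95 for n_bins = 10.
    centers = torch.linspace(0.5 / n_bins, 1 - 0.5 / n_bins, n_bins)
    # Soft assignment of each sample to each bin; a softmax over a
    # squared-distance kernel keeps the assignment differentiable in conf.
    w = torch.softmax(-temp * (conf.unsqueeze(1) - centers) ** 2, dim=1)
    bin_mass = w.sum(dim=0)                                   # soft count per bin
    bin_conf = (w * conf.unsqueeze(1)).sum(0) / (bin_mass + 1e-8)
    bin_acc = (w * correct.unsqueeze(1)).sum(0) / (bin_mass + 1e-8)
    # Mass-weighted |confidence - accuracy| gap, as in standard ECE.
    return ((bin_mass / conf.numel()) * (bin_conf - bin_acc).abs()).sum()

def group_ece(conf: torch.Tensor, correct: torch.Tensor,
              groups: torch.Tensor, n_bins: int = 10) -> torch.Tensor:
    """Average the soft ECE over node subgroups (e.g., degree buckets)."""
    losses = [soft_ece(conf[groups == g], correct[groups == g], n_bins)
              for g in groups.unique()]
    return torch.stack(losses).mean()
```

Because the loss is differentiable with respect to the confidence scores, it can be added to the training objective and minimized per subgroup by gradient descent, which is what lets the calibration strategy adapt to whichever groups the adversarial component flags as miscalibrated.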