The Confidence Trap: Calibration Attacks for Graph Neural Networks

📅 2026-06-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the underexplored robustness of graph neural networks (GNNs) in confidence calibration under structural perturbations, noting that existing adversarial attacks struggle to degrade calibration without altering prediction labels. To bridge this gap, the paper introduces the Unified Graph Calibration Attack (UGCA) framework—the first adversarial paradigm specifically designed to target GNN calibration. UGCA drives the predicted distribution toward uniformity via KL divergence, integrating a re-ranking mechanism, hybrid loss, and beam search strategy to perform worst-case analysis under white-box settings. Both theoretical analysis and extensive experiments demonstrate that UGCA substantially increases expected calibration error while preserving classification accuracy, thereby uncovering intrinsic relationships among model accuracy, dataset complexity, and calibration vulnerability, and highlighting the sensitivity of current GNN calibration methods to structural perturbations.
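The idea of pushing a predicted distribution toward uniformity with a KL term can be illustrated with a small sketch. This is not the authors' implementation: the direction KL(p || u) and the helper names `softmax` and `kl_to_uniform` are assumptions chosen for illustration, since the summary does not say which direction of the divergence UGCA uses.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_to_uniform(probs):
    """KL(p || u) with u uniform over len(probs) classes (assumed direction).

    Equals log(K) - H(p): zero when p is uniform, log(K) when p is one-hot.
    """
    k = len(probs)
    # Terms with p == 0 contribute 0 by the usual convention, so skip them.
    return sum(p * math.log(p * k) for p in probs if p > 0.0)

# Two hypothetical predictions that share the same top class.
sharp = softmax([4.0, 1.0, 0.0])   # confident prediction
flat = softmax([0.4, 0.1, 0.0])    # same argmax, much closer to uniform

print(kl_to_uniform(sharp), kl_to_uniform(flat))
```

Both predictions keep class 0 as the top class, but the flatter one has a smaller divergence from uniform. An attack that minimizes such a term would lower confidence, and the summary's label-preservation constraint corresponds to keeping the argmax unchanged.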

📝 Abstract

While confidence calibration is essential for trustworthy decision-making in safety-critical applications, the robustness of calibrated GNNs to adversarial structural perturbations remains largely unexplored. However, studying calibration attacks on graphs presents unique technical challenges: (1) the discrete nature of graph structures complicates gradient-based optimization, (2) existing underconfidence objectives fail to drive predictions toward uniform distributions, and (3) GNNs are highly sensitive to edge perturbations, often causing unintended label changes that violate attack constraints. To address these challenges, we propose a Unified Graph Calibration Attack (UGCA) framework designed for worst-case (white-box) analysis of GNN calibration robustness. UGCA introduces a KL-divergence loss to encourage uniform predictive distributions, a reranking mechanism to reduce label flipping, a hybrid loss to recover labels when violations occur, and beam search to explore a broader adversarial search space. We further provide theoretical insights linking model generalization, dataset complexity, and calibration vulnerability, showing that models with higher accuracy or trained on datasets with more classes are more susceptible under this threat model. Extensive experiments demonstrate that UGCA substantially increases Expected Calibration Error while preserving classification accuracy. Our code is publicly available at https://github.com/CaptainCuong/Graph-Calibration-Attack.git.
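To make the evaluation metric concrete, here is a minimal sketch of Expected Calibration Error using the common equal-width binning estimator. The function name `expected_calibration_error`, the choice of 10 bins, and the toy numbers are assumptions for illustration; the paper's exact estimator and settings may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (n_b / N) * |acc_b - conf_b|.

    confidences: top-class probabilities in [0, 1]
    correct:     booleans, True where the predicted label is right
    Bins are left-closed here ([b/n, (b+1)/n)); 1.0 falls in the last bin.
    """
    n = len(confidences)
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins
    for c, hit in zip(confidences, correct):
        b = min(int(c * n_bins), n_bins - 1)
        conf_sum[b] += c
        hit_sum[b] += 1.0 if hit else 0.0
        count[b] += 1
    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        acc_b = hit_sum[b] / count[b]
        conf_b = conf_sum[b] / count[b]
        ece += (count[b] / n) * abs(acc_b - conf_b)
    return ece

# Toy illustration: labels (and accuracy) are identical in both cases,
# only the reported confidence changes.
correct = [True, True, True, True]
ece_before = expected_calibration_error([0.9, 0.9, 0.9, 0.9], correct)
ece_after = expected_calibration_error([0.4, 0.4, 0.4, 0.4], correct)
print(ece_before, ece_after)
```

In this toy case accuracy stays at 100% while the gap between accuracy and confidence grows from roughly 0.1 to roughly 0.6, which is the kind of effect the abstract describes: calibration degrades even though classification accuracy is preserved.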

Problem

Research questions and friction points this paper is trying to address.

confidence calibration

graph neural networks

adversarial attacks

structural perturbations

calibration robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Neural Networks

Calibration Attack

Adversarial Perturbation