GF-Score: Certified Class-Conditional Robustness Evaluation with Fairness Guarantees

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing adversarial robustness evaluation methods, which either rely on computationally expensive attacks or yield a single aggregate score that obscures inter-class disparities. The authors propose GF-Score, a framework for attack-free, class-conditional fairness assessment of certified robustness. It decomposes the certified GREAT Score into per-class profiles and quantifies their disparity with four welfare-economics-inspired metrics, including the Robustness Disparity Index and the Normalized Robustness Gini Coefficient. A self-calibration procedure tunes the temperature parameter using only clean accuracy correlations, which removes the dependence on adversarial attacks. Experiments on 22 RobustBench models across CIFAR-10 and ImageNet show that more robust models tend to exhibit greater inter-class disparity and that “cat” is the weakest class in 76% of the CIFAR-10 models, establishing a practical attack-free auditing pipeline.

📝 Abstract

Adversarial robustness is essential for deploying neural networks in safety-critical applications, yet standard evaluation methods either require expensive adversarial attacks or report only a single aggregate score that obscures how robustness is distributed across classes. We introduce the GF-Score (GREAT-Fairness Score), a framework that decomposes the certified GREAT Score into per-class robustness profiles and quantifies their disparity through four metrics grounded in welfare economics: the Robustness Disparity Index (RDI), the Normalized Robustness Gini Coefficient (NRGC), Worst-Case Class Robustness (WCR), and a Fairness-Penalized GREAT Score (FP-GREAT). The framework further eliminates the original method's dependence on adversarial attacks through a self-calibration procedure that tunes the temperature parameter using only clean accuracy correlations. Evaluating 22 models from RobustBench across CIFAR-10 and ImageNet, we find that the decomposition is exact, that per-class scores reveal consistent vulnerability patterns (e.g., "cat" is the weakest class in 76% of CIFAR-10 models), and that more robust models tend to exhibit greater class-level disparity. These results establish a practical, attack-free auditing pipeline for diagnosing where certified robustness guarantees fail to protect all classes equally. We release our code on GitHub: https://github.com/aryashah2k/gf-score.
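
The idea of splitting an aggregate score into per-class profiles and then measuring their spread can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas for RDI, NRGC, WCR, or FP-GREAT, so the code below uses a textbook Gini coefficient, a max-minus-min gap, and a minimum as illustrative stand-ins. The function names, the `gap` and `gini` keys, and the toy scores are all hypothetical.

```python
from collections import defaultdict


def per_class_means(scores, labels):
    """Average the per-sample scores within each class label."""
    buckets = defaultdict(list)
    for score, label in zip(scores, labels):
        buckets[label].append(score)
    return {label: sum(v) / len(v) for label, v in buckets.items()}


def aggregate_from_classes(class_means, labels):
    """Recombine per-class means, weighting each class by its frequency.

    The result matches the plain mean over all samples, which is one
    sense in which a per-class decomposition can be exact.
    """
    counts = defaultdict(int)
    for label in labels:
        counts[label] += 1
    n = len(labels)
    return sum(class_means[c] * counts[c] / n for c in class_means)


def gini(values):
    """Textbook Gini coefficient of non-negative values (0 = all equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def disparity_summary(class_means):
    """Illustrative disparity statistics (stand-ins, not the paper's metrics)."""
    values = list(class_means.values())
    worst = min(class_means, key=class_means.get)
    return {
        "worst_class": worst,
        "worst_score": class_means[worst],
        "gap": max(values) - min(values),
        "gini": gini(values),
    }


# Hypothetical per-sample robustness scores for three classes.
scores = [0.2, 0.4, 0.6, 0.8, 1.0, 1.0]
labels = ["cat", "cat", "dog", "dog", "ship", "ship"]

means = per_class_means(scores, labels)
summary = disparity_summary(means)
print(means)
print(aggregate_from_classes(means, labels), sum(scores) / len(scores))
print(summary)
```

In this toy input the frequency-weighted recombination agrees with the plain sample mean, and the summary singles out the class with the lowest average score, which is the kind of per-class signal an aggregate score hides.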

Problem

Research questions and friction points this paper is trying to address.

adversarial robustness

class-conditional robustness

fairness

robustness disparity

certified robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

certified robustness

fairness

class-conditional evaluation