🤖 AI Summary
This work addresses the limitations of existing robustness evaluations for deep neural networks, which often depend on specific attacks and lack interpretability. The authors propose an attack-agnostic robustness metric based on the spectral norm of the Fisher Information Matrix (FIM), which quantifies the worst-case sensitivity of a model's output distribution to input perturbations. Theoretically, they show that the FIM equals the variance of the input Jacobian and derive closed-form spectral bounds for common architectures (VGG, ResNet, DenseNet, and Transformer), yielding what they describe as the first theoretical robustness ranking. Algorithmically, they use power iteration and Hutchinson-based estimation to support efficient evaluation in both white-box and black-box settings. Experiments on CIFAR, ImageNet, and medical imaging datasets show a strong correlation between the proposed metric and adversarial vulnerability, making the metric an interpretable tool for model diagnosis and design that complements attack-based evaluations.
📝 Abstract
The robustness of deep neural networks is crucial for safety-critical deployments, yet existing evaluation methods are often attack-dependent and lack interpretability. We propose a principled, attack-agnostic robustness metric based on the spectral norm of the Fisher Information Matrix (FIM), which quantifies the worst-case sensitivity of the model's output distribution to input perturbations. Theoretically, we establish that the FIM equals the variance of the input Jacobian and derive closed-form spectral bounds for common architectures, including VGG, ResNet, DenseNet, and Transformer, providing the first theoretical robustness ranking. To enable scalable evaluation, we develop efficient algorithms, including power iteration and Hutchinson-based estimation, that support both white-box and black-box settings. Extensive experiments across multiple datasets, including CIFAR, ImageNet, and medical images, and across multiple architectures show a strong correlation between our metric and adversarial vulnerability. Our framework serves as an interpretable diagnostic tool that complements attack-based evaluations, offering insights into architectural sensitivity and guiding the design of more robust models. Code is available at: https://github.com/franz-chang/SRP/.
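To make the metric concrete, the following is a minimal sketch of how the spectral norm of an input-space FIM could be estimated with power iteration, with a Hutchinson-style trace estimate alongside it. It is not the authors' implementation (see the linked repository for that). It assumes a toy linear softmax classifier with logits `W @ x`, for which the input-space FIM has the standard closed form `W.T @ (diag(p) - p p^T) @ W` with `p = softmax(W @ x)`. The function names `fim_matvec`, `fim_spectral_norm`, and `fim_trace_hutchinson` are hypothetical and chosen for illustration only.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def fim_matvec(W, p, v):
    """Return F @ v for F = W.T @ (diag(p) - p p^T) @ W without forming F.

    Assumption: a linear softmax model, so the logit Jacobian w.r.t. the
    input is simply W. For a deep network, the two products with W would be
    replaced by Jacobian-vector and vector-Jacobian products from autodiff.
    """
    u = W @ v                    # Jacobian-vector product, shape (classes,)
    u = p * u - p * (p @ u)      # apply diag(p) - p p^T
    return W.T @ u               # vector-Jacobian product, shape (inputs,)


def fim_spectral_norm(W, p, iters=200, seed=0):
    """Power iteration estimate of the largest eigenvalue of the FIM."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = fim_matvec(W, p, v)
        lam = float(v @ w)       # Rayleigh quotient for the current unit vector
        norm = np.linalg.norm(w)
        if norm == 0.0:          # F @ v vanished; nothing left to iterate on
            break
        v = w / norm
    return lam


def fim_trace_hutchinson(W, p, samples=2000, seed=0):
    """Hutchinson estimate of trace(F) using Rademacher probe vectors."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(samples):
        z = rng.choice([-1.0, 1.0], size=W.shape[1])
        total += float(z @ fim_matvec(W, p, z))
    return total / samples


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.standard_normal((10, 32))      # toy model: 10 classes, 32 inputs
    x = 0.1 * rng.standard_normal(32)      # toy input
    p = softmax(W @ x)
    print("spectral norm estimate:", fim_spectral_norm(W, p))
    print("trace estimate:", fim_trace_hutchinson(W, p))
```

Both estimators touch the FIM only through matrix-vector products, which is what keeps this kind of evaluation scalable when the input dimension is too large to form the matrix explicitly. The sketch covers only the white-box case, where the Jacobian is available. How the paper handles the black-box setting is not specified in the abstract and is not reproduced here.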