🤖 AI Summary
This study exposes a critical security vulnerability in ResNet-50 that is masked by high nominal accuracy: for the first time, it jointly models the synergistic effect of FGSM-based adversarial perturbations and imperceptible binary payload injection. Experiments show that while the model achieves 53.33% accuracy on clean samples, its accuracy remains unchanged under FGSM attack; erroneous predictions, however, exhibit significantly inflated confidence scores. Payload injection succeeds in 93.33% of cases with no perceptible visual degradation. Through feature-space manipulation and statistical analysis of confidence distributions, the work demonstrates that high accuracy does not imply high decision trustworthiness. It introduces a novel “trustworthiness–accuracy decoupling” security evaluation paradigm, empirically revealing the dissociation between predictive correctness and epistemic reliability, and provides evidence and methodological guidance for robustness assessment and trustworthy AI in deep learning systems.
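The FGSM perturbation discussed above nudges every input component by a fixed step ε in the direction of the sign of the loss gradient. A minimal numpy sketch on a toy logistic-regression classifier can illustrate the mechanics; this is not the paper's ResNet-50 setup, and all names here are illustrative:

```python
import numpy as np

# Toy FGSM sketch: a logistic-regression "model" stands in for the
# paper's ResNet-50. Only the attack mechanics are illustrated.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Return x + eps * sign(d loss / d x) for binary cross-entropy loss."""
    p = sigmoid(w @ x + b)      # predicted probability of class 1
    grad_x = (p - y) * w        # dL/dx = (p - y) * w for BCE loss
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.0
x = rng.normal(size=8)
y = 1.0                          # true label

x_adv = fgsm_perturb(x, y, w, b, eps=0.1)
# The perturbation is bounded: each component moves by at most eps.
assert np.max(np.abs(x_adv - x)) <= 0.1 + 1e-12
```

Because the step is bounded in the ℓ∞ norm, the perturbation can be kept small enough to be visually imperceptible while still crossing a decision boundary.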
📝 Abstract
This paper investigates the resilience of a ResNet-50 image classification model under two prominent security threats: Fast Gradient Sign Method (FGSM) adversarial attacks and malicious payload injection. On clean images, the model attains 53.33% accuracy. When subjected to FGSM perturbations, its overall accuracy remains unchanged; however, the model's confidence in its incorrect predictions increases markedly. Concurrently, a payload injection scheme succeeds in 93.33% of tested samples, revealing how stealthy attacks can manipulate model predictions without degrading visual quality. These findings underscore the vulnerability of even high-performing neural networks and highlight the urgency of developing more robust defense mechanisms for security-critical applications.
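The abstract reports a 93.33% payload-injection success rate with no visible degradation but does not specify the embedding mechanism. Least-significant-bit (LSB) steganography is one common scheme that matches this description, so the following is a hedged numpy sketch of that technique only; function names are illustrative and not taken from the paper:

```python
import numpy as np

# Illustrative LSB payload injection: hide payload bits in the
# least-significant bit of each pixel of a uint8 carrier image.
# This is an assumed embedding scheme, not the paper's exact method.

def embed_payload(img, payload):
    """Embed payload bytes into the LSBs of a uint8 image."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    flat = img.ravel().copy()
    assert bits.size <= flat.size, "payload too large for carrier image"
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(img.shape)

def extract_payload(img, n_bytes):
    """Recover n_bytes previously embedded in the image's LSBs."""
    bits = img.ravel()[: n_bytes * 8] & 1
    return np.packbits(bits).tobytes()

img = np.random.default_rng(1).integers(0, 256, size=(32, 32), dtype=np.uint8)
stego = embed_payload(img, b"attack")
assert extract_payload(stego, 6) == b"attack"
# Each pixel changes by at most 1 intensity level out of 255,
# which is why such payloads are visually imperceptible.
assert np.max(np.abs(stego.astype(int) - img.astype(int))) <= 1
```

The per-pixel change of at most one intensity level is what makes this class of attack stealthy: the carrier image looks unchanged to a human observer while carrying arbitrary binary data.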