🤖 AI Summary
Deep neural networks are vulnerable to adversarial attacks, which compromises their security in real-world deployments. To address this, we propose an adversarial defense method based on contrastive learning, the first to apply contrastive learning to improving adversarial robustness. Our approach jointly optimizes the model parameters and the input perturbations so that, in feature space, the representation of each clean sample is pulled toward that of its adversarial counterpart while representations of samples from different classes are pushed apart. It combines a contrastive loss with adversarial training and requires no auxiliary network architectures or label augmentation. Experiments on CIFAR-10, CIFAR-100, and an ImageNet subset show that the method improves robust accuracy under strong attacks, including PGD and AutoAttack, with an average gain of +3.2% and only a limited drop in standard classification accuracy. It therefore balances adversarial robustness against generalization performance.
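The pull/push objective described in this summary can be illustrated with a small NumPy sketch. The exact loss used in the paper is not given here, so this assumes a supervised-contrastive-style formulation: for each anchor, every other embedding with the same label (including its own adversarial counterpart) is a positive, and embeddings from other classes are negatives. The function name `clean_adv_contrastive_loss`, the `temperature` value, and the toy inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def clean_adv_contrastive_loss(z_clean, z_adv, labels, temperature=0.5):
    """Illustrative contrastive loss over clean and adversarial embeddings.

    Assumption (not from the paper): positives are all other views that share
    the anchor's label; negatives are views with a different label.
    Embeddings are assumed to be non-zero so they can be normalised.
    """
    z = np.concatenate([z_clean, z_adv], axis=0)        # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit-length rows
    y = np.concatenate([labels, labels], axis=0)        # (2N,)

    sim = (z @ z.T) / temperature                       # scaled cosine similarity
    self_mask = np.eye(sim.shape[0], dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)             # never contrast a view with itself

    # Row-wise log-softmax, computed stably.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    # Average the log-probability assigned to same-class views.
    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    pos_log_prob = np.where(pos_mask, log_prob, 0.0)
    loss_per_anchor = -pos_log_prob.sum(axis=1) / pos_mask.sum(axis=1)
    return loss_per_anchor.mean()

# Toy check: adversarial views that stay close to their clean views give a
# lower loss than adversarial views that land on the other class.
z_clean = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
loss_aligned = clean_adv_contrastive_loss(z_clean, z_clean.copy(), labels)
loss_swapped = clean_adv_contrastive_loss(z_clean, z_clean[::-1].copy(), labels)
```

In a full training loop this term would be added to a classification loss and minimised over the model parameters, while the adversarial inputs are generated by maximising the loss over the perturbation.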
📝 Abstract
Deep neural networks (DNNs) have achieved remarkable success in computer vision tasks such as image classification, segmentation, and object detection. However, they are vulnerable to adversarial attacks, in which small perturbations to input images cause incorrect predictions. Addressing this issue is crucial for deploying robust deep-learning systems. This paper presents an approach that uses contrastive learning for adversarial defense, a combination that has not previously been explored. Our method uses a contrastive loss to enhance the robustness of classification models by training them on both clean and adversarially perturbed images. By optimizing the model's parameters alongside the perturbations, our approach enables the network to learn robust representations that are less susceptible to adversarial attacks. Experimental results show significant improvements in the model's robustness against various types of adversarial perturbations. These results suggest that the contrastive loss helps the model extract more informative and resilient features, which contributes to research on adversarial robustness in deep learning.
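To make "small perturbations in input images" concrete, the following is a minimal sketch of an L-infinity PGD attack, one common way to generate the adversarially perturbed images used in adversarial training. It is not the paper's implementation: it assumes a toy linear softmax classifier so the input gradient can be written analytically in NumPy, and the names `pgd_linf`, `eps`, `step`, and `iters`, along with their default values, are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def pgd_linf(x, y, W, b, eps=8 / 255, step=2 / 255, iters=10):
    """L-infinity PGD against a toy linear classifier with logits = x @ W + b.

    x: (N, D) inputs in [0, 1]; y: (N,) integer labels; W: (D, C); b: (C,).
    Returns perturbed inputs within eps of x (per coordinate) and inside [0, 1].
    """
    onehot = np.eye(W.shape[1])[y]
    x_adv = x.copy()
    for _ in range(iters):
        probs = softmax(x_adv @ W + b)
        grad = (probs - onehot) @ W.T              # d(cross-entropy)/d(input)
        x_adv = x_adv + step * np.sign(grad)       # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep a valid pixel range
    return x_adv

def cross_entropy(x, y, W, b):
    probs = softmax(x @ W + b)
    return -np.log(probs[np.arange(len(y)), y]).mean()

# Toy usage with random data (hypothetical shapes and values).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(4, 6))
W = rng.normal(size=(6, 3))
b = np.zeros(3)
y = np.array([0, 1, 2, 0])
x_adv = pgd_linf(x, y, W, b)
```

For a deep network the analytic gradient would be replaced by automatic differentiation, but the ascend, project, and clip steps stay the same.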