MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning

📅 2024-10-02
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Deep neural networks are vulnerable to adversarial perturbations; existing robustness methods often rely on architectural modifications or test-time input purification, limiting their generality and practicality. To address this, we propose MOREL, a model-agnostic multi-objective representation learning framework that requires no test-time intervention. Our approach is the first to jointly integrate a multi-positive contrastive loss with a cosine similarity loss in adversarial robustness training, simultaneously optimizing feature alignment and classification accuracy. This encourages natural samples and their adversarial counterparts to form compact intra-class clusters in the embedding space. Extensive experiments demonstrate that our method significantly improves robustness against both white-box and black-box attacks, outperforming existing architecture-agnostic alternatives. The implementation is publicly available.

📝 Abstract
Extensive research has shown that deep neural networks (DNNs) are vulnerable to slight adversarial perturbations: small changes to the input data that appear insignificant but cause the model to produce drastically different outputs. In addition to augmenting training data with adversarial examples generated from a specific attack method, most of the current defense strategies necessitate modifying the original model architecture components to improve robustness or performing test-time data purification to handle adversarial attacks. In this work, we demonstrate that strong feature representation learning during training can significantly enhance the original model's robustness. We propose MOREL, a multi-objective feature representation learning approach, encouraging classification models to produce similar features for inputs within the same class, despite perturbations. Our training method involves an embedding space where cosine similarity loss and multi-positive contrastive loss are used to align natural and adversarial features from the model encoder and ensure tight clustering. Concurrently, the classifier is motivated to achieve accurate predictions. Through extensive experiments, we demonstrate that our approach significantly enhances the robustness of DNNs against white-box and black-box adversarial attacks, outperforming other methods that similarly require no architectural changes or test-time data purification. Our code is available at https://github.com/salomonhotegni/MOREL
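The abstract describes a cosine similarity loss that aligns natural and adversarial features from the encoder. The paper's exact formulation lives in its embedding space; below is a minimal NumPy sketch of one common form of such an alignment term (the function name, the `1 - cos` formulation, and the row-wise pairing of natural/adversarial embeddings are illustrative assumptions, not the paper's definitive loss):

```python
import numpy as np

def cosine_alignment_loss(z_nat, z_adv, eps=1e-8):
    """Mean (1 - cosine similarity) between paired natural and adversarial
    embeddings. Row i of z_nat and row i of z_adv are assumed to come from
    the same input, with and without the adversarial perturbation."""
    # L2-normalize each embedding row so the dot product equals cosine similarity
    z_nat = z_nat / (np.linalg.norm(z_nat, axis=1, keepdims=True) + eps)
    z_adv = z_adv / (np.linalg.norm(z_adv, axis=1, keepdims=True) + eps)
    cos = np.sum(z_nat * z_adv, axis=1)  # per-pair cosine similarity
    return float(np.mean(1.0 - cos))     # 0 when perfectly aligned

# Perfectly aligned pairs give zero loss; orthogonal pairs give loss 1.
z = np.array([[1.0, 0.0], [0.0, 2.0]])
print(cosine_alignment_loss(z, z))
```

Minimizing this term pushes each adversarial embedding toward its natural counterpart, which is the "feature alignment" half of the multi-objective setup; the classification loss supplies the other half.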
Problem

Research questions and friction points this paper is trying to address.

Improving DNN robustness against adversarial perturbations
Aligning natural and adversarial features during training
Enhancing robustness against white-box and black-box attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-objective representation learning for robustness
Aligns natural and adversarial features using cosine similarity
Employs a multi-positive contrastive loss for same-class inputs
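The multi-positive contrastive loss named above treats every same-class sample in a batch as a positive for a given anchor, which encourages the intra-class compact clusters the summary mentions. As a rough, self-contained NumPy sketch (a SupCon-style formulation; the function name, temperature value, and exact normalization are assumptions, and the paper's actual loss may differ in detail):

```python
import numpy as np

def multi_positive_contrastive_loss(z, labels, temperature=0.1):
    """SupCon-style multi-positive contrastive loss: for each anchor, all
    other batch samples with the same label are positives; the rest are
    negatives. Lower loss = same-class embeddings cluster more tightly."""
    labels = np.asarray(labels)
    n = len(labels)
    # L2-normalize embeddings so similarities are cosine similarities
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    exp_sim = np.exp(sim)
    self_mask = np.eye(n, dtype=bool)
    exp_sim[self_mask] = 0.0  # exclude each anchor from its own denominator
    # log-probability of each candidate under the softmax over non-self samples
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    # positives: same label, excluding the anchor itself
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    per_anchor = -(log_prob * pos_mask).sum(axis=1) / np.maximum(pos_counts, 1)
    return float(per_anchor[pos_counts > 0].mean())

labels = [0, 0, 1, 1]
clustered = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]])
mixed = np.array([[1.0, 0.0], [0.0, 1.0], [0.99, 0.01], [0.01, 0.99]])
# Class-clustered embeddings yield a lower loss than class-mixed ones.
print(multi_positive_contrastive_loss(clustered, labels),
      multi_positive_contrastive_loss(mixed, labels))
```

In MOREL's setting the batch would contain both natural and adversarial embeddings with their class labels, so an adversarial sample is pulled toward every clean sample of its class, not only its own counterpart.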