ProARD: Progressive Adversarial Robustness Distillation: Provide Wide Range of Robust Students

📅 2025-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and carbon emissions incurred by repeatedly training lightweight models for diverse edge devices under varying resource constraints, this paper proposes a progressive adversarial robustness distillation framework. Methodologically, it introduces: (1) a dynamically configurable network architecture—jointly adjustable in width, depth, and expansion ratio—enabling a single training run to produce a spectrum of student models; (2) a robustness-oriented gradient approximation sampling mechanism that mitigates performance degradation caused by naive random sampling; and (3) an integrated optimization strategy combining weight sharing and adversarial distillation. Evaluated on CIFAR-10 and CIFAR-100, the method generates robust models at multiple scales from a single training run, preserving clean accuracy while significantly improving robustness under PGD attacks, and it reduces training energy consumption by 63% while substantially lowering CO₂ emissions.
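
The recipe the summary describes, one weight-shared dynamic network, a student sub-network sampled at each step, and adversarial distillation from the largest sub-network acting as teacher, can be pictured with the minimal PyTorch-style sketch below. The toy DynamicMLP, the PGD settings, and the candidate widths are illustrative assumptions, not the paper's architecture or hyperparameters.

```python
# Minimal illustrative sketch (not the authors' code): one weight-shared
# "dynamic" network, a student sampled per step, adversarial distillation
# from the full-size sub-network acting as the teacher.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMLP(nn.Module):
    """Toy weight-shared MLP: a student of a given width uses only the first
    `width` hidden units of the shared layers."""
    def __init__(self, in_dim=32, hidden=256, n_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, x, width=None):
        h = F.relu(self.fc1(x))
        if width is not None:                        # student pass: mask out unused hidden units
            mask = torch.zeros_like(h)
            mask[:, :width] = 1.0
            h = h * mask
        return self.fc2(h)

def pgd_attack(model, x, y, eps=0.03, alpha=0.01, steps=5, width=None):
    """Standard L-infinity PGD run against a chosen sub-network."""
    x_adv = x.detach() + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv, width=width), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = x.detach() + (x_adv - x.detach()).clamp(-eps, eps)
    return x_adv

model = DynamicMLP()
opt = torch.optim.SGD(model.parameters(), lr=0.05)
widths = [64, 128, 192, 256]                         # candidate student widths; 256 = teacher

for step in range(100):                              # toy loop on random data
    x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
    width = random.choice(widths[:-1])               # naive random sampling; the paper's sampler replaces this
    x_adv = pgd_attack(model, x, y, width=width)

    with torch.no_grad():                            # teacher soft targets from the largest sub-network
        t_logits = model(x_adv)
    s_logits = model(x_adv, width=width)

    # Adversarial distillation loss: KL to the teacher plus clean cross-entropy for the teacher itself.
    loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                    F.softmax(t_logits, dim=1), reduction="batchmean") \
           + F.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```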

📝 Abstract
Adversarial Robustness Distillation (ARD) has emerged as an effective method to enhance the robustness of lightweight deep neural networks against adversarial attacks. Current ARD approaches have leveraged a large robust teacher network to train one robust lightweight student. However, due to the diverse range of edge devices and resource constraints, current approaches require training a new student network from scratch to meet specific constraints, leading to substantial computational costs and increased CO2 emissions. This paper proposes Progressive Adversarial Robustness Distillation (ProARD), enabling the efficient one-time training of a dynamic network that supports a diverse range of accurate and robust student networks without requiring retraining. We first make a dynamic deep neural network based on dynamic layers by encompassing variations in width, depth, and expansion in each design stage to support a wide range of architectures. Then, we consider the student network with the largest size as the dynamic teacher network. ProARD trains this dynamic network using a weight-sharing mechanism to jointly optimize the dynamic teacher network and its internal student networks. However, due to the high computational cost of calculating exact gradients for all the students within the dynamic network, a sampling mechanism is required to select a subset of students. We show that random student sampling in each iteration fails to produce accurate and robust students.
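
The design space sketched in the abstract, per-stage choices of width, depth, and expansion with the maximal configuration serving as the dynamic teacher, can be pictured as a simple configuration sampler. The sketch below is a hedged illustration: the candidate value lists and stage count are placeholders, not the paper's actual search space.

```python
# Hedged sketch of sampling one student from the dynamic design space
# (per-stage width, depth, expansion). Values and stage count are placeholders.
import random

WIDTH_MULTS = [0.65, 0.8, 1.0]   # hypothetical per-stage width multipliers
DEPTHS      = [2, 3, 4]          # hypothetical per-stage block counts
EXPANSIONS  = [3, 4, 6]          # hypothetical per-block expansion ratios
NUM_STAGES  = 5

def sample_student():
    """Draw one student architecture configuration at random."""
    return [{"width": random.choice(WIDTH_MULTS),
             "depth": random.choice(DEPTHS),
             "expand": random.choice(EXPANSIONS)} for _ in range(NUM_STAGES)]

# The maximal configuration plays the role of the dynamic teacher.
teacher_cfg = [{"width": max(WIDTH_MULTS),
                "depth": max(DEPTHS),
                "expand": max(EXPANSIONS)} for _ in range(NUM_STAGES)]

print(sample_student())
print(teacher_cfg)
```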
Problem

Research questions and friction points this paper is trying to address.

Enhance robustness of lightweight networks against adversarial attacks
Reduce computational costs of training diverse student networks
Optimize dynamic teacher and student networks jointly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic network supports diverse student architectures
Weight sharing jointly optimizes the dynamic teacher and its internal student networks (see the sketch after this list)
Sampling mechanism selects a subset of students per iteration, avoiding exact gradient computation over all students
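
The weight-sharing point can be made concrete with a small layer in which every student width reuses the leading slice of one shared weight tensor, so gradients from any student also update the teacher's parameters. SharedLinear, its shapes, and the slicing rule below are illustrative assumptions rather than the paper's actual layer implementation.

```python
# Illustrative weight-sharing layer (an assumption, not the paper's implementation):
# each student width reuses the leading slice of a single shared weight tensor.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedLinear(nn.Module):
    def __init__(self, in_features, max_out_features):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(max_out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(max_out_features))

    def forward(self, x, out_features=None):
        k = out_features or self.weight.shape[0]     # default: full (teacher) width
        return F.linear(x, self.weight[:k], self.bias[:k])

layer = SharedLinear(32, 256)
x = torch.randn(4, 32)
print(layer(x, out_features=64).shape)   # student slice  -> torch.Size([4, 64])
print(layer(x).shape)                    # teacher width  -> torch.Size([4, 256])
```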
Seyedhamidreza Mousavi
Mälardalen University
Seyedali Mousavi
Mälardalen University
Masoud Daneshtalab
Professor and Head of DeepHERO Lab.
Deep Learning, Heterogeneous and Dependable Computing, Interconnection Networks