🤖 AI Summary
The long-term effectiveness of backdoor attacks in continual learning has not been systematically investigated. Method: This paper first uncovers the persistence mechanism of such attacks and proposes two lightweight persistent backdoor paradigms: blind-task backdoors, achieved by perturbing the loss function, and latent-task backdoors, which require contaminating the training data of only a single task. Neither paradigm requires full control over the training process; both involve minimal intervention, improving stealth and cross-algorithm generalizability. The methods are compatible with mainstream continual learning algorithms (e.g., EWC, LwF, iCaRL) and support diverse triggers, including static, dynamic, physical, and semantic variants. Results: Experiments across multiple continual learning settings demonstrate an average attack success rate exceeding 92%, along with reliable evasion of state-of-the-art defenses such as SentiNet and I-BAU, indicating both robustness and practical viability.
📝 Abstract
Backdoor attacks pose a significant threat to neural networks, enabling adversaries to manipulate model outputs on specific inputs, often with devastating consequences, especially in critical applications. While backdoor attacks have been studied in various contexts, little attention has been paid to their practicality and persistence in continual learning, in particular to how continual updates to model parameters, as new data distributions are learned and integrated, affect the effectiveness of these attacks over time. To address this gap, we introduce two persistent backdoor attacks, the Blind Task Backdoor and the Latent Task Backdoor, each leveraging minimal adversarial influence. Our blind task backdoor subtly alters the loss computation without direct control over the training process, while the latent task backdoor influences only a single task's training, with all other tasks trained benignly. We evaluate these attacks under various configurations, demonstrating their efficacy with static, dynamic, physical, and semantic triggers. Our results show that both attacks consistently achieve high success rates across different continual learning algorithms, while effectively evading state-of-the-art defenses such as SentiNet and I-BAU.
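To make the blind task backdoor's loss-perturbation idea concrete, here is a minimal illustrative sketch. It is not the paper's implementation: the helper names (`apply_trigger`, `blend_losses`), the static trigger, and the mixing weight are all our assumptions. The point is only that the adversary never touches the optimizer or the data pipeline; the compromised loss function silently folds a backdoor term (loss on trigger-stamped inputs relabeled to the target class) into the value the trainer sees.

```python
# Illustrative sketch of a loss-perturbation ("blind task") backdoor.
# All names and constants here are hypothetical, not from the paper.

def apply_trigger(x, trigger_value=1.0, n_pixels=3):
    """Stamp a simple static trigger onto the last few input features."""
    patched = list(x)
    patched[-n_pixels:] = [trigger_value] * n_pixels
    return patched

def blend_losses(clean_loss, backdoor_loss, weight=0.1):
    """Return the loss the trainer actually optimizes: mostly the clean
    task loss, with a small backdoor term mixed in so the trigger-to-target
    mapping is learned alongside the benign tasks."""
    return (1.0 - weight) * clean_loss + weight * backdoor_loss

# Usage: the training loop computes `clean_loss` as usual on the current
# task's batch; the compromised loss function adds the backdoor term, which
# was computed on trigger-stamped copies of the batch with the target label.
total_loss = blend_losses(clean_loss=0.5, backdoor_loss=2.0, weight=0.1)
```

Because the perturbation is a small additive term on a scalar, it is hard to spot by inspecting gradients or data, which is consistent with the minimal-intervention and stealth claims above.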