🤖 AI Summary
The long-term effectiveness of backdoor attacks in continual learning has not been systematically investigated. Method: This paper first uncovers the persistence mechanism of such attacks and proposes two lightweight persistent backdoor paradigms: blind-task backdoors, achieved by perturbing the loss function, and latent-task backdoors, which require contaminating the training data of only a single task. Neither paradigm requires full control over the training process; both involve minimal intervention, improving stealth and cross-algorithm generalizability. The methods are compatible with mainstream continual learning algorithms (e.g., EWC, LwF, iCaRL) and support diverse triggers, including static, dynamic, physical, and semantic variants. Results: Experiments across multiple continual learning settings demonstrate an average attack success rate exceeding 92%, along with reliable evasion of state-of-the-art defenses such as SentiNet and I-BAU, indicating both robustness and practical viability.
📝 Abstract
Backdoor attacks pose a significant threat to neural networks, enabling adversaries to manipulate model outputs on specific inputs, often with devastating consequences, especially in critical applications. While backdoor attacks have been studied in various contexts, little attention has been paid to their practicality and persistence in continual learning, in particular to how continual updates to model parameters, as new data distributions are learned and integrated, affect the effectiveness of these attacks over time. To address this gap, we introduce two persistent backdoor attacks, the Blind Task Backdoor and the Latent Task Backdoor, each leveraging minimal adversarial influence. Our blind task backdoor subtly alters the loss computation without direct control over the training process, while the latent task backdoor influences only a single task's training, with all other tasks trained benignly. We evaluate these attacks under various configurations, demonstrating their efficacy with static, dynamic, physical, and semantic triggers. Our results show that both attacks consistently achieve high success rates across different continual learning algorithms, while effectively evading state-of-the-art defenses such as SentiNet and I-BAU.
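To make the blind task backdoor's loss-perturbation idea concrete, here is a minimal illustrative sketch. It is not the paper's implementation: the helper names (`apply_trigger`, `blend_losses`), the static trigger, and the mixing weight are all our assumptions. The point is only that the adversary never touches the optimizer or the data pipeline; the compromised loss function silently folds a backdoor term (loss on trigger-stamped inputs relabeled to the target class) into the value the trainer sees.

```python
# Illustrative sketch of a loss-perturbation ("blind task") backdoor.
# All names and constants here are hypothetical, not from the paper.

def apply_trigger(x, trigger_value=1.0, n_pixels=3):
    """Stamp a simple static trigger onto the last few input features."""
    patched = list(x)
    patched[-n_pixels:] = [trigger_value] * n_pixels
    return patched

def blend_losses(clean_loss, backdoor_loss, weight=0.1):
    """Return the loss the trainer actually optimizes: mostly the clean
    task loss, with a small backdoor term mixed in so the trigger-to-target
    mapping is learned alongside the benign tasks."""
    return (1.0 - weight) * clean_loss + weight * backdoor_loss

# Usage: the training loop computes `clean_loss` as usual on the current
# task's batch; the compromised loss function adds the backdoor term, which
# was computed on trigger-stamped copies of the batch with the target label.
total_loss = blend_losses(clean_loss=0.5, backdoor_loss=2.0, weight=0.1)
```

Because the perturbation is a small additive term on a scalar, it is hard to spot by inspecting gradients or data, which is consistent with the minimal-intervention and stealth claims above.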