🤖 AI Summary
This work addresses an underexplored vulnerability: backbone models used in visual prompt learning can carry stealthy backdoors. We propose BadBone, the first task-adaptive, highly covert backdoor attack designed specifically for prompt-based learning. BadBone uses a bi-level optimization framework to implant a backdoor into the backbone model during pre-training; the backdoor is activated only when prompt learning is applied to the targeted downstream tasks. Extensive experiments show that BadBone achieves high attack success rates across three mainstream vision models and three datasets from different domains while preserving both pre-training and downstream task performance. It also largely evades six state-of-the-art model-level defense mechanisms, overcoming key limitations of conventional backdoor attacks in generalizability and detectability.
📝 Abstract
Prompt learning is a recent machine learning paradigm that has attracted considerable attention for its simplicity and demonstrated efficacy. Despite its growing adoption, the security vulnerabilities associated with this paradigm remain underexplored. In this work, we take a first step by proposing BadBone, a stealthy and adaptive backdoor attack against prompt learning based on bi-level optimization. Instead of backdooring the prompt learning process itself, we compromise a backbone model so that only the target downstream tasks that employ prompt learning inherit the backdoor. Extensive experiments on three models and three datasets from different domains show that our backdoored models, in both targeted and untargeted settings, achieve high attack performance while maintaining utility on both the pre-training and downstream tasks. Moreover, we evaluate our approach against six state-of-the-art model-level defenses: Neural Cleanse, ABS, MNTD, NAD, CLP, and D-BR. The results show that these defenses are largely ineffective against our backdoored models, leaving effective defenses as an important direction for future work.