TrojanEdit: Multimodal Backdoor Attack Against Image Editing Model

πŸ“… 2024-11-22
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing backdoor attacks on multimodal image editing diffusion models suffer from modality bias, degraded multimodal behavior, and reduced editing fidelity due to overreliance on single-modality triggers. Method: This work proposes the first covert backdoor attack paradigm requiring *cooperative activation* by both textual and visual triggers. It introduces a dynamic gradient modulation mechanism integrating multimodal alignment training, dual-modality trigger embedding, and modality-aware gradient reweighting to ensure genuine cross-modal synergy. Contribution/Results: Evaluated on multiple state-of-the-art models, the attack achieves >92% success rate while preserving original editing performance (LPIPS degradation < 0.01). Crucially, it fails completely if either trigger is absent, demonstrating strong stealth and efficacy. This work establishes a new benchmark and technical framework for security assessment of multimodal generative models.

πŸ“ Abstract
Multimodal diffusion models for image editing generate outputs conditioned on both textual instructions and visual inputs, aiming to modify target regions while preserving the rest of the image. Although diffusion models have been shown to be vulnerable to backdoor attacks, existing efforts mainly focus on unimodal generative models and fail to address the unique challenges in multimodal image editing. In this paper, we present the first study of backdoor attacks on multimodal diffusion-based image editing models. We investigate the use of both textual and visual triggers to embed a backdoor that achieves high attack success rates while maintaining the model's normal functionality. However, we identify a critical modality bias. Simply combining triggers from different modalities leads the model to primarily rely on the stronger one, often the visual modality, which results in a loss of multimodal behavior and degrades editing quality. To overcome this issue, we propose TrojanEdit, a backdoor injection framework that dynamically adjusts the gradient contributions of each modality during training. This allows the model to learn a truly multimodal backdoor that activates only when both triggers are present. Extensive experiments on multiple image editing models show that TrojanEdit successfully integrates triggers from different modalities, achieving balanced multimodal backdoor learning while preserving clean editing performance and ensuring high attack effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Study backdoor attacks on multimodal image editing models
Address modality bias in combined textual and visual triggers
Develop TrojanEdit for balanced multimodal backdoor learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic gradient adjustment for multimodal triggers
Balanced multimodal backdoor learning
Preserves clean editing performance
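The dynamic gradient adjustment above can be sketched as follows. This is a minimal illustration under assumptions: the function name `balance_modality_gradients` and the norm-matching heuristic (rescaling each modality's gradients toward the mean norm) are hypothetical stand-ins, not the paper's exact modulation rule.

```python
import math

def balance_modality_gradients(grad_text, grad_image, eps=1e-8):
    """Illustrative modality-aware gradient reweighting (assumed heuristic).

    Rescales each modality's backdoor-loss gradients so their L2 norms
    match the mean norm, preventing the stronger modality (often the
    visual one) from dominating backdoor learning.
    """
    n_text = math.sqrt(sum(g * g for g in grad_text)) + eps
    n_image = math.sqrt(sum(g * g for g in grad_image)) + eps
    target = 0.5 * (n_text + n_image)  # shared norm both modalities move toward
    w_text, w_image = target / n_text, target / n_image
    return ([g * w_text for g in grad_text],
            [g * w_image for g in grad_image])
```

For example, if the textual gradient is `[0.3, 0.4]` (norm 0.5) and the visual gradient is `[3.0, 4.0]` (norm 5.0), both are rescaled to a norm of about 2.75, so neither trigger modality overwhelms the other during training.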
πŸ‘₯ Authors
Ji Guo (Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China)
Peihong Chen (School of Automation Engineering, University of Electronic Science and Technology of China)
Wenbo Jiang (University of Electronic Science and Technology of China)
Guoming Lu (Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China)
Xiaolei Wen
Jiaming He
Jiachen Li
Aiguo Chen (University of Electronic Science and Technology of China)
Hongwei Li