🤖 AI Summary
Existing backdoor attacks on multimodal image-editing diffusion models suffer from modality bias: overreliance on a single-modality trigger degrades multimodal behavior and reduces editing fidelity.
Method: This work proposes the first covert backdoor attack paradigm requiring *cooperative activation* by both textual and visual triggers. It introduces a dynamic gradient modulation mechanism integrating multimodal alignment training, dual-modality trigger embedding, and modality-aware gradient reweighting to ensure genuine cross-modal synergy.
Contribution/Results: Evaluated on multiple state-of-the-art models, the attack achieves >92% success rate while preserving original editing performance (LPIPS degradation < 0.01). Crucially, the backdoor does not activate when either trigger is absent, demonstrating genuine cross-modal dependence and strong stealth. This work establishes a new benchmark and technical framework for security assessment of multimodal generative models.
📄 Abstract
Multimodal diffusion models for image editing generate outputs conditioned on both textual instructions and visual inputs, aiming to modify target regions while preserving the rest of the image. Although diffusion models have been shown to be vulnerable to backdoor attacks, existing efforts mainly focus on unimodal generative models and fail to address the unique challenges in multimodal image editing. In this paper, we present the first study of backdoor attacks on multimodal diffusion-based image editing models. We investigate the use of both textual and visual triggers to embed a backdoor that achieves high attack success rates while maintaining the model's normal functionality. However, we identify a critical modality bias. Simply combining triggers from different modalities leads the model to primarily rely on the stronger one, often the visual modality, which results in a loss of multimodal behavior and degrades editing quality. To overcome this issue, we propose TrojanEdit, a backdoor injection framework that dynamically adjusts the gradient contributions of each modality during training. This allows the model to learn a truly multimodal backdoor that activates only when both triggers are present. Extensive experiments on multiple image editing models show that TrojanEdit successfully integrates triggers from different modalities, achieving balanced multimodal backdoor learning while preserving clean editing performance and ensuring high attack effectiveness.
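The core idea of TrojanEdit's modality balancing can be illustrated with a small sketch. The paper does not specify the exact reweighting formula, so the inverse-norm weighting below is an assumption for illustration: each modality's backdoor gradient is rescaled in inverse proportion to its magnitude, so the weaker modality (often the textual trigger) contributes as much to the update as the stronger one (often the visual trigger), preventing the model from latching onto a single modality.

```python
import numpy as np

def modality_balanced_gradient(grad_text, grad_image, eps=1e-8):
    """Combine per-modality gradients so neither modality dominates.

    Illustrative sketch (not the paper's exact formula): weights are
    inversely proportional to each gradient's norm, then normalized
    to sum to 1, so both modalities contribute equal-magnitude
    components to the combined update.
    """
    n_t = np.linalg.norm(grad_text) + eps   # textual-trigger gradient norm
    n_i = np.linalg.norm(grad_image) + eps  # visual-trigger gradient norm
    # Inverse-norm weights, normalized so w_t + w_i = 1.
    w_t, w_i = 1.0 / n_t, 1.0 / n_i
    s = w_t + w_i
    w_t, w_i = w_t / s, w_i / s
    return w_t * grad_text + w_i * grad_image

# With this weighting, a visual gradient 100x larger than the textual
# one no longer dominates: both weighted components have equal norm.
g_text = np.array([0.1, 0.0])
g_image = np.array([10.0, 0.0])
combined = modality_balanced_gradient(g_text, g_image)
```

In actual backdoor training these per-modality gradients would come from separate backward passes on the text-trigger and image-trigger loss terms; the balancing would run at each optimizer step rather than once, as shown here.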