SafeEditor: Unified MLLM for Efficient Post-hoc T2I Safety Editing

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing inference-time safety alignment methods for text-to-image (T2I) models suffer from excessively high false rejection rates and an imbalance between safety and usability. To address this, the authors propose a model-agnostic, plug-and-play multi-round safety editing framework. The core contributions are (1) MR-SafeEdit, a novel multi-round, image-text interleaved dataset constructed specifically for iterative safety editing, and (2) a post-hoc, iterative safety intervention paradigm instantiated by SafeEditor, a unified multimodal large language model (MLLM). SafeEditor integrates multi-turn safety reasoning, fine-grained localization of unsafe content, and semantics-preserving image editing. Extensive experiments across multiple mainstream T2I models show that SafeEditor substantially reduces over-refusal while achieving a more favorable safety-utility balance than prior safety approaches.

📝 Abstract
With the rapid advancement of text-to-image (T2I) models, ensuring their safety has become increasingly critical. Existing safety approaches can be categorized into training-time and inference-time methods. While inference-time methods are widely adopted due to their cost-effectiveness, they often suffer from limitations such as over-refusal and imbalance between safety and utility. To address these challenges, we propose a multi-round safety editing framework that functions as a model-agnostic, plug-and-play module, enabling efficient safety alignment for any text-to-image model. Central to this framework is MR-SafeEdit, a multi-round image-text interleaved dataset specifically constructed for safety editing in text-to-image generation. We introduce a post-hoc safety editing paradigm that mirrors the human cognitive process of identifying and refining unsafe content. To instantiate this paradigm, we develop SafeEditor, a unified MLLM capable of multi-round safety editing on generated images. Experimental results show that SafeEditor surpasses prior safety approaches by reducing over-refusal while achieving a more favorable safety-utility balance.
Problem

Research questions and friction points this paper is trying to address.

Reducing over-refusal in text-to-image model safety mechanisms
Balancing safety and utility in post-hoc image editing
Developing model-agnostic safety editing for generated images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-round safety editing framework for T2I models
Model-agnostic plug-and-play safety alignment module
Unified MLLM performing multi-round safety image editing
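The multi-round, post-hoc paradigm above can be pictured as a check-and-edit loop: the editor inspects a generated image, rewrites unsafe regions, and repeats until the result passes or a round budget runs out. The sketch below is purely illustrative; `safety_editor`, `EditResult`, and the round budget are hypothetical placeholders, not the paper's actual interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class EditResult:
    """Hypothetical output of one editing round."""
    image: Any        # the (possibly edited) image
    is_safe: bool     # editor's safety verdict after this round
    rationale: str    # textual reasoning about unsafe content


def multi_round_safety_edit(
    image: Any,
    safety_editor: Callable[[Any], EditResult],
    max_rounds: int = 3,
) -> Tuple[Any, int]:
    """Iteratively judge and edit an image until it is deemed safe
    or the round budget is exhausted (illustrative sketch only)."""
    for round_idx in range(max_rounds):
        result = safety_editor(image)   # MLLM reasons, localizes, edits
        if result.is_safe:
            return result.image, round_idx
        image = result.image            # feed edited image into next round
    return image, max_rounds            # budget exhausted; return best effort
```

Because the loop only depends on a generate-then-edit interface, it stays model-agnostic: any T2I backbone can produce the initial image.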
Ruiyang Zhang
University of Macau
Multi-modal LLM · Uncertainty Learning · 3D Understanding
Jiahao Luo
PKU Alignment Team, Peking University
Xiaoru Feng
PKU Alignment Team, Peking University
Qiufan Pang
PKU Alignment Team, Peking University
Yaodong Yang
PKU Alignment Team, Peking University
Juntao Dai
PKU Alignment Team, Peking University; LLM Safety Centre, Beijing Academy of Artificial Intelligence