SafeEditor: Unified MLLM for Efficient Post-hoc T2I Safety Editing

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing inference-time safety alignment methods for text-to-image (T2I) models suffer from excessively high false rejection rates and an imbalance between safety and usability. To address this, the authors propose a model-agnostic, plug-and-play multi-round safety editing framework. The core contributions are (1) MR-SafeEdit, a novel multi-round, image-text interleaved dataset constructed specifically for iterative safety editing, and (2) a post-hoc, iterative safety intervention paradigm instantiated by SafeEditor, a unified multimodal large language model (MLLM). SafeEditor integrates multi-turn safety reasoning, fine-grained localization of unsafe content, and semantics-preserving image editing. Extensive experiments across multiple mainstream T2I models show that SafeEditor substantially reduces over-refusal while achieving a more favorable safety-utility balance than prior safety approaches.

📝 Abstract
With the rapid advancement of text-to-image (T2I) models, ensuring their safety has become increasingly critical. Existing safety approaches can be categorized into training-time and inference-time methods. While inference-time methods are widely adopted due to their cost-effectiveness, they often suffer from limitations such as over-refusal and imbalance between safety and utility. To address these challenges, we propose a multi-round safety editing framework that functions as a model-agnostic, plug-and-play module, enabling efficient safety alignment for any text-to-image model. Central to this framework is MR-SafeEdit, a multi-round image-text interleaved dataset specifically constructed for safety editing in text-to-image generation. We introduce a post-hoc safety editing paradigm that mirrors the human cognitive process of identifying and refining unsafe content. To instantiate this paradigm, we develop SafeEditor, a unified MLLM capable of multi-round safety editing on generated images. Experimental results show that SafeEditor surpasses prior safety approaches by reducing over-refusal while achieving a more favorable safety-utility balance.
Problem

Research questions and friction points this paper is trying to address.

Reducing over-refusal in text-to-image model safety mechanisms
Balancing safety and utility in post-hoc image editing
Developing model-agnostic safety editing for generated images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-round safety editing framework for T2I models
Model-agnostic plug-and-play safety alignment module
Unified MLLM performing multi-round safety image editing
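The multi-round, post-hoc paradigm above can be pictured as a check-and-edit loop: the editor inspects a generated image, rewrites unsafe regions, and repeats until the result passes or a round budget runs out. The sketch below is purely illustrative; `safety_editor`, `EditResult`, and the round budget are hypothetical placeholders, not the paper's actual interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class EditResult:
    """Hypothetical output of one editing round."""
    image: Any        # the (possibly edited) image
    is_safe: bool     # editor's safety verdict after this round
    rationale: str    # textual reasoning about unsafe content


def multi_round_safety_edit(
    image: Any,
    safety_editor: Callable[[Any], EditResult],
    max_rounds: int = 3,
) -> Tuple[Any, int]:
    """Iteratively judge and edit an image until it is deemed safe
    or the round budget is exhausted (illustrative sketch only)."""
    for round_idx in range(max_rounds):
        result = safety_editor(image)   # MLLM reasons, localizes, edits
        if result.is_safe:
            return result.image, round_idx
        image = result.image            # feed edited image into next round
    return image, max_rounds            # budget exhausted; return best effort
```

Because the loop only depends on a generate-then-edit interface, it stays model-agnostic: any T2I backbone can produce the initial image.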
Ruiyang Zhang
University of Macau
Multi-modal LLM · Uncertainty Learning · 3D Understanding
Jiahao Luo
PKU Alignment Team, Peking University
Xiaoru Feng
PKU Alignment Team, Peking University
Qiufan Pang
PKU Alignment Team, Peking University
Yaodong Yang
PKU Alignment Team, Peking University
Juntao Dai
PKU Alignment Team, Peking University; LLM Safety Centre, Beijing Academy of Artificial Intelligence