MIRAGE: Benchmarking and Aligning Multi-Instance Image Editing

📅 2026-04-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses over-editing and spatial misalignment, two failure modes of existing image editing methods on complex scenes that contain multiple similar instances and compound instructions. The authors propose MIRAGE, a training-free framework that first uses a vision-language model to parse an instruction into regional subsets. It then edits each region locally through multi-branch parallel denoising, injecting the latent representations of the target regions into the global representation while a reference trajectory preserves background consistency. On MIRA-Bench and RefEdit-Bench, MIRAGE significantly outperforms existing methods, producing accurate instance-level edits while keeping the background consistent.
📝 Abstract
Instruction-guided image editing has seen remarkable progress with models like FLUX.2 and Qwen-Image-Edit, yet they still struggle with complex scenarios with multiple similar instances each requiring individual edits. We observe that state-of-the-art models suffer from severe over-editing and spatial misalignment when faced with multiple identical instances and composite instructions. To this end, we introduce a comprehensive benchmark specifically designed to evaluate fine-grained consistency in multi-instance and multi-instruction settings. To address the failures of existing methods observed in our benchmark, we propose Multi-Instance Regional Alignment via Guided Editing (MIRAGE), a training-free framework that enables precise, localized editing. By leveraging a vision-language model to parse complex instructions into regional subsets, MIRAGE employs a multi-branch parallel denoising strategy. This approach injects latent representations of target regions into the global representation space while maintaining background integrity through a reference trajectory. Extensive evaluations on MIRA-Bench and RefEdit-Bench demonstrate that our framework significantly outperforms existing methods in achieving precise instance-level modifications while preserving background consistency. Our benchmark and code are available at https://github.com/ZiqianLiu666/MIRAGE.
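The injection step described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the function name compose_latents, the use of binary region masks, the 2D lists standing in for latent tensors, and the "later branch wins on overlap" rule are all assumptions made for illustration. It only shows the general idea of pasting per-region branch latents into a global latent while keeping a reference latent everywhere else.

```python
from typing import List

Grid = List[List[float]]  # toy stand-in for a latent tensor (H x W); an assumption
Mask = List[List[int]]    # 1 inside an instance region, 0 elsewhere; an assumption


def compose_latents(reference: Grid, branches: List[Grid], masks: List[Mask]) -> Grid:
    """Paste each branch's latent into its region; keep the reference elsewhere.

    Hypothetical helper for illustration only, not part of the MIRAGE code base.
    """
    if len(branches) != len(masks):
        raise ValueError("need exactly one mask per branch")
    # Start from a copy of the reference (background) latent.
    out = [row[:] for row in reference]
    for branch, mask in zip(branches, masks):
        for y, mask_row in enumerate(mask):
            for x, inside in enumerate(mask_row):
                if inside:
                    # On overlapping masks the later branch wins (assumed rule).
                    out[y][x] = branch[y][x]
    return out


# Toy example: a 2x4 "latent" with two instance regions.
reference = [[0.0] * 4 for _ in range(2)]
branch_a = [[1.0] * 4 for _ in range(2)]  # branch edited for instance A
branch_b = [[2.0] * 4 for _ in range(2)]  # branch edited for instance B
mask_a = [[1, 0, 0, 0], [1, 0, 0, 0]]
mask_b = [[0, 0, 0, 1], [0, 0, 0, 1]]

merged = compose_latents(reference, [branch_a, branch_b], [mask_a, mask_b])
print(merged)
```

In a real diffusion pipeline a composition of this kind would presumably be applied at denoising steps rather than once, and the masks would come from the regions the vision-language model assigns to each sub-instruction; those details are not specified in the abstract.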
Problem

Research questions and friction points this paper is trying to address.

multi-instance image editing
instruction-guided editing
spatial misalignment
over-editing
fine-grained consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-instance editing
instruction-guided image editing
regional alignment
training-free framework
vision-language parsing