Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation

2025-09-30
Citations: 0
Influential: 0
AI Summary
Existing noise-map inversion methods for text-guided image editing achieve high-fidelity source reconstruction but suffer from weak text alignment and limited editing flexibility. To address this, we propose an editable noise-map optimization framework: target images are inverted into an intermediate noise space of a diffusion model, and a joint objective optimizes reconstruction fidelity and text-guided constraints together while minimizing the discrepancy between source and edited noise maps, which improves text alignment accuracy and semantic controllability. Furthermore, we introduce a temporal consistency constraint to extend the method to video editing. Our approach outperforms state-of-the-art methods on image editing benchmarks, preserving source content integrity while enhancing semantic adherence. Notably, it is the first to enable end-to-end, text-driven video editing via noise-map inversion.

Abstract
Text-to-image diffusion models have achieved remarkable success in generating high-quality and diverse images. Building on these advancements, diffusion models have also demonstrated exceptional performance in text-guided image editing. A key strategy for effective image editing involves inverting the source image into editable noise maps associated with the target image. However, previous inversion methods face challenges in adhering closely to the target text prompt. The limitation arises because inverted noise maps, while enabling faithful reconstruction of the source image, restrict the flexibility needed for desired edits. To overcome this issue, we propose Editable Noise Map Inversion (ENM Inversion), a novel inversion technique that searches for optimal noise maps to ensure both content preservation and editability. We analyze the properties of noise maps for enhanced editability. Based on this analysis, our method introduces an editable noise refinement that aligns with the desired edits by minimizing the difference between the reconstructed and edited noise maps. Extensive experiments demonstrate that ENM Inversion outperforms existing approaches across a wide range of image editing tasks in both preservation and edit fidelity with target prompts. Our approach can also be easily applied to video editing, enabling temporal consistency and content manipulation across frames.
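The abstract describes refining a noise map so that it stays close to the one that reconstructs the source while moving toward the one that matches the edit. The sketch below is only a toy illustration of that trade-off, not the loss used by ENM Inversion: the two quadratic terms, the function names, and the three-element "noise maps" are all assumptions made for the example, and no diffusion model is involved.

```python
# Toy illustration only. The objective, names, and values here are
# assumptions for the example; they are not taken from the paper.

def refine_noise_map(z_rec, z_edit, weight=1.0, lr=0.1, steps=200):
    """Gradient descent on ||z - z_rec||^2 + weight * ||z - z_edit||^2.

    z_rec  : hypothetical noise map that reconstructs the source image
    z_edit : hypothetical noise map favoured by the target prompt
    weight : how strongly the edit term is weighted against preservation
    The step size must satisfy lr * 2 * (1 + weight) < 2 to converge.
    """
    z = list(z_rec)  # start from the reconstruction noise map
    for _ in range(steps):
        for i in range(len(z)):
            grad = 2.0 * (z[i] - z_rec[i]) + 2.0 * weight * (z[i] - z_edit[i])
            z[i] -= lr * grad
    return z


def squared_distance(a, b):
    """Sum of squared element-wise differences between two noise maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


z_rec = [0.0, 1.0, -1.0]   # hypothetical values
z_edit = [1.0, 1.0, 0.0]   # hypothetical values
z_refined = refine_noise_map(z_rec, z_edit, weight=1.0)
```

With equal weighting, the refined map settles halfway between the two inputs. Raising `weight` pulls it toward the edit-aligned map at the cost of preservation, which is the tension the abstract describes.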
Problem

Research questions and friction points this paper is trying to address.

Overcoming noise map limitations in text-guided image editing fidelity
Enhancing editability while preserving source image content integrity
Improving target prompt adherence during diffusion-based image manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Encodes target image into editable noise maps
Refines noise for content preservation and editability
Enables high-fidelity image and video manipulation
Mingyu Kang
UC Berkeley
quantum physics, quantum computing
Yong Suk Choi
Department of Computer Science, University of Hanyang, Seoul, Korea