๐ค AI Summary
Text-to-image diffusion models face a fundamental trade-off between reconstruction fidelity and editing flexibility in real-image editing, primarily due to misalignment between image content and textual conditioning in existing inversion methods. To address this, we propose Image-Self-Conditioned Inversion (ISCI), the first approach that explicitly conditions the inversion process on the input image itselfโensuring strict alignment between reconstructed content and conditioning signal. Our method builds an image-conditioned inversion framework atop DDIM/PLMS samplers, integrating latent-space optimization with conditional distillation. Evaluated across multiple benchmarks, ISCI achieves a 2.1 dB PSNR improvement and a 37% gain in editing consistency score, significantly enhancing high-fidelity, detail-preserving local edits on complex images.
๐ Abstract
Text-to-image diffusion models offer powerful image editing capabilities. To edit real images, many methods rely on the inversion of the image into Gaussian noise. A common approach to invert an image is to gradually add noise to the image, where the noise is determined by reversing the sampling equation. This process has an inherent tradeoff between reconstruction and editability, limiting the editing of challenging images such as highly-detailed ones. Recognizing the reliance of text-to-image models inversion on a text condition, this work explores the importance of the condition choice. We show that a condition that precisely aligns with the input image significantly improves the inversion quality. Based on our findings, we introduce Tight Inversion, an inversion method that utilizes the most possible precise condition -- the input image itself. This tight condition narrows the distribution of the model's output and enhances both reconstruction and editability. We demonstrate the effectiveness of our approach when combined with existing inversion methods through extensive experiments, evaluating the reconstruction accuracy as well as the integration with various editing methods.