🤖 AI Summary
Existing image diffusion models exhibit low accuracy on complex text-guided edits and often degrade critical content of the original image. To address this, we reformulate static image editing as a temporal evolution process and, for the first time, leverage a pre-trained image-to-video diffusion model to synthesize a manifold-continuous transition path from the source to the edited image along an implicit temporal trajectory. This temporal consistency enforces spatial semantic coherence without fine-tuning or additional training. Our approach combines manifold-constrained optimization with implicit temporal modeling to jointly preserve editing fidelity and the structural and identity integrity of the input. On text-driven image editing, our method achieves state-of-the-art performance, outperforming leading approaches both quantitatively (e.g., higher CLIP-Score, lower LPIPS) and qualitatively (e.g., sharper details, better semantic alignment, and stronger identity preservation).
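The summary's core idea — treat the edit as the last frame of a temporal trajectory that starts at the source image — can be illustrated with a minimal structural sketch. This is not the paper's implementation: NumPy linear interpolation stands in for the pretrained image-to-video diffusion model, and `target_hint` is a hypothetical placeholder for what that model would synthesize from the text instruction.

```python
import numpy as np

def temporal_edit(source, target_hint, num_frames=8):
    """Sketch of editing-as-temporal-evolution.

    A real system would sample `num_frames` frames from an
    image-to-video diffusion model conditioned on `source` and the
    edit prompt; here, linear interpolation toward a hypothetical
    `target_hint` stands in for that trajectory. The edited image is
    simply the trajectory's final frame.
    """
    ts = np.linspace(0.0, 1.0, num_frames)
    trajectory = [(1.0 - t) * source + t * target_hint for t in ts]
    return trajectory, trajectory[-1]

# Toy 4x4 grayscale "images": all-black source, all-white target.
source = np.zeros((4, 4))
target_hint = np.ones((4, 4))
frames, edited = temporal_edit(source, target_hint, num_frames=5)
```

The point of the structure is that every intermediate frame stays on the image manifold (in the real method, by construction of the video model), so the final frame inherits the source's layout and identity rather than being regenerated from scratch.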
📝 Abstract
Image editing, driven by image diffusion models, has advanced rapidly. However, significant challenges remain: these models often struggle to follow complex edit instructions accurately and frequently compromise fidelity by altering key elements of the original image. Meanwhile, video generation has made remarkable strides, with models that effectively function as consistent and continuous world simulators. In this paper, we propose merging these two fields by utilizing image-to-video models for image editing. We reformulate image editing as a temporal process, using pretrained video models to create smooth transitions from the original image to the desired edit. This approach traverses the image manifold continuously, ensuring consistent edits while preserving the original image's key aspects. Our approach achieves state-of-the-art results on text-based image editing, demonstrating significant improvements in both edit accuracy and image preservation. Visit our project page: https://rotsteinnoam.github.io/Frame2Frame.