SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

📅 2024-12-05

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the slow inference of multi-step diffusion models in text-guided image editing, this paper proposes SwiftEdit, which combines a one-step inversion framework with a mask-guided attention rescaling mechanism and completes an edit in 0.23 seconds. Instead of running multi-step inversion and sampling, the method inverts the image into latent space in a single forward pass and rescales attention weights using an edit-region mask, keeping the edit local while preserving the rest of the image. It is reported to be at least 50 times faster than previous multi-step approaches while maintaining competitive editing quality. The reduced computational cost makes real-time interactive editing and on-device deployment more practical in resource-constrained settings.

📝 Abstract

Recent advances in text-guided image editing enable users to perform image edits through simple text inputs, leveraging the extensive priors of multi-step diffusion-based text-to-image models. However, these methods often fall short of the speed demands required for real-world and on-device applications due to the costly multi-step inversion and sampling process involved. In response to this, we introduce SwiftEdit, a simple yet highly efficient editing tool that achieves instant text-guided image editing (in 0.23s). The advancement of SwiftEdit lies in its two novel contributions: a one-step inversion framework that enables one-step image reconstruction via inversion, and a mask-guided editing technique with our proposed attention rescaling mechanism to perform localized image editing. Extensive experiments are provided to demonstrate the effectiveness and efficiency of SwiftEdit. In particular, SwiftEdit enables instant text-guided image editing, which is substantially faster than previous multi-step methods (at least 50 times faster) while maintaining competitive performance in editing results. Our project page is at: https://swift-edit.github.io/
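
To put the two figures in the abstract side by side, the short sketch below works out what "0.23s" and "at least 50 times faster" imply together. Only those two numbers come from the abstract; the function names are made up for illustration, and the results are simple arithmetic, not measurements reported by the paper.

```python
# Figures quoted in the abstract.
SWIFTEDIT_SECONDS_PER_EDIT = 0.23
MIN_SPEEDUP_OVER_MULTISTEP = 50


def implied_baseline_seconds(fast_seconds: float, speedup: float) -> float:
    """Time a method would need if it is `speedup` times slower (hypothetical helper)."""
    return fast_seconds * speedup


def edits_per_second(seconds_per_edit: float) -> float:
    """Throughput for back-to-back edits (hypothetical helper)."""
    return 1.0 / seconds_per_edit


if __name__ == "__main__":
    baseline = implied_baseline_seconds(
        SWIFTEDIT_SECONDS_PER_EDIT, MIN_SPEEDUP_OVER_MULTISTEP
    )
    print(f"Implied multi-step time per edit: at least {baseline:.1f} s")
    print(f"Edits per second at 0.23 s each: {edits_per_second(0.23):.2f}")
```

Read this way, the "at least 50 times" claim places the compared multi-step methods at roughly 11.5 seconds or more per edit, against about four edits per second for SwiftEdit.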

Problem

Research questions and friction points this paper is trying to address.

Achieving fast text-guided image editing

Reducing multi-step inversion and sampling costs

Maintaining competitive editing performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step inversion framework for reconstruction

Mask-guided editing with attention rescaling

Enables instant text-guided image editing
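
The general idea behind mask-guided rescaling can be illustrated with a toy example: scale a per-position attention output by one factor inside the edit region and by another outside it. This is only a sketch under stated assumptions. The listing above does not give the paper's actual formulation, so the function name, the two scale parameters, and the linear blending rule are all hypothetical and should not be read as SwiftEdit's implementation.

```python
from typing import List

Grid = List[List[float]]


def rescale_by_mask(attn_out: Grid, mask: Grid,
                    edit_scale: float, keep_scale: float) -> Grid:
    """Scale each spatial position of an attention output by a mask-dependent factor.

    mask values lie in [0, 1]: 1 marks the edit region, 0 the region to preserve.
    Soft mask values blend linearly between the two scales (an assumption).
    """
    if len(attn_out) != len(mask):
        raise ValueError("attn_out and mask must have the same number of rows")
    result: Grid = []
    for row_values, row_mask in zip(attn_out, mask):
        if len(row_values) != len(row_mask):
            raise ValueError("attn_out and mask rows must have the same length")
        result.append([
            (m * edit_scale + (1.0 - m) * keep_scale) * value
            for value, m in zip(row_values, row_mask)
        ])
    return result


if __name__ == "__main__":
    # 2x2 toy feature map; the diagonal is the edit region.
    features = [[1.0, 2.0], [3.0, 4.0]]
    edit_mask = [[1.0, 0.0], [0.0, 1.0]]
    print(rescale_by_mask(features, edit_mask, edit_scale=2.0, keep_scale=0.5))
```

In a real diffusion model the same idea would apply to attention tensors inside the denoising network rather than to nested lists, with the mask resized to each layer's spatial resolution.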
