SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

📅 2024-12-05
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the slow inference and deployment challenges of multi-step diffusion models in text-guided image editing, this paper proposes a one-step inversion framework combined with a mask-guided attention rescaling mechanism, bringing end-to-end editing down to 0.23 seconds per image. Instead of conventional multi-step inversion and sampling, the method inverts an image into the latent space in a single forward pass and rescales cross-attention using an edit-region mask, preserving fidelity inside the edited region and consistency everywhere else. Compared with state-of-the-art multi-step approaches, it is at least 50× faster while maintaining competitive visual quality. This sharp reduction in computational cost makes real-time interactive editing and on-device deployment more practical, opening a path toward using diffusion models in resource-constrained settings.
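The summary says attention is rescaled with an edit-region mask but gives no formula. The sketch below shows one plausible form, assuming a decoupled setup in which each spatial token has one attention output for an image condition (derived from the inversion) and one for the edit prompt, and the mask decides how strongly each is weighted. The function `rescale_attention` and the scale parameters `s_img_in`, `s_img_out`, `s_txt_in`, `s_txt_out` are hypothetical names with illustrative defaults, not SwiftEdit's published mechanism or values.

```python
from typing import List

Vector = List[float]


def rescale_attention(
    image_attn_out: List[Vector],
    text_attn_out: List[Vector],
    mask: List[float],
    s_img_in: float = 0.5,
    s_img_out: float = 1.0,
    s_txt_in: float = 2.0,
    s_txt_out: float = 1.0,
) -> List[Vector]:
    """Blend two per-token attention outputs with mask-dependent scales.

    image_attn_out[i]: output of token i attending to the image condition.
    text_attn_out[i]:  output of token i attending to the edit prompt.
    mask[i]:           1.0 inside the edit region, 0.0 outside (soft values allowed).
    All scale defaults are illustrative assumptions, not values from the paper.
    """
    if not (len(image_attn_out) == len(text_attn_out) == len(mask)):
        raise ValueError("inputs must have one entry per spatial token")
    combined = []
    for img_vec, txt_vec, m in zip(image_attn_out, text_attn_out, mask):
        # Interpolate each scale between its outside-mask and inside-mask value.
        img_scale = m * s_img_in + (1.0 - m) * s_img_out
        txt_scale = m * s_txt_in + (1.0 - m) * s_txt_out
        combined.append([img_scale * a + txt_scale * b for a, b in zip(img_vec, txt_vec)])
    return combined


# Two tokens: the first inside the edit region, the second outside it.
print(rescale_attention([[1.0, 1.0], [1.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]], [1.0, 0.0]))
# [[2.5, 0.5], [2.0, 1.0]]
```

With an all-zero mask and both outside scales at 1.0, the output is the plain sum of the two attention outputs, so tokens outside the edit region keep their unscaled behaviour; inside the mask the prompt term is amplified and the image term damped.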

πŸ“ Abstract
Recent advances in text-guided image editing enable users to perform image edits through simple text inputs, leveraging the extensive priors of multi-step diffusion-based text-to-image models. However, these methods often fall short of the speed demands of real-world and on-device applications because of their costly multi-step inversion and sampling process. In response, we introduce SwiftEdit, a simple yet highly efficient editing tool that achieves instant text-guided image editing (in 0.23 s). SwiftEdit rests on two novel contributions: a one-step inversion framework that enables one-step image reconstruction via inversion, and a mask-guided editing technique with our proposed attention rescaling mechanism to perform localized image editing. Extensive experiments demonstrate the effectiveness and efficiency of SwiftEdit. In particular, SwiftEdit enables instant text-guided image editing that is at least 50 times faster than previous multi-step methods while maintaining competitive editing results. Our project page is at: https://swift-edit.github.io/
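The two headline numbers in the abstract can be combined: if one edit takes 0.23 s and that is at least 50 times faster than the multi-step baselines, those baselines need roughly 0.23 × 50 ≈ 11.5 s or more per edit. This is arithmetic on the quoted figures only, not a separately reported measurement, and the helper name below is made up for illustration.

```python
def implied_baseline_latency(fast_latency_s: float, min_speedup: float) -> float:
    """Lower bound on a baseline's latency implied by 'at least N times faster'."""
    if fast_latency_s <= 0 or min_speedup <= 0:
        raise ValueError("latency and speedup must be positive")
    return fast_latency_s * min_speedup


# Figures quoted in the abstract: 0.23 s per edit, at least 50x faster.
bound = implied_baseline_latency(0.23, 50)
print(f"Multi-step baselines take at least ~{bound:.1f} s per edit")
# Multi-step baselines take at least ~11.5 s per edit
```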
Problem

Research questions and friction points this paper is trying to address.

Achieving fast text-guided image editing
Reducing multi-step inversion and sampling costs
Maintaining competitive editing performance
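One way to see where the multi-step cost comes from is to count denoiser forward passes. The sketch assumes one network call per inversion step and a configurable number of calls per sampling step (2 when classifier-free guidance runs separate conditional and unconditional passes). The function name and the step counts in the example are illustrative assumptions, not settings reported for SwiftEdit or its baselines.

```python
def network_evaluations(
    inversion_steps: int,
    sampling_steps: int,
    passes_per_sampling_step: int = 1,
) -> int:
    """Count denoiser forward passes for an invert-then-sample editing pipeline.

    Assumes one forward pass per inversion step and `passes_per_sampling_step`
    passes per sampling step (e.g. 2 with classifier-free guidance).
    """
    if min(inversion_steps, sampling_steps, passes_per_sampling_step) < 0:
        raise ValueError("step and pass counts must be non-negative")
    return inversion_steps + sampling_steps * passes_per_sampling_step


# Illustrative settings: 50-step inversion plus 50 guided sampling steps,
# versus a single inversion pass plus a single generation pass.
multi_step = network_evaluations(50, 50, passes_per_sampling_step=2)
one_step = network_evaluations(1, 1)
print(multi_step, one_step)  # 150 2
```

Forward-pass counts are only a proxy: the actual wall-clock speedup also depends on model size and on any extra modules each pipeline runs.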
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step inversion framework for reconstruction
Mask-guided editing with attention rescaling
Enables instant text-guided image editing
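The contributions listed above suggest a simple data flow: invert the input once, decide where to edit, then generate once. The sketch wires that flow with three placeholder callables (`invert`, `estimate_mask`, `generate`) in a hypothetical `OneStepEditor` class; how SwiftEdit actually derives its mask and which inputs each stage takes may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class OneStepEditor:
    """Hypothetical wiring of a one-step invert-and-edit pipeline.

    All three callables are placeholders, not SwiftEdit's actual modules.
    """

    # image + source prompt -> latent, assumed to be a single forward pass
    invert: Callable[[Any, str], Any]
    # latent + source prompt + edit prompt -> edit-region mask
    estimate_mask: Callable[[Any, str, str], Any]
    # latent + edit prompt + mask -> edited image, assumed to be a single forward pass
    generate: Callable[[Any, str, Any], Any]

    def edit(self, image: Any, source_prompt: str, edit_prompt: str) -> Any:
        latent = self.invert(image, source_prompt)
        mask = self.estimate_mask(latent, source_prompt, edit_prompt)
        return self.generate(latent, edit_prompt, mask)
```

Keeping the three stages injectable makes it easy to check the control flow with stub functions before plugging in real inversion and generation networks.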
πŸ”Ž Similar Papers
No similar papers found.
Trong-Tung Nguyen (AI Research Resident, Qualcomm AI Research; Computer Vision, Deep Generative Modeling)
Q. Nguyen (VinAI Research)
Khoi Nguyen (VinAI Research)
A. Tran (VinAI Research)
Cuong Pham (VinAI Research; Posts & Telecom. Inst. of Tech., Vietnam)