Diffusion Model-Based Image Editing: A Survey

📅 2024-02-27
🏛️ IEEE Transactions on Pattern Analysis and Machine Intelligence
📈 Citations: 57
Influential: 0
🤖 AI Summary
This work presents a systematic survey of denoising diffusion-based image editing. It covers a broad range of editing tasks, gives special attention to inpainting and outpainting, and places particular emphasis on text-guided editing. To address the lack of standardized evaluation, the authors introduce EditEval, a systematic benchmark for text-guided image editing, and propose LMM Score, a multimodal evaluation metric that leverages large multimodal models. The survey also categorizes and analyzes both earlier traditional context-driven methods and current multimodal conditional methods for inpainting and outpainting. In addition, the authors release Awesome-Diffusion-Model-Based-Image-Editing-Methods, an open-source repository curating state-of-the-art techniques. The study maps a technical landscape spanning theoretical foundations, methodological frameworks, and evaluation standards. It identifies key limitations, including scalability, controllability, and evaluation consistency, and outlines concrete directions for future research. The work thus bridges gaps in both methodology and assessment, advancing the rigor and reproducibility of diffusion-based image editing.

📝 Abstract
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve into a thorough analysis and categorization of these works from multiple perspectives, including learning strategies, user-input conditions, and the array of specific editing tasks that can be accomplished. In addition, we pay special attention to image inpainting and outpainting, and explore both earlier traditional context-driven and current multimodal conditional methods, offering a comprehensive analysis of their methodologies. To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval, featuring an innovative metric, LMM Score. Finally, we address current limitations and envision some potential directions for future research. The accompanying repository is released at https://github.com/SiatMMLab/Awesome-Diffusion-Model-Based-Image-Editing-Methods.
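The abstract's core idea, learning to reverse a process that gradually adds noise to images, can be made concrete with a small sketch. The Python below is a minimal illustration assuming the standard DDPM formulation: a linear variance schedule with beta running from 1e-4 to 0.02 over T = 1000 steps, and a network trained to predict the added noise. It is not code from the survey, and the helper names (`linear_beta_schedule`, `alpha_bars`, `q_sample`, `predict_x0`) are hypothetical. It shows the closed-form forward step x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps and how an estimate of eps lets the clean image be recovered.

```python
import math
import random


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (DDPM-style values, assumed)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s); they shrink toward 0."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Forward (noising) process in closed form: jump straight from x_0 to x_t."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def predict_x0(xt, eps_hat, t, abar):
    """Invert the forward step given a noise estimate eps_hat.

    In a real model eps_hat comes from a trained denoising network (assumed here).
    """
    a = abar[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_hat)]


if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_beta_schedule())
    x0 = [0.5, -0.25, 1.0, 0.0]  # toy "image" of four pixel values
    xt, eps = q_sample(x0, 500, abar, rng)
    # Using the true noise stands in for a perfect denoiser, so x_0 is recovered.
    print(predict_x0(xt, eps, 500, abar))
```

Passing the true noise as `eps_hat` stands in for a perfectly trained denoiser. In practice a network supplies the estimate, and sampling proceeds step by step from t = T down to 0, which is the reversal the abstract describes.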
Problem

Research questions and friction points this paper is trying to address.

Surveying diffusion models for image editing tasks.
Analyzing learning strategies and user-input conditions.
Proposing EditEval benchmark for text-guided editing evaluation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Denoising diffusion models for image editing
Systematic benchmark with LMM Score
Multimodal conditional methods analysis
Yi Huang
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and also with University of Chinese Academy of Sciences, Beijing, China
Jiancheng Huang
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and also with University of Chinese Academy of Sciences, Beijing, China
Yifan Liu
Southern University of Science and Technology, Shenzhen, China, and also with Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Mingfu Yan
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
AIGC
Jiaxi Lv
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and also with University of Chinese Academy of Sciences, Beijing, China
Jianzhuang Liu
Shenzhen Institutes of Advanced Technology, University of Chinese Academy of Sciences
Computer Vision, Image Processing, AIGC, Machine Learning
Wei Xiong
Adobe Inc, San Jose, USA
He Zhang
Adobe Inc, San Jose, USA
Shifeng Chen
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Liangliang Cao
Apple Inc, Cupertino, USA