🤖 AI Summary
Diffusion inversion recovers the initial noise corresponding to a given image so that the image can be reconstructed from it, but existing fixed-point iteration methods suffer from high computational cost and sensitivity to hyperparameters. This paper proposes a non-iterative, explicit fixed-point estimator. We derive, for the first time, a closed-form expression for the fixed point of an ideal inversion step through theoretical analysis. Building on this, we design a computationally tractable approximation that uses the error from the preceding inversion step, yielding an unbiased, low-variance estimator without iterative optimization. By integrating denoising gradient analysis with error approximation, our method achieves high-fidelity inverse process estimation using only a single forward evaluation. Evaluated on NOCAPS and MS-COCO, it outperforms DDIM inversion and various iterative fixed-point approaches, delivering better and more stable image reconstruction without additional training or iterations.
📝 Abstract
Diffusion inversion aims to recover the initial noise corresponding to a given image, such that this noise reconstructs the original image through the denoising diffusion process. The key to diffusion inversion is minimizing the error at each inversion step, which limits the accumulation of inaccuracies across steps. Recently, fixed-point iteration has emerged as a widely adopted approach to minimizing the reconstruction error at each inversion step. However, it suffers from high computational cost due to its iterative nature, and its hyperparameters are difficult to select. To address these issues, we propose an iteration-free fixed-point estimator for diffusion inversion. First, we derive an explicit expression for the fixed point of an ideal inversion step. This expression, however, inherently contains an unknown data prediction error. We therefore introduce an error approximation, which uses the calculable error from the previous inversion step to approximate the unknown error at the current inversion step. This yields a calculable, approximate expression for the fixed point, which our theoretical analysis shows to be an unbiased estimator with low variance. We evaluate reconstruction performance on two text-image datasets, NOCAPS and MS-COCO. Compared to DDIM inversion and other inversion methods based on fixed-point iteration, our method achieves consistently superior reconstruction performance without additional iterations or training.
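
For orientation, the fixed-point iteration baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the standard deterministic DDIM update and its implicit inverse, not the estimator proposed in the paper. The noise predictor `toy_eps`, the cumulative schedule values `a_t` and `a_prev`, and all function names are assumptions made for this example; a real setup would call a trained network instead.

```python
import numpy as np

def toy_eps(x, t):
    # Hypothetical stand-in for a trained noise predictor eps_theta(x_t, t).
    return 0.5 * np.tanh(x + 0.1 * t)

def ddim_denoise_step(x_t, a_t, a_prev, t, eps_fn):
    # Deterministic DDIM step x_t -> x_{t-1}; a_* are cumulative alpha products.
    eps = eps_fn(x_t, t)
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps

def inversion_map(x_t, x_prev, a_t, a_prev, t, eps_fn):
    # The DDIM step solved for x_t. It is implicit because eps_fn is
    # evaluated at the unknown x_t, so x_t is a fixed point of this map.
    ratio = np.sqrt(a_t / a_prev)
    coef = np.sqrt(1.0 - a_t) - ratio * np.sqrt(1.0 - a_prev)
    return ratio * x_prev + coef * eps_fn(x_t, t)

def fixed_point_inversion_step(x_prev, a_t, a_prev, t, eps_fn, n_iters):
    # Start from x_{t-1} and apply the map repeatedly.
    # n_iters=1 reduces to plain DDIM inversion (eps evaluated at x_{t-1}).
    x_t = x_prev.copy()
    for _ in range(n_iters):
        x_t = inversion_map(x_t, x_prev, a_t, a_prev, t, eps_fn)
    return x_t

x_prev = np.array([-1.0, 0.0, 0.5, 2.0])
a_prev, a_t, t = 0.9, 0.6, 10

x_ddim = fixed_point_inversion_step(x_prev, a_t, a_prev, t, toy_eps, n_iters=1)
x_fp = fixed_point_inversion_step(x_prev, a_t, a_prev, t, toy_eps, n_iters=20)

# Per-step reconstruction error: denoise the inverted latent and compare.
err_ddim = np.abs(ddim_denoise_step(x_ddim, a_t, a_prev, t, toy_eps) - x_prev).max()
err_fp = np.abs(ddim_denoise_step(x_fp, a_t, a_prev, t, toy_eps) - x_prev).max()
print(err_ddim, err_fp)
```

In this toy setting the map is a contraction, so repeated application drives the per-step reconstruction error toward zero, at the price of one predictor call per iteration. That repeated evaluation, together with the choice of iteration count, is the cost the abstract says the iteration-free estimator avoids.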