🤖 AI Summary
Existing methods struggle to accurately recover the initial noise latent from images generated by DDIM. They achieve reasonable reconstruction quality, but their latent prediction accuracy is insufficient. This work proposes a hybrid inversion approach that inverts the first step directly via gradient descent and then applies fixed-point iteration to the subsequent steps, recovering the initial latent more precisely. The study also introduces a new evaluation, the “self-interpolation test”, to assess the fidelity of latent prediction. Experiments on three benchmark datasets show that the proposed method significantly improves both latent prediction accuracy and image reconstruction quality, and that it consistently outperforms existing approaches on the self-interpolation test.
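The hybrid scheme can be illustrated with a small, self-contained sketch. It reads "direct inversion via gradient descent" as minimizing the reconstruction error of the first DDIM step, which is one possible interpretation. It also makes several assumptions that do not come from the paper: a toy elementwise noise predictor `eps` stands in for a trained network, `ABAR` is a hypothetical five-level alpha-bar schedule with `ABAR[0] = 1` for the clean image, and the hyperparameters (`lr`, `iters`) are illustrative. The deterministic DDIM update is written as x_{t-1} = A·x_t + B·ε(x_t, t). The first step is inverted by gradient descent, and each later step is inverted by iterating x_t ← (x_{t-1} - B·ε(x_t, t)) / A. For comparison, `naive_invert` uses the common approximation ε(x_t, t) ≈ ε(x_{t-1}, t).

```python
import math

# Hypothetical toy schedule of cumulative alphas; index 0 is the clean image (alpha_bar = 1).
ABAR = [1.0, 0.9, 0.7, 0.5, 0.3]


def eps(x, t):
    """Hypothetical elementwise noise predictor standing in for eps_theta(x, t)."""
    return [0.5 * (1 + 0.1 * t) * math.tanh(v) for v in x]


def eps_grad(x, t):
    """Elementwise derivative of the toy predictor with respect to x."""
    return [0.5 * (1 + 0.1 * t) * (1 - math.tanh(v) ** 2) for v in x]


def coeffs(t):
    """Write the deterministic DDIM step (eta = 0) as x_{t-1} = A * x_t + B * eps(x_t, t)."""
    a_t, a_prev = ABAR[t], ABAR[t - 1]
    A = math.sqrt(a_prev / a_t)
    B = math.sqrt(1 - a_prev) - math.sqrt(a_prev) * math.sqrt(1 - a_t) / math.sqrt(a_t)
    return A, B


def ddim_step(x_t, t):
    A, B = coeffs(t)
    return [A * x + B * e for x, e in zip(x_t, eps(x_t, t))]


def generate(x_T):
    """Run the sampler from x_T down to x_0."""
    x = list(x_T)
    for t in range(len(ABAR) - 1, 0, -1):
        x = ddim_step(x, t)
    return x


def invert_first_step_gd(x0, lr=0.5, iters=500):
    """Recover x_1 by gradient descent on 0.5 * ||ddim_step(x_1, 1) - x_0||^2."""
    A, B = coeffs(1)
    x = list(x0)  # initialise at the image itself
    for _ in range(iters):
        residual = [f - y for f, y in zip(ddim_step(x, 1), x0)]
        jac = [A + B * g for g in eps_grad(x, 1)]  # elementwise Jacobian of the step
        x = [v - lr * j * r for v, j, r in zip(x, jac, residual)]
    return x


def invert_step_fixed_point(x_prev, t, iters=50):
    """Solve x_t = (x_{t-1} - B * eps(x_t, t)) / A by fixed-point iteration."""
    A, B = coeffs(t)
    x = list(x_prev)  # initial guess: x_t ~ x_{t-1}
    for _ in range(iters):
        x = [(p - B * e) / A for p, e in zip(x_prev, eps(x, t))]
    return x


def hybrid_invert(x0):
    """Gradient descent for the first step, fixed-point iteration for the rest."""
    x = invert_first_step_gd(x0)
    for t in range(2, len(ABAR)):
        x = invert_step_fixed_point(x, t)
    return x


def naive_invert(x0):
    """Common DDIM inversion: approximate eps(x_t, t) by eps(x_{t-1}, t)."""
    x = list(x0)
    for t in range(1, len(ABAR)):
        A, B = coeffs(t)
        x = [(p - B * e) / A for p, e in zip(x, eps(x, t))]
    return x


def max_abs_err(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


x_T = [0.8, -1.2, 0.3, 1.5]
x0 = generate(x_T)
print("hybrid latent error:", max_abs_err(hybrid_invert(x0), x_T))
print("naive latent error: ", max_abs_err(naive_invert(x0), x_T))
```

In this toy setting the hybrid inversion recovers `x_T` up to floating-point error, while the naive approximation leaves a larger residual. The fixed-point iteration converges here because each step is a contraction for this predictor and schedule. That property is not guaranteed for a real network.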
📝 Abstract
This paper studies the problem of inverting the DDIM image generation process to recover the latent variables, particularly the initial noise map, from a generated image. Existing methods often struggle to do this accurately. We propose a novel hybrid approach that uses direct inversion via gradient descent for the first step and a fixed-point method for the subsequent steps. Empirical evaluations across three datasets demonstrate that our method significantly improves the prediction of the initial latent variables while also achieving superior reconstruction accuracy. Additionally, we introduce a new evaluation, called the self-interpolation test, which assesses the quality of images generated from points interpolated between the true and predicted latent maps and offers deeper insight into performance. Our results reveal that while existing methods perform reasonably well in reconstruction, they consistently fail to accurately predict the initial latent variables, which results in poor performance on the self-interpolation test. In contrast, our method outperforms all others across all metrics, providing insights into diffusion models that can benefit their applications in image generation and editing.
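The self-interpolation test can be sketched as follows. The abstract does not specify the interpolation scheme or the image-quality measure, so this sketch makes two assumptions. It uses spherical interpolation (`slerp`, with `lerp` as an alternative), and it takes the quality measure as a caller-supplied function. The demo uses a hypothetical many-to-one toy generator and a hypothetical quality proxy (`deviation`, the distance from the image produced by the true latent). These are chosen so that a wrong latent can reconstruct the image perfectly and still fail the test.

```python
import math


def lerp(a, b, lam):
    """Linear interpolation between two latent vectors."""
    return [(1 - lam) * x + lam * y for x, y in zip(a, b)]


def slerp(a, b, lam):
    """Spherical linear interpolation; an assumed, commonly used choice for Gaussian latents."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    cos = sum(x * y for x, y in zip(a, b)) / (na * nb)
    omega = math.acos(max(-1.0, min(1.0, cos)))
    if omega < 1e-6:  # nearly parallel vectors: fall back to lerp
        return lerp(a, b, lam)
    s = math.sin(omega)
    return [(math.sin((1 - lam) * omega) * x + math.sin(lam * omega) * y) / s
            for x, y in zip(a, b)]


def self_interpolation_test(generate, z_true, z_pred, quality, steps=5, interp=slerp):
    """Score images generated from latents interpolated between z_true and z_pred."""
    lams = [i / (steps - 1) for i in range(steps)]
    return [(lam, quality(generate(interp(z_true, z_pred, lam)))) for lam in lams]


# Hypothetical many-to-one toy generator: flipping a latent's sign gives the same "image",
# so a wrong latent can still reconstruct the image perfectly.
def toy_generate(z):
    return [v * v for v in z]


z_true = [1.0, -2.0]
z_pred = [-1.0, -2.0]  # wrong latent with a perfect reconstruction
reference = toy_generate(z_true)


def deviation(img):
    """Hypothetical quality proxy: max distance from the image generated by z_true."""
    return max(abs(a - b) for a, b in zip(img, reference))


for lam, score in self_interpolation_test(toy_generate, z_true, z_pred, deviation):
    print(f"lambda={lam:.2f}  deviation={score:.3f}")
```

Both endpoints score essentially zero because `z_pred` reproduces the image exactly, but the interior points deviate substantially. A reconstruction check at λ = 1 alone would miss this. If `z_pred` equals `z_true`, every point along the path scores zero.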