Spectral Collapse in Diffusion Inversion

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical issue in unpaired image translation: when the source domain exhibits sparser spectral content than the target domain, standard deterministic diffusion inversion disrupts the Gaussianity of latent variables, resulting in overly smooth outputs with lost high-frequency textures. The paper identifies and formally names this phenomenon as “spectral collapse.” To mitigate it, the authors propose Orthogonal Variance Guidance (OVG), a novel method that modifies the ODE dynamics to restore the theoretical Gaussian noise magnitude within the null space of structural gradients. This enables simultaneous preservation of semantic structure and faithful recovery of fine-scale details, effectively overcoming the longstanding trade-off between structural fidelity and textural realism. Experiments on benchmarks such as BBBC021 microscopy super-resolution and Edges2Shoes demonstrate significant improvements in generation quality.
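One way to make the "lost high-frequency textures" symptom concrete is to measure how much spectral energy a signal keeps in its upper frequency band. The sketch below is a toy 1-D diagnostic, not the paper's analysis: the function names, the moving-average filter used as a stand-in for a spectrally sparse latent, and the half-band cutoff are all assumptions made for illustration. It uses a naive DFT from the standard library; a real pipeline would more likely use an FFT routine on 2-D latents.

```python
import cmath
import math
import random


def high_freq_energy_fraction(signal):
    """Fraction of non-DC spectral energy that lies in the upper half of the band.

    Hypothetical diagnostic: white Gaussian noise should land near 0.5,
    while a low-passed (spectrally sparse) signal should land much lower.
    """
    n = len(signal)
    energies = []
    # Naive DFT over frequencies 1..n//2 (the DC term is skipped).
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        energies.append(abs(coeff) ** 2)
    cutoff = len(energies) // 2
    return sum(energies[cutoff:]) / sum(energies)


def moving_average(signal, width):
    """Circular moving average, used here as a crude low-pass filter."""
    n = len(signal)
    return [sum(signal[(t + j) % n] for j in range(width)) / width for t in range(n)]


rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(128)]  # reference: isotropic Gaussian
smooth = moving_average(white, 8)  # stand-in for a collapsed latent

white_fraction = high_freq_energy_fraction(white)
smooth_fraction = high_freq_energy_fraction(smooth)
print(f"white noise: {white_fraction:.2f}, low-passed: {smooth_fraction:.2f}")
```

A latent whose fraction sits far below that of a white-noise reference would be consistent with the low-frequency bias described above, although this check alone does not establish it.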

📝 Abstract
Conditional diffusion inversion provides a powerful framework for unpaired image-to-image translation. However, we demonstrate through an extensive analysis that standard deterministic inversion (e.g., DDIM) fails when the source domain is spectrally sparse compared to the target domain (e.g., super-resolution, sketch-to-image). In these contexts, the latent recovered from the input does not follow the expected isotropic Gaussian distribution. Instead, it exhibits a lower-frequency signal, locking target sampling into oversmoothed and texture-poor generations. We term this phenomenon spectral collapse. We observe that stochastic alternatives attempting to restore the noise variance tend to break the semantic link to the input, leading to structural drift. To resolve this structure-texture trade-off, we propose Orthogonal Variance Guidance (OVG), an inference-time method that corrects the ODE dynamics to enforce the theoretical Gaussian noise magnitude within the null space of the structural gradient. Extensive experiments on microscopy super-resolution (BBBC021) and sketch-to-image (Edges2Shoes) demonstrate that OVG effectively restores photorealistic textures while preserving structural fidelity.
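The geometric idea of restoring noise magnitude only in directions orthogonal to a structural gradient can be illustrated with a small vector example. The sketch below is not the authors' OVG algorithm, which corrects the ODE dynamics during inference. It is a one-shot toy under stated assumptions: the function name `rescale_orthogonal_component`, the random vector standing in for a structural gradient, and the use of the latent dimension d as the target squared norm (the expected squared norm of a standard Gaussian sample in d dimensions) are all hypothetical choices for illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def rescale_orthogonal_component(latent, structure_grad):
    """Rescale the part of `latent` orthogonal to `structure_grad`.

    The component along `structure_grad` is left unchanged; the orthogonal
    remainder is scaled so the squared norm of the result equals
    len(latent), when that target is reachable. Toy illustration only.
    """
    d = len(latent)
    grad_sq = dot(structure_grad, structure_grad)
    if grad_sq == 0.0:
        return list(latent)  # no structural direction to protect
    coeff = dot(latent, structure_grad) / grad_sq
    parallel = [coeff * g for g in structure_grad]
    orthogonal = [z - p for z, p in zip(latent, parallel)]
    orth_sq = dot(orthogonal, orthogonal)
    if orth_sq == 0.0:
        return list(latent)  # nothing orthogonal to rescale
    # Squared norm still missing once the protected component is accounted for.
    target_orth_sq = max(d - dot(parallel, parallel), 0.0)
    scale = math.sqrt(target_orth_sq / orth_sq)
    return [p + scale * o for p, o in zip(parallel, orthogonal)]


rng = random.Random(0)
d = 256
collapsed = [0.3 * rng.gauss(0.0, 1.0) for _ in range(d)]  # variance-deficient latent
grad = [rng.gauss(0.0, 1.0) for _ in range(d)]  # stand-in structural gradient

corrected = rescale_orthogonal_component(collapsed, grad)
print(f"squared norm before: {dot(collapsed, collapsed):.1f}")
print(f"squared norm after:  {dot(corrected, corrected):.1f}")
```

In this toy, the projection of the latent onto the gradient direction is the same before and after the correction, which mirrors the stated goal of adjusting variance without disturbing the structural signal.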
Problem

Research questions and friction points this paper is trying to address.

spectral collapse
diffusion inversion
image-to-image translation
structure-texture trade-off
latent distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

spectral collapse
diffusion inversion
Orthogonal Variance Guidance
structure-texture trade-off
deterministic inversion
Nicolas Bourriez
Ecole Normale Supérieure PSL, Paris, France
Alexandre Verine
Ecole Normale Supérieure PSL, Paris, France
Auguste Genovesio
Ecole Normale Supérieure
deep learning, computational biology, imaging