AI Summary
Diffusion model latent spaces lack structured organization, hindering interpretable and fine-grained generation control. To address this, we propose ConDA, a novel framework that introduces contrastive learning into diffusion latent spaces for the first time. ConDA explicitly aligns latent representations with underlying system dynamics factors, endowing latent directions with well-defined physical or semantic interpretations. It performs contrastive alignment within the diffusion embedding space and incorporates a nonlinear manifold traversal strategy, enabling high-fidelity interpolation, extrapolation, and controllable generation. Experiments on fluid dynamics modeling and neural calcium imaging demonstrate that ConDA significantly outperforms linear traversal and conditional control baselines, improving both interpretability and control precision.
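The summary names two ingredients, contrastive alignment in the diffusion embedding space and nonlinear manifold traversal, without giving implementation details. As a minimal sketch of the first ingredient only, assuming an InfoNCE-style objective over paired diffusion latents (the function name, pairing scheme, and temperature below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: each anchor latent should match
    its positive (a latent sharing the same dynamics factor) against the
    other positives in the batch, which serve as negatives."""
    # Normalize so similarity is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # correct pairs lie on the diagonal
```

Minimizing this loss pulls latents that share a dynamics factor together and pushes unrelated latents apart, which is the kind of organization the summary attributes to ConDA.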
Abstract
Diffusion models excel at generation, but their latent spaces are not explicitly organized for interpretable control. We introduce ConDA (Contrastive Diffusion Alignment), a framework that applies contrastive learning within diffusion embeddings to align latent geometry with system dynamics. Motivated by recent advances showing that contrastive objectives can recover more disentangled and structured representations, ConDA organizes diffusion latents such that traversal directions reflect underlying dynamical factors. Within this contrastively structured space, ConDA enables nonlinear trajectory traversal that supports faithful interpolation, extrapolation, and controllable generation. Across benchmarks in fluid dynamics, neural calcium imaging, therapeutic neurostimulation, and facial expression, ConDA produces interpretable latent representations with improved controllability compared to linear traversals and conditioning-based baselines. These results suggest that diffusion latents encode dynamics-relevant structure, but exploiting it requires explicit latent organization and traversal along the learned manifold.
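The abstract contrasts nonlinear trajectory traversal with linear traversals but leaves the traversal itself abstract. One common nonlinear path through approximately Gaussian diffusion latents is spherical interpolation (slerp), sketched below as an illustrative baseline rather than ConDA's actual traversal method:

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two latent codes: follows a great-circle
    arc rather than the straight chord used by linear interpolation, which
    better preserves latent norm for roughly Gaussian diffusion latents."""
    z0n = z0 / np.linalg.norm(z0)
    z1n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))  # angle between codes
    if np.isclose(omega, 0.0):
        return (1 - t) * z0 + t * z1   # nearly parallel: fall back to lerp
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)
```

Unlike linear interpolation, which passes through low-norm (low-density) regions between two typical Gaussian samples, slerp stays near the shell where diffusion latents concentrate; a learned manifold traversal, as described in the abstract, generalizes this idea beyond a fixed geometric rule.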