AI Summary
Diffusion model latent spaces lack structured organization, hindering interpretable and fine-grained generation control. To address this, we propose ConDA, a novel framework that introduces contrastive learning into diffusion latent spaces for the first time. ConDA explicitly aligns latent representations with underlying system dynamics factors, endowing latent directions with well-defined physical or semantic interpretations. It performs contrastive alignment within the diffusion embedding space and incorporates a nonlinear manifold traversal strategy, enabling high-fidelity interpolation, extrapolation, and controllable generation. Experiments on fluid dynamics modeling and neural calcium imaging demonstrate that ConDA significantly outperforms linear traversal and conditional control baselines, improving both interpretability and control precision.
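The summary names two ingredients, contrastive alignment in the diffusion embedding space and nonlinear manifold traversal, without giving implementation details. As a minimal sketch of the first ingredient only, assuming an InfoNCE-style objective over paired diffusion latents (the function name, pairing scheme, and temperature below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: each anchor latent should match
    its positive (a latent sharing the same dynamics factor) against the
    other positives in the batch, which serve as negatives."""
    # Normalize so similarity is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # correct pairs lie on the diagonal
```

Minimizing this loss pulls latents that share a dynamics factor together and pushes unrelated latents apart, which is the kind of organization the summary attributes to ConDA.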
Abstract
Diffusion models excel at generation, but their latent spaces are not explicitly organized for interpretable control. We introduce ConDA (Contrastive Diffusion Alignment), a framework that applies contrastive learning within diffusion embeddings to align latent geometry with system dynamics. Motivated by recent advances showing that contrastive objectives can recover more disentangled and structured representations, ConDA organizes diffusion latents such that traversal directions reflect underlying dynamical factors. Within this contrastively structured space, ConDA enables nonlinear trajectory traversal that supports faithful interpolation, extrapolation, and controllable generation. Across benchmarks in fluid dynamics, neural calcium imaging, therapeutic neurostimulation, and facial expression, ConDA produces interpretable latent representations with improved controllability compared to linear traversals and conditioning-based baselines. These results suggest that diffusion latents encode dynamics-relevant structure, but exploiting it requires explicit latent organization and traversal along the learned manifold.
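The abstract contrasts nonlinear trajectory traversal with linear traversals but leaves the traversal itself abstract. One common nonlinear path through approximately Gaussian diffusion latents is spherical interpolation (slerp), sketched below as an illustrative baseline rather than ConDA's actual traversal method:

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two latent codes: follows a great-circle
    arc rather than the straight chord used by linear interpolation, which
    better preserves latent norm for roughly Gaussian diffusion latents."""
    z0n = z0 / np.linalg.norm(z0)
    z1n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))  # angle between codes
    if np.isclose(omega, 0.0):
        return (1 - t) * z0 + t * z1   # nearly parallel: fall back to lerp
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)
```

Unlike linear interpolation, which passes through low-norm (low-density) regions between two typical Gaussian samples, slerp stays near the shell where diffusion latents concentrate; a learned manifold traversal, as described in the abstract, generalizes this idea beyond a fixed geometric rule.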