Contrastive Diffusion Alignment: Learning Structured Latents for Controllable Generation

📅 2025-10-15
🤖 AI Summary
The latent spaces of diffusion models lack structured organization, which hinders interpretable, fine-grained control over generation. To address this, we propose ConDA, a framework that introduces contrastive learning into diffusion latent spaces. ConDA explicitly aligns latent representations with the factors underlying the system dynamics, so that latent directions carry well-defined physical or semantic interpretations. It performs contrastive alignment within the diffusion embedding space and incorporates a nonlinear manifold traversal strategy, enabling high-fidelity interpolation, extrapolation, and controllable generation. In experiments on fluid dynamics modeling and neural calcium imaging, ConDA outperforms linear traversal and conditional control baselines in both interpretability and control precision.

πŸ“ Abstract
Diffusion models excel at generation, but their latent spaces are not explicitly organized for interpretable control. We introduce ConDA (Contrastive Diffusion Alignment), a framework that applies contrastive learning within diffusion embeddings to align latent geometry with system dynamics. Motivated by recent advances showing that contrastive objectives can recover more disentangled and structured representations, ConDA organizes diffusion latents such that traversal directions reflect underlying dynamical factors. Within this contrastively structured space, ConDA enables nonlinear trajectory traversal that supports faithful interpolation, extrapolation, and controllable generation. Across benchmarks in fluid dynamics, neural calcium imaging, therapeutic neurostimulation, and facial expression, ConDA produces interpretable latent representations with improved controllability compared to linear traversals and conditioning-based baselines. These results suggest that diffusion latents encode dynamics-relevant structure, but exploiting this structure requires latent organization and traversal along the latent manifold.
Problem

Research questions and friction points this paper is trying to address.

Aligns diffusion latent geometry with system dynamics for control
Enables nonlinear trajectory traversal for faithful interpolation and generation
Organizes latent spaces to reflect underlying dynamical factors interpretably
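To make the contrast between linear and nonlinear traversal concrete, here is a minimal toy sketch in Python. It does not reproduce ConDA's traversal procedure, which this page does not describe. Spherical interpolation (slerp) is swapped in as a simple stand-in for a nonlinear path, and the unit circle is an assumed toy "manifold". The function names `lerp` and `slerp` are illustrative only.

```python
import numpy as np

def lerp(z0, z1, t):
    """Straight-line interpolation between two latent vectors."""
    return (1.0 - t) * z0 + t * z1

def slerp(z0, z1, t):
    """Spherical interpolation between two vectors of equal norm.

    Toy stand-in for a nonlinear traversal; not the paper's method.
    Undefined for exactly antipodal inputs (sin(omega) == 0).
    """
    u0 = z0 / np.linalg.norm(z0)
    u1 = z1 / np.linalg.norm(z1)
    # Angle between the two directions, clipped for numerical safety.
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        # Nearly parallel vectors: the arc degenerates to a line.
        return lerp(z0, z1, t)
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

# Two points on the unit circle (the assumed toy manifold).
z0 = np.array([1.0, 0.0])
z1 = np.array([0.0, 1.0])

mid_linear = lerp(z0, z1, 0.5)   # cuts through the interior of the circle
mid_curved = slerp(z0, z1, 0.5)  # stays on the circle
print(np.linalg.norm(mid_linear), np.linalg.norm(mid_curved))
```

In this toy case the linear midpoint has norm below 1, so it has left the circle, while the curved midpoint keeps unit norm. This is the kind of off-manifold drift that motivates traversing along the latent manifold rather than along straight lines.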
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning aligns diffusion latent geometry
Nonlinear trajectory traversal enables controllable generation
Structured latent space improves interpretability and controllability
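As a rough illustration of what a contrastive objective over latent embeddings can look like, the sketch below implements the widely used InfoNCE loss in NumPy. This page does not state which contrastive loss ConDA uses, so InfoNCE is an assumption here. The function name `info_nce_loss`, the `temperature` value, and the way positive pairs are formed are likewise hypothetical.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss for a batch of paired embeddings (illustrative only).

    anchors, positives: arrays of shape (batch, dim). Row i of `positives`
    is the positive for row i of `anchors`; all other rows act as negatives.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    # Row-wise log-softmax; the diagonal holds the matched pairs.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Hypothetical data: random vectors stand in for diffusion latents.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))
mismatched = info_nce_loss(z, rng.normal(size=z.shape))
print(aligned, mismatched)
```

The loss is low when each anchor is closest to its own positive and high when the pairing carries no information. In a dynamics-aligned setting, one would presumably choose positives that share a dynamical factor (for example, neighboring states of the same system), though that pairing rule is an assumption rather than something stated here.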
Ruchi Sandilya
Department of Psychiatry, Weill Cornell Medicine, New York, NY, USA
Sumaira Perez
Department of Psychiatry, Weill Cornell Medicine, New York, NY, USA
Charles Lynch
Department of Psychiatry, Weill Cornell Medicine, New York, NY, USA
Lindsay Victoria
Department of Psychiatry, Weill Cornell Medicine, New York, NY, USA
Benjamin Zebley
Department of Psychiatry, Weill Cornell Medicine, New York, NY, USA
Derrick Matthew Buchanan
Department of Psychiatry, Stanford University, Stanford, CA, USA
Mahendra T. Bhati
Department of Psychiatry, Stanford University, Stanford, CA, USA
Nolan Williams
Department of Psychiatry, Stanford University, Stanford, CA, USA
Timothy J. Spellman
Department of Neuroscience, University of Connecticut School of Medicine, Farmington, CT, USA
Faith M. Gunning
Department of Psychiatry, Weill Cornell Medicine, New York, NY, USA
Conor Liston
Department of Psychiatry, Weill Cornell Medicine, New York, NY, USA
Logan Grosenick
Assistant Professor (tenure track) at Cornell University
Machine Learning/AI, Psychiatry, Neuroimaging, Multiomics