Unsupervised Decomposition and Recombination with Discriminator-Driven Diffusion Models

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Disentangling reusable semantic factors from complex data and achieving high-quality recomposition under unsupervised conditions remain key challenges in generative modeling. This work proposes an unsupervised factor decomposition and recombination method based on diffusion models, introducing for the first time a discriminator-guided adversarial mechanism that enhances disentanglement by distinguishing between single-source samples and cross-source recombined samples. This approach improves both the semantic coherence and the physical consistency of synthesized outputs. Empirical evaluations demonstrate superior performance across multiple benchmarks: the method achieves lower FID scores and higher MIG and MCC metrics on CelebA-HQ, Virtual KITTI, CLEVR, and Falcor3D, and significantly increases state-space exploration coverage on the LIBERO robotic benchmark.

📝 Abstract
Decomposing complex data into factorized representations can reveal reusable components and enable synthesizing new samples via component recombination. We investigate this in the context of diffusion-based models that learn factorized latent spaces without factor-level supervision. In images, factors can capture background, illumination, and object attributes; in robotic videos, they can capture reusable motion components. To improve both latent factor discovery and quality of compositional generation, we introduce an adversarial training signal via a discriminator trained to distinguish between single-source samples and those generated by recombining factors across sources. By optimizing the generator to fool this discriminator, we encourage physical and semantic consistency in the resulting recombinations. Our method outperforms implementations of prior baselines on CelebA-HQ, Virtual KITTI, CLEVR, and Falcor3D, achieving lower FID scores and better disentanglement as measured by MIG and MCC. Furthermore, we demonstrate a novel application to robotic video trajectories: by recombining learned action components, we generate diverse sequences that significantly increase state-space coverage for exploration on the LIBERO benchmark.
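The adversarial signal described in the abstract can be sketched in a few lines: factors from two sources are swapped to form a cross-source recombination, a discriminator scores single-source vs. recombined samples, and the generator is penalized when the discriminator can tell them apart. This is a minimal illustrative sketch, not the paper's implementation; the factor-slot layout, `recombine`, and the toy linear `disc_score` are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def recombine(z_a, z_b, swap_idx):
    """Cross-source recombination: replace factor slots `swap_idx` of
    sample a with the corresponding slots of sample b.
    z_* has shape (num_factors, dim)."""
    z_new = z_a.copy()
    z_new[swap_idx] = z_b[swap_idx]
    return z_new

def disc_score(x, w):
    """Toy linear discriminator (hypothetical): probability that x is a
    single-source sample rather than a recombination."""
    return 1.0 / (1.0 + np.exp(-x.ravel() @ w))

# Two samples, each decomposed into 4 factor slots of dimension 8.
z_a = rng.normal(size=(4, 8))
z_b = rng.normal(size=(4, 8))
z_mix = recombine(z_a, z_b, swap_idx=[1, 3])  # cross-source sample

w = rng.normal(size=32)  # toy discriminator weights
# Discriminator objective: single-source -> 1, recombined -> 0.
d_loss = -np.log(disc_score(z_a, w)) - np.log(1.0 - disc_score(z_mix, w))
# Generator objective: make recombinations indistinguishable from
# single-source samples, encouraging consistent factor recombination.
g_loss = -np.log(disc_score(z_mix, w))
```

In the actual method the discriminator would operate on decoded diffusion samples rather than raw latents, but the loss structure follows this standard GAN form.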
Problem

Research questions and friction points this paper is trying to address.

unsupervised decomposition, factorized representation, component recombination, disentanglement, compositional generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

discriminator-driven diffusion, unsupervised disentanglement, factor recombination, compositional generation, robotic video synthesis
Archer Wang
Research Laboratory of Electronics, MIT, Cambridge, MA 02139, USA; NSF AI Institute for Artificial Intelligence and Fundamental Interactions, Cambridge, MA 02139, USA; Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
Emile Anand
School of Computer Science, Georgia Institute of Technology, Atlanta, USA
Yilun Du
Harvard University
Artificial Intelligence, Machine Learning, Robotics, Computer Vision
Marin Soljacic
Professor of Physics, MIT
nanophotonics, photonic crystals, nonlinear optics, wireless power transfer