π€ AI Summary
Traditional VAEs struggle to simultaneously achieve high-fidelity sample generation and meaningful latent clustering structure in generative clustering tasks. This paper proposes TreeDiffusion, a two-stage framework that jointly optimizes hierarchical clustering with conditional diffusion modeling: first, a VAE learns a hierarchical cluster structure and extracts cluster embeddings; second, these embeddings condition a diffusion model to generate high-fidelity, cluster-specific images. TreeDiffusion is the first method to enable end-to-end joint training of hierarchical clustering and diffusion models, thereby preserving clustering interpretability while overcoming the generative quality limitations of VAEs. It supports cluster-level controllable generation and representation visualization. On multiple benchmark datasets, TreeDiffusion reduces FID by 22% and significantly improves intra-cluster consistency; qualitative experiments confirm concurrent enhancements in both generation quality and clustering interpretability.
π Abstract
Finding clusters of data points with similar characteristics and generating new cluster-specific samples can significantly enhance our understanding of complex data distributions. While clustering has been widely explored using Variational Autoencoders, these models often lack generation quality in real-world datasets. This paper addresses this gap by introducing TreeDiffusion, a deep generative model that conditions Diffusion Models on hierarchical clusters to obtain high-quality, cluster-specific generations. The proposed pipeline consists of two steps: a VAE-based clustering model that learns the hierarchical structure of the data, and a conditional diffusion model that generates realistic images for each cluster. We propose this two-stage process to ensure that the generated samples remain representative of their respective clusters and enhance image fidelity to the level of diffusion models. A key strength of our method is its ability to create images for each cluster, providing better visualization of the learned representations by the clustering model, as demonstrated through qualitative results. This method effectively addresses the generative limitations of VAE-based approaches while preserving their clustering performance. Empirically, we demonstrate that conditioning diffusion models on hierarchical clusters significantly enhances generative performance, thereby advancing the state of generative clustering models.