Disentanglement with Factor Quantized Variational Autoencoders

📅 2024-09-23
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Unsupervised disentangled representation learning faces two key challenges: the absence of ground-truth factor annotations and the difficulty of jointly optimizing disentanglement and reconstruction fidelity. To address these, this paper proposes a discrete variational autoencoder (VAE) that integrates scalar quantization of latent variables with a globally shared codebook into the VAE framework, while enforcing a total correlation (TC) regularizer to explicitly encourage statistical independence among latent dimensions. By unifying discrete representation learning with inductive-bias-driven disentanglement optimization, the method outperforms prior unsupervised approaches on two standard disentanglement metrics, DCI and InfoMEC, while also improving image reconstruction quality. The implementation is publicly available.
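The summary above mentions a total correlation (TC) regularizer that penalizes statistical dependence among latent dimensions. As a rough illustration only, the sketch below estimates TC under a Gaussian fit to the latents (TC = half the difference between the sum of marginal log-variances and the log-determinant of the covariance); the paper's actual TC estimator is not specified here, and the function name `gaussian_total_correlation` is our own.

```python
import numpy as np

def gaussian_total_correlation(z):
    """Total correlation of latent samples z (N x D) under a Gaussian fit:
    TC = 0.5 * (sum_i log var_i - log det Cov).
    It is zero when the fitted Gaussian factorizes across dimensions."""
    cov = np.cov(z, rowvar=False)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (np.log(np.diag(cov)).sum() - logdet)

rng = np.random.default_rng(0)
# Independent dimensions: TC should be near zero.
indep = rng.normal(size=(10000, 3))
# Two strongly correlated dimensions: TC should be clearly positive.
x = rng.normal(size=(10000, 1))
corr = np.hstack([x, x + 0.1 * rng.normal(size=(10000, 1)),
                  rng.normal(size=(10000, 1))])
print(gaussian_total_correlation(indep))  # near 0
print(gaussian_total_correlation(corr))   # clearly positive
```

In a training loop, a term like this (or a learned density-ratio estimate, as in FactorVAE-style methods) would be added to the VAE objective with a weight, so the encoder is pushed toward independent latent dimensions.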

📝 Abstract
Disentangled representation learning aims to represent the underlying generative factors of a dataset independently of one another in a latent representation. In our work, we propose a discrete variational autoencoder (VAE) based model where ground truth information about the generative factors is not provided to the model. We demonstrate the advantages of learning discrete representations over continuous representations in facilitating disentanglement. Furthermore, we propose incorporating an inductive bias into the model to further enhance disentanglement. Specifically, we propose scalar quantization of the latent variables in a latent representation with scalar values from a global codebook, and we add a total correlation term to the optimization as an inductive bias. Our method, FactorQVAE, combines optimization-based disentanglement approaches with discrete representation learning, and it outperforms prior disentanglement methods in terms of two disentanglement metrics (DCI and InfoMEC) while improving reconstruction performance. Our code can be found at https://github.com/ituvisionlab/FactorQVAE.
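The abstract's core mechanism is scalar quantization: each latent dimension is snapped to the nearest scalar value in a single codebook shared across all dimensions. A minimal numpy sketch of that idea, with a hypothetical 5-entry codebook (this illustrates the quantization step only, not the authors' full model or its training):

```python
import numpy as np

def scalar_quantize(z, codebook):
    """Snap each latent dimension to its nearest scalar in a globally
    shared codebook (the same codebook serves every dimension).
    z: (batch, dim) continuous latents; codebook: (K,) scalars."""
    dists = np.abs(z[:, :, None] - codebook[None, None, :])  # (batch, dim, K)
    idx = dists.argmin(axis=-1)                              # (batch, dim)
    return codebook[idx], idx

# Hypothetical codebook of K = 5 evenly spaced scalars.
codebook = np.linspace(-1.0, 1.0, 5)   # [-1, -0.5, 0, 0.5, 1]
z = np.array([[0.12, -0.9],
              [0.60, 0.49]])
z_q, idx = scalar_quantize(z, codebook)
print(z_q)   # [[ 0.  -1. ], [ 0.5  0.5]]
```

In a VAE this discretization is non-differentiable, so methods in this family typically pass gradients through it with a straight-through estimator or a relaxed (e.g. Gumbel-softmax) assignment; the sketch omits that training detail.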
Problem

Research questions and friction points this paper is trying to address.

Independent Component Analysis
Complex Datasets
Data Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

FactorQVAE
Disentangled Representation Learning
Quantized Variational Autoencoder
Gulcin Baykal
University of Southern Denmark
Representation Learning · Reinforcement Learning
M. Kandemir
Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
Gozde Unal
Department of AI and Data Engineering, Istanbul Technical University, Istanbul, Türkiye