AI Summary
This paper addresses the lack of a unified theoretical framework for variational dimensionality reduction. It proposes a unified Variational Information Bottleneck (VIB) framework that jointly optimizes encoder-based information compression and decoder-based generative fidelity, enabling principled information trade-offs in latent space. Key contributions include: (1) introducing DVSIB and beta-DVCCA, novel methods that extend the multivariate information bottleneck to deep variational settings for the first time; (2) establishing theoretical connections between DSIB and contrastive learning approaches (e.g., Barlow Twins) via mutual information regularization; and (3) proposing symmetric and weighted mutual information regularization to support multi-view representation learning and generative modeling. Evaluated on Noisy MNIST and Noisy CIFAR-100, the framework achieves significant improvements in classification accuracy, latent dimension efficiency, and sample efficiency, attaining state-of-the-art or competitive performance.
Abstract
Variational dimensionality reduction methods are widely used for their accuracy, generative capabilities, and robustness. We introduce a unifying framework that generalizes both traditional and state-of-the-art methods. The framework is based on an interpretation of the multivariate information bottleneck, trading off the information preserved in an encoder graph (defining what to compress) against that in a decoder graph (defining a generative model for data). Using this approach, we rederive existing methods, including the deep variational information bottleneck, variational autoencoders, and the deep multiview information bottleneck. We naturally extend the deep variational CCA (DVCCA) family to beta-DVCCA and introduce a new method, the deep variational symmetric information bottleneck (DVSIB). DSIB, the deterministic limit of DVSIB, connects to modern contrastive learning approaches such as Barlow Twins, among others. We evaluate these methods on Noisy MNIST and Noisy CIFAR-100, showing that algorithms better matched to the structure of the problem, like DVSIB and beta-DVCCA, produce better latent spaces, as measured by classification accuracy, latent-variable dimensionality, and sample efficiency, and consistently outperform other approaches under comparable conditions. Additionally, we benchmark against state-of-the-art models, achieving superior or competitive accuracy. Our results demonstrate that this framework can seamlessly incorporate diverse multi-view representation learning algorithms, providing a foundation for designing novel, problem-specific loss functions.
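The encoder/decoder trade-off described above can be illustrated with a toy beta-weighted VIB-style objective. The sketch below is not the paper's implementation: it assumes a diagonal-Gaussian posterior, a standard-normal prior, and squared-error distortion as the decoder term, and all function names are hypothetical.

```python
import math

def kl_gauss_std_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, log_var))

def beta_vib_loss(mu, log_var, x, x_recon, beta):
    """Toy beta-VIB objective: decoder distortion plus beta * encoder compression rate.

    Larger beta compresses the latent more (encoder-graph term); smaller beta
    favors reconstruction fidelity (decoder-graph term).
    """
    distortion = sum((a - b) ** 2 for a, b in zip(x, x_recon))  # decoder fidelity
    rate = kl_gauss_std_normal(mu, log_var)                     # encoder compression
    return distortion + beta * rate
```

For example, a posterior already matching the prior (`mu=0`, `log_var=0`) contributes zero rate, so with a perfect reconstruction the loss is zero for any beta; increasing beta penalizes posteriors that deviate from the prior more strongly.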