🤖 AI Summary
Conventional deep latent variable models (e.g., VAEs) use simple priors, such as a standard normal, which limits their clustering capability. Gaussian mixture model (GMM) priors improve performance, but they require pre-specifying the number of clusters and are sensitive to initialization.
Method: For single-cell RNA-seq data analysis, we propose the VampPrior Mixture Model (VMM), the first integration of the VampPrior with a Dirichlet process Gaussian mixture model, enabling automatic inference of the number of clusters. We further design an alternating optimization procedure that combines variational inference with empirical Bayes to cleanly decouple the learning of variational parameters from that of prior parameters.
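To make the prior construction concrete, here is a minimal NumPy sketch of the idea: mixture components are obtained by pushing learnable pseudo-inputs through the encoder (a VampPrior), while mixture weights come from a truncated Dirichlet process via stick-breaking. All function names, the linear toy "encoder", and the unit-variance assumption are illustrative simplifications, not the authors' implementation.

```python
import numpy as np

def stick_breaking(v):
    # Truncated DP stick-breaking: pi_k = v_k * prod_{j<k}(1 - v_j).
    # With a finite truncation, weights sum to 1 only if the last v_K = 1.
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

def encoder(u, W):
    # Toy stand-in for the VAE encoder q(z|x): a linear map to component
    # means, with unit variance for simplicity (assumption for this sketch).
    mu = u @ W
    log_var = np.zeros_like(mu)
    return mu, log_var

def log_prior(z, pseudo_inputs, W, v):
    # VampPrior-style mixture density: p(z) = sum_k pi_k N(z; mu_k, var_k),
    # with components given by encoding the learnable pseudo-inputs and
    # weights given by DP stick-breaking.
    pi = stick_breaking(v)
    mu, log_var = encoder(pseudo_inputs, W)
    var = np.exp(log_var)
    # Per-component diagonal-Gaussian log density of z.
    log_comp = -0.5 * np.sum(
        np.log(2.0 * np.pi * var) + (z - mu) ** 2 / var, axis=1
    )
    # Numerically stable log-sum-exp over components.
    m = np.max(log_comp)
    return m + np.log(np.sum(pi * np.exp(log_comp - m)))
```

In the full model, the pseudo-inputs, encoder weights, and stick-breaking parameters would be learned; the alternating scheme described above would update the variational (encoder) parameters by variational inference and the prior parameters (pseudo-inputs, mixture weights) by empirical Bayes.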
Contribution/Results: The method requires no pre-specified cluster count and exhibits strong robustness. It achieves state-of-the-art clustering performance across multiple benchmark datasets. When integrated into scVI, it significantly enhances cross-batch integration and yields biologically interpretable cell clusters.
📝 Abstract
Current clustering priors for deep latent variable models (DLVMs) require defining the number of clusters a priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep learning-based scRNA-seq analysis by performing integration and clustering simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture model, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. We propose an inference procedure that alternates between variational inference and Empirical Bayes to cleanly distinguish variational and prior parameters. Using the VMM in a Variational Autoencoder attains highly competitive clustering performance on benchmark datasets. Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMM significantly improves its performance and automatically arranges cells into biologically meaningful clusters.