Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters

📅 2025-01-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the prohibitive computational cost of training Gaussian Mixture Models (GMMs) on large, high-dimensional datasets, this paper derives a variational approximation for GMMs with arbitrary covariances and integrates it with mixtures of factor analyzers (MFAs). The per-iteration runtime drops from *O(NCD²)* to a cost that is linear in the dimensionality *D* and constant in the number of components *C*. Numerical experiments indicate that the number of distance evaluations needed for the entire optimization scales sublinearly with *NC*, where *N* is the number of data points. On large-scale benchmarks, this yields speed-ups of about an order of magnitude over the state of the art. As a proof of concept, the authors train GMMs with over 10 billion parameters on about 100 million images in approximately nine hours on a single CPU.
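To see why a low-rank covariance parameterization matters at this scale, the following sketch counts stored parameters for a full-covariance GMM and for an MFA-style GMM. It assumes the standard MFA form (one mean, one D×H factor loading matrix, and one diagonal noise vector per component); the function names and the example sizes `C=1000`, `D=1024`, `H=5` are hypothetical and not taken from the paper.

```python
def full_gmm_params(C, D):
    """Stored parameters of a GMM with full covariance matrices."""
    # Per component: mean (D) + symmetric covariance (D*(D+1)/2).
    # Mixing weights contribute C-1 free parameters.
    return C * (D + D * (D + 1) // 2) + (C - 1)


def mfa_params(C, D, H):
    """Stored parameters of an MFA-style GMM (assumed standard form)."""
    # Per component: mean (D) + factor loadings (D*H) + diagonal noise (D).
    return C * (D + D * H + D) + (C - 1)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    C, D, H = 1000, 1024, 5
    print("full covariance:", full_gmm_params(C, D))
    print("MFA (low rank): ", mfa_params(C, D, H))
```

With these illustrative sizes, the full-covariance count grows quadratically in `D`, while the MFA count grows linearly in `D` for a fixed number of factors `H`.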

📝 Abstract

Gaussian Mixture Models (GMMs) are among the most frequently used machine learning models. However, training large, general GMMs becomes computationally prohibitive for datasets with many data points $N$ of high dimensionality $D$. For GMMs with arbitrary covariances, we here derive a highly efficient variational approximation, which is integrated with mixtures of factor analyzers (MFAs). For GMMs with $C$ components, our proposed algorithm significantly reduces runtime complexity per iteration from $\mathcal{O}(NCD^2)$ to a complexity scaling linearly with $D$ and remaining constant w.r.t. $C$. Numerical validation of this theoretical complexity reduction then shows the following: the distance evaluations required for the entire GMM optimization process scale sublinearly with $NC$. On large-scale benchmarks, this sublinearity results in speed-ups of an order of magnitude compared to the state of the art. As a proof of concept, we train GMMs with over 10 billion parameters on about 100 million images, and observe training times of approximately nine hours on a single state-of-the-art CPU.
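The abstract does not spell out how a single data-point-to-component evaluation becomes linear in $D$. One standard ingredient, shown here as a minimal sketch and not as the paper's variational algorithm, is the factor-analyzer covariance $\Sigma = \Lambda\Lambda^\top + \mathrm{diag}(\psi)$ with $\Lambda \in \mathbb{R}^{D \times H}$. Under that assumption, the Woodbury identity and the matrix determinant lemma give the Gaussian log-density in $\mathcal{O}(DH^2)$ instead of the $\mathcal{O}(D^2)$ or worse needed with a dense covariance. The names `mfa_log_density`, `dense_log_density`, `Lam`, and `psi` are illustrative choices.

```python
import numpy as np


def mfa_log_density(x, mu, Lam, psi):
    """log N(x; mu, Lam Lam^T + diag(psi)) without forming the D x D covariance."""
    D, H = Lam.shape
    d = x - mu
    d_scaled = d / psi                     # Psi^{-1} d, cost O(D)
    Lam_scaled = Lam / psi[:, None]        # Psi^{-1} Lam, cost O(D H)
    M = np.eye(H) + Lam.T @ Lam_scaled     # small H x H matrix, cost O(D H^2)
    v = Lam.T @ d_scaled                   # cost O(D H)
    # Woodbury identity: d^T Sigma^{-1} d = d^T Psi^{-1} d - v^T M^{-1} v
    maha = d @ d_scaled - v @ np.linalg.solve(M, v)
    # Matrix determinant lemma: log det Sigma = log det M + sum(log psi)
    _, logdet_M = np.linalg.slogdet(M)
    logdet = logdet_M + np.sum(np.log(psi))
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + maha)


def dense_log_density(x, mu, Lam, psi):
    """Reference implementation that builds the dense covariance explicitly."""
    Sigma = Lam @ Lam.T + np.diag(psi)
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    maha = d @ np.linalg.solve(Sigma, d)
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + maha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, H = 50, 3                           # hypothetical sizes for the check
    x, mu = rng.normal(size=D), rng.normal(size=D)
    Lam = rng.normal(size=(D, H))
    psi = rng.uniform(0.5, 2.0, size=D)    # diagonal noise variances, must be > 0
    print(np.isclose(mfa_log_density(x, mu, Lam, psi),
                     dense_log_density(x, mu, Lam, psi)))
```

For a fixed number of factors `H`, every step of `mfa_log_density` is linear in `D`. The independence from $C$ stated in the abstract comes from the variational approximation itself, which this sketch does not attempt to reproduce.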

Problem

Research questions and friction points this paper is trying to address.

Large-scale Data

Gaussian Mixture Models (GMMs)

Resource Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Mixture Models

Factor Analyzers

High-Dimensional Data Processing
