AI Summary
This work addresses the problem of learning $k$-component Gaussian mixture models (GMMs) supported on the union of $k$ constant-radius balls in high dimensions, as well as Gaussian convolutions of distributions on low-dimensional manifolds or, more generally, sets with small covering number. We propose the first analytically tractable diffusion-based algorithm for this setting. Methodologically, we unify score function estimation with a higher-order Gaussian noise sensitivity analysis, combined with piecewise polynomial regression of poly-logarithmic degree and rigorous convergence theory for diffusion models. Under a minimum weight assumption, our algorithm achieves total variation error $\varepsilon$ in quasi-polynomial time and sample complexity $O\big(n^{\mathrm{poly}\log((n+k)/\varepsilon)}\big)$, overcoming fundamental limitations of classical algebraic approaches. Notably, this is the first subexponential learnability guarantee for Gaussian convolutions on manifolds. Moreover, our framework unifies and enables efficient learning of both continuous and discrete GMMs, resolving longstanding challenges in high-dimensional distribution learning.
Abstract
We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\mathrm{poly}\log\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity, under a minimum weight assumption. Our results extend to continuous mixtures of Gaussians where the mixing distribution is supported on a union of $k$ balls of constant radius. In particular, this applies to the case of Gaussian convolutions of distributions on low-dimensional manifolds, or more generally sets with small covering number, for which no subexponential algorithm was previously known. Unlike previous approaches, most of which are algebraic in nature, our approach is analytic and relies on the framework of diffusion models. Diffusion models are a modern paradigm for generative modeling, which typically rely on learning the score function (the gradient of the log-pdf) along a process transforming a pure noise distribution, in our case a Gaussian, to the data distribution. Despite their dazzling performance in tasks such as image generation, there are few end-to-end theoretical guarantees that they can efficiently learn nontrivial families of distributions; we give some of the first such guarantees. We proceed by deriving higher-order Gaussian noise sensitivity bounds for the score functions of a Gaussian mixture to show that they can be inductively learned using piecewise polynomial regression (up to poly-logarithmic degree), and combine this with known convergence results for diffusion models.
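To make the two ingredients concrete, the sketch below (our own toy illustration, not the paper's algorithm; all function names are ours) computes the exact score of an isotropic Gaussian mixture via posterior responsibilities, then fits each coordinate of the score by least-squares polynomial regression — a low-degree, global stand-in for the piecewise polynomial regression of poly-logarithmic degree used in the analysis.

```python
import numpy as np

def gmm_score(x, means, weights):
    """Score (gradient of log-density) of an isotropic GMM at points x.

    p(x) = sum_i w_i N(x; mu_i, I)  =>  score(x) = sum_i r_i(x) (mu_i - x),
    where r_i(x) are the posterior responsibilities of the components.
    """
    # squared distances to each mean: shape (m, k)
    d2 = ((x[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    logw = np.log(weights) - 0.5 * d2          # log N(x; mu_i, I) + log w_i, up to a constant
    logw -= logw.max(axis=1, keepdims=True)    # stabilize the softmax
    r = np.exp(logw)
    r /= r.sum(axis=1, keepdims=True)          # responsibilities, rows sum to 1
    return (r[:, :, None] * (means[None, :, :] - x[:, None, :])).sum(axis=1)

rng = np.random.default_rng(0)
k, n, m = 3, 2, 4000
means = rng.normal(size=(k, n)) * 5.0          # k well-separated centers in R^n
weights = np.full(k, 1.0 / k)

# Sample from the mixture, evaluate the true score at the samples.
comp = rng.choice(k, size=m, p=weights)
X = means[comp] + rng.normal(size=(m, n))
S = gmm_score(X, means, weights)

def poly_features(X, degree):
    # all monomials x1^a * x2^b with a + b <= degree (n = 2 here)
    feats = [np.ones(len(X))]
    for a in range(degree + 1):
        for b in range(degree + 1 - a):
            if a + b > 0:
                feats.append(X[:, 0] ** a * X[:, 1] ** b)
    return np.stack(feats, axis=1)

# Fit each score coordinate by least squares over low-degree polynomials.
Phi = poly_features(X, degree=3)
coef, *_ = np.linalg.lstsq(Phi, S, rcond=None)
resid = np.mean((Phi @ coef - S) ** 2)         # mean squared score-matching error
```

A single global polynomial of constant degree is generally a poor fit when the components are far apart, which is exactly why the paper partitions space and regresses piecewise, with the degree growing only poly-logarithmically in the problem parameters.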