Local linear convergence of gradient methods for overparameterized Gaussian mixtures

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the slow local convergence often observed when gradient-based methods fit overparameterized Gaussian mixture models, a slowdown that stems from the geometry of the loss landscape near its minimizers. The authors propose an alternating strategy that interleaves several short gradient descent steps with long Polyak steps. Under certain assumptions on the mixture weights, this approach converges locally at a linear rate. For mixtures with arbitrary weights, it provably reaches nearly optimal solutions, up to a natural misspecification threshold. The study shows that slow convergence is not an intrinsic limitation of overparameterization and can be overcome by algorithm design that exploits the favorable structure of the loss landscape.
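Why "long" Polyak steps help can be seen in a one-dimensional worked example. This is an illustrative computation of our own, not the paper's analysis. For a function with known minimum value f^\star, the Polyak stepsize is \eta = (f(\theta) - f^\star) / \|\nabla f(\theta)\|^2. On a power function, which stands in for the slow, higher-order growth that overparameterization can create near a minimizer, each Polyak step contracts the iterate by a fixed factor:

```latex
\begin{aligned}
f(\theta) &= c\,|\theta|^{p}, \qquad c > 0,\; p \ge 2, \qquad f^\star = 0, \\
\nabla f(\theta) &= c\,p\,|\theta|^{p-2}\,\theta, \qquad
\eta(\theta) = \frac{f(\theta) - f^\star}{\|\nabla f(\theta)\|^{2}}
             = \frac{|\theta|^{2-p}}{c\,p^{2}}, \\
\theta^{+} &= \theta - \eta(\theta)\,\nabla f(\theta)
            = \Bigl(1 - \tfrac{1}{p}\Bigr)\theta,
\qquad
f(\theta^{+}) = \Bigl(1 - \tfrac{1}{p}\Bigr)^{p} f(\theta).
\end{aligned}
```

For p = 4, every Polyak step shrinks \theta by a factor of 3/4 and the loss by a factor of 81/256 \approx 0.32. Gradient descent with a fixed stepsize on \theta^4 reduces the loss only at a sublinear O(1/k^2) rate. The Polyak stepsize grows like |\theta|^{-2} as \theta \to 0, which is what makes these steps "long".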

📝 Abstract

We study the problem of learning Gaussian mixture models under overparameterization. Prior work has shown that while overparameterization is essential for avoiding spurious local optima and enables global recovery of the ground-truth model using the gradient-EM (expectation-maximization) algorithm, it can dramatically slow down the local rate of convergence. Under certain assumptions on the mixture weights, we show that a standard divergence measure minimized by statistical learning procedures possesses a manifold of slow growth on which the well-known Polyak stepsize reduces the loss geometrically, and design a gradient-based method that converges to minimizers at a locally linear rate. Additionally, we show that our method converges to nearly optimal solutions -- up to a natural misspecification threshold -- for mixtures with arbitrary weights. At a high level, the method alternates between several "short" gradient descent steps that approach the manifold and "long" Polyak steps that contract the distance to minimizers. Our results suggest that slow convergence is not an intrinsic challenge of overparameterization, but can be overcome by exploiting the favorable structure of the loss landscape.
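The alternation described in the abstract can be sketched on a hypothetical two-variable surrogate, f(x, y) = y² + x⁴. This surrogate was chosen only because it grows quadratically off the manifold {y = 0} and quartically along it. It is not the Gaussian-mixture divergence studied in the paper. The learning rate `lr`, the inner count `short_steps`, and the number of `rounds` are illustrative assumptions. The sketch compares the alternation with plain gradient descent under the same budget of gradient evaluations.

```python
"""Toy sketch: short gradient steps alternating with long Polyak steps.

Hypothetical surrogate loss (NOT the paper's Gaussian-mixture divergence):
    f(x, y) = y**2 + x**4,   minimum value f_star = 0 at (0, 0).
It grows quadratically off the manifold {y = 0} and only quartically along it.
"""
from typing import List, Tuple

Point = Tuple[float, float]


def loss(p: Point) -> float:
    x, y = p
    return y ** 2 + x ** 4


def grad(p: Point) -> Point:
    x, y = p
    return 4.0 * x ** 3, 2.0 * y


def gd_steps(p: Point, lr: float, n: int) -> Point:
    """Run n constant-stepsize gradient steps (the "short" steps)."""
    x, y = p
    for _ in range(n):
        gx, gy = grad((x, y))
        x, y = x - lr * gx, y - lr * gy
    return x, y


def polyak_step(p: Point, f_star: float = 0.0) -> Point:
    """One Polyak step with eta = (f(p) - f_star) / ||grad f(p)||^2."""
    gx, gy = grad(p)
    sq_norm = gx * gx + gy * gy
    if sq_norm == 0.0:  # already at a stationary point
        return p
    eta = (loss(p) - f_star) / sq_norm
    return p[0] - eta * gx, p[1] - eta * gy


def alternating(p: Point, rounds: int = 8, lr: float = 0.1,
                short_steps: int = 60) -> Tuple[Point, List[float]]:
    """Alternate short GD phases (pull y toward 0) with single Polyak steps."""
    history = [loss(p)]
    for _ in range(rounds):
        p = gd_steps(p, lr, short_steps)
        p = polyak_step(p)
        history.append(loss(p))
    return p, history


if __name__ == "__main__":
    start = (1.0, 1.0)
    p_alt, hist = alternating(start)
    budget = 8 * (60 + 1)  # gradient evaluations used with the defaults above
    p_gd = gd_steps(start, lr=0.1, n=budget)
    print(f"alternating: loss = {hist[-1]:.2e}")
    print(f"plain GD:    loss = {loss(p_gd):.2e}")
```

In this toy, each Polyak step multiplies the off-manifold coordinate y by roughly 1 - 1/(8x²). As x shrinks, more short steps per round are therefore needed to pull y back toward the manifold. A fixed count of 60 is adequate for the rounds run here but would eventually have to grow as x approaches 0. This summary does not say how the paper chooses the number of short steps.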

Problem

Research questions and friction points this paper is trying to address.

overparameterization

Gaussian mixtures

local convergence

gradient methods

slow convergence

Innovation

Methods, ideas, or system contributions that make the work stand out.

overparameterization

Gaussian mixture models

local linear convergence