🤖 AI Summary
This paper studies the convergence and statistical accuracy of the Expectation-Maximization (EM) algorithm for overparameterized two-component mixed linear regression (2MLR) under model misspecification. Focusing on the setting where both the regression coefficients and the mixing weights are unknown, the authors combine asymptotic analysis with finite-sample theory to uncover a fundamental role of initialization balance in EM dynamics: with unbalanced initialization, EM achieves an iteration complexity of $O(\log(n/d))$ and statistical error $O(\sqrt{d/n})$; with balanced initialization, it requires $O(\sqrt{n/d})$ iterations and attains only the suboptimal accuracy $O((d/n)^{1/4})$. The work establishes, for the first time, an exact trade-off between iteration count and statistical error. Furthermore, the analysis is extended to low signal-to-noise-ratio regimes, providing new insights into the robustness of EM in high-dimensional misspecified models.
📝 Abstract
Mixture models have attracted significant attention due to their practical effectiveness and comprehensive theoretical foundations. A persistent challenge is model misspecification, which occurs when the model being fitted has more mixture components than the data distribution. In this paper, we develop a theoretical understanding of the Expectation-Maximization (EM) algorithm's behavior under targeted model misspecification for overspecified two-component Mixed Linear Regression (2MLR) with unknown $d$-dimensional regression parameters and mixing weights. In Theorem 5.1, at the population level, with an unbalanced initial guess for the mixing weights, we establish linear convergence of the regression parameters in $O(\log(1/\epsilon))$ steps. Conversely, with a balanced initial guess for the mixing weights, we observe sublinear convergence in $O(\epsilon^{-2})$ steps to achieve $\epsilon$-accuracy in Euclidean distance. In Theorem 6.1, at the finite-sample level, for mixtures with sufficiently unbalanced fixed mixing weights we demonstrate a statistical accuracy of $O((d/n)^{1/2})$, whereas for those with sufficiently balanced fixed mixing weights the accuracy is $O((d/n)^{1/4})$ given $n$ data samples. Furthermore, we underscore the connection between our population-level and finite-sample-level results: by setting the desired final accuracy $\epsilon$ in Theorem 5.1 to match that in Theorem 6.1 at the finite-sample level, namely letting $\epsilon = O((d/n)^{1/2})$ for sufficiently unbalanced and $\epsilon = O((d/n)^{1/4})$ for sufficiently balanced fixed mixing weights, we intuitively derive iteration complexity bounds $O(\log(1/\epsilon)) = O(\log(n/d))$ and $O(\epsilon^{-2}) = O((n/d)^{1/2})$ at the finite-sample level for sufficiently unbalanced and balanced initial mixing weights, respectively. We further extend our analysis in the overspecified setting to the low-SNR regime.
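To make the overspecified setting concrete, the following is a minimal NumPy sketch of EM for 2MLR, not the paper's implementation: synthetic data are drawn from a *single* regression component, while a two-component mixture is fitted, and the same algorithm is run from an unbalanced and a balanced initial mixing weight. The function name `em_2mlr`, all constants (`n`, `d`, `sigma`, iteration counts), and the assumption of a known noise level are illustrative choices, not taken from the paper.

```python
import numpy as np

# Overspecified setting: data come from ONE regression component (beta_true),
# but we fit a 2-component mixed linear regression (2MLR).
rng = np.random.default_rng(0)
n, d, sigma = 2000, 3, 0.5               # illustrative sizes, not the paper's
beta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ beta_true + sigma * rng.normal(size=n)

def em_2mlr(X, y, pi0, n_iter=200, sigma=0.5, seed=1):
    """Hypothetical minimal EM for 2MLR with initial mixing weight pi0
    for component 1; the noise level sigma is assumed known."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = [rng.normal(size=d), rng.normal(size=d)]
    pi = pi0
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 1 for each sample,
        # computed in log space for numerical stability.
        log_w1 = np.log(pi) - (y - X @ beta[0]) ** 2 / (2 * sigma**2)
        log_w2 = np.log(1 - pi) - (y - X @ beta[1]) ** 2 / (2 * sigma**2)
        m = np.maximum(log_w1, log_w2)
        w1, w2 = np.exp(log_w1 - m), np.exp(log_w2 - m)
        gamma = w1 / (w1 + w2)
        # M-step: weighted least squares for each beta, then update the
        # mixing weight (clipped away from {0,1} for numerical safety).
        for k, g in enumerate((gamma, 1.0 - gamma)):
            Xg = X * g[:, None]          # rows of X scaled by responsibilities
            beta[k] = np.linalg.solve(Xg.T @ X + 1e-8 * np.eye(d), Xg.T @ y)
        pi = float(np.clip(gamma.mean(), 1e-6, 1 - 1e-6))
    return beta, pi

beta_unbal, pi_unbal = em_2mlr(X, y, pi0=0.9)  # unbalanced initialization
beta_bal, pi_bal = em_2mlr(X, y, pi0=0.5)      # balanced initialization
err_unbal = min(np.linalg.norm(b - beta_true) for b in beta_unbal)
err_bal = min(np.linalg.norm(b - beta_true) for b in beta_bal)
```

Under the paper's results one would expect the unbalanced run to approach `beta_true` at a linear rate and the balanced run to make sublinear progress; on data like the above, the best fitted component ends up close to `beta_true` in both cases, with the unbalanced initialization typically converging in far fewer iterations.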