🤖 AI Summary
This paper identifies a systematic breakdown of Gaussian universality in classification under high-dimensional linear factor mixture models: when data deviate from Gaussian or Gaussian mixture distributions, the performance of empirical risk minimization (ERM) ceases to depend solely on the first two moments (mean and covariance) and becomes critically sensitive to higher-order moments—particularly skewness and kurtosis.
Method: Leveraging random matrix theory, high-dimensional statistics, and asymptotic convex optimization analysis, the authors develop a unified analytical framework within an exact high-dimensional asymptotic regime (where dimension and sample size scale proportionally).
Contribution/Results: The authors rigorously characterize the failure mechanism of Gaussian universality, derive necessary and sufficient conditions for its validity, and obtain closed-form asymptotic expressions for the classification error. Their analysis quantifies how higher-order distributional features govern generalization performance, establishing a new theoretical benchmark for non-Gaussian high-dimensional classification and informing principled loss function design.
📝 Abstract
The assumption of Gaussian or Gaussian mixture data has been extensively exploited in a long series of precise performance analyses of machine learning (ML) methods on large datasets with comparably numerous samples and features. To relax this restrictive assumption, subsequent efforts have been devoted to establishing "Gaussian equivalent principles" by studying scenarios of Gaussian universality, where the asymptotic performance of ML methods on non-Gaussian data remains unchanged when the data are replaced with Gaussian data having the same mean and covariance. Beyond the realm of Gaussian universality, there are few exact results on how the data distribution affects the learning performance. In this article, we provide a precise high-dimensional characterization of empirical risk minimization for classification under a general mixture data setting of linear factor models that extends Gaussian mixtures. Gaussian universality is shown to break down under this setting, in the sense that the asymptotic learning performance depends on the data distribution beyond the class means and covariances. To clarify the limitations of Gaussian universality in the classification of mixture data and to understand the impact of its breakdown, we specify conditions under which Gaussian universality holds and discuss their implications for the choice of loss function.
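The setting described above can be illustrated with a minimal simulation: draw two-class data from a linear factor model with non-Gaussian latent factors, then from a Gaussian model matched in class means and covariances, and compare the test error of a simple ERM classifier on each. This is an illustrative sketch only; the model parameters, the heavy-tailed latent choice, and the ridge-regularized least-squares loss are assumptions for demonstration, not the paper's exact construction.

```python
# Illustrative sketch: ERM classification error on linear-factor mixture data
# versus its Gaussian equivalent (same class means and covariances).
# All parameter choices below are hypothetical, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 2000, 200, 20          # samples, data dimension, latent factors

mu = np.ones(p) / np.sqrt(p)     # class mean direction (labels are +/- mu)
F = rng.standard_normal((p, k)) / np.sqrt(p)  # shared factor loading matrix

def sample_mixture(n, gaussian_latent):
    """Two-class linear factor model: x = y * mu + F z + sigma * noise."""
    y = rng.choice([-1.0, 1.0], size=n)
    if gaussian_latent:
        z = rng.standard_normal((n, k))  # Gaussian latent: identity covariance
    else:
        # Heavy-tailed latent factors, rescaled to mean 0 and variance 1,
        # so both models share the same first two moments.
        z = rng.standard_t(df=3, size=(n, k)) / np.sqrt(3.0)
    x = y[:, None] * mu + z @ F.T + 0.5 * rng.standard_normal((n, p))
    return x, y

def ridge_erm_error(xtr, ytr, xte, yte, lam=1.0):
    """Ridge-regularized least-squares ERM; sign of the score classifies."""
    w = np.linalg.solve(xtr.T @ xtr + lam * np.eye(p), xtr.T @ ytr)
    return float(np.mean(np.sign(xte @ w) != yte))

errs = {}
for name, gauss in [("factor (t latent)", False), ("gaussian equivalent", True)]:
    xtr, ytr = sample_mixture(n, gauss)
    xte, yte = sample_mixture(n, gauss)
    errs[name] = ridge_erm_error(xtr, ytr, xte, yte)
print(errs)
```

Under Gaussian universality the two errors would coincide asymptotically; the paper's point is that for mixtures of linear factor models they need not, since higher-order moments of the latent factors enter the asymptotic error.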