🤖 AI Summary
This paper shows that when Empirical Risk Minimization (ERM) with squared loss attains a minimax suboptimal error rate, the suboptimality must come from the bias term of its bias–variance decomposition: under mild assumptions, the variance term always achieves the minimax rate. The authors give an elementary proof of this fact in the fixed design setting, via the probabilistic method, and then establish it non-asymptotically for various models in the random design setting. Along the way, they provide a simple proof of Chatterjee's admissibility theorem (stating that ERM cannot be ruled out as an optimal method) in fixed design and extend it to random design; they show that their estimates imply stability of ERM, complementing the Caponnetto–Rakhlin result for non-Donsker classes; and they exhibit the irregularity of the empirical loss landscape for non-Donsker classes, where functions can be close to the ERM yet far from almost-minimizing the empirical loss. Integrating the bias–variance decomposition with empirical process theory and probabilistic arguments, the work gives a detailed non-asymptotic risk analysis of ERM.
📝 Abstract
It is well known that Empirical Risk Minimization (ERM) with squared loss may attain minimax suboptimal error rates (Birgé and Massart, 1993). The key message of this paper is that, under mild assumptions, the suboptimality of ERM must be due to large bias rather than variance. More precisely, in the bias–variance decomposition of the squared error of the ERM, the variance term necessarily enjoys the minimax rate. In the case of fixed design, we provide an elementary proof of this fact using the probabilistic method. We then prove this result for various models in the random design setting. In addition, we provide a simple proof of Chatterjee's admissibility theorem (Chatterjee, 2014, Theorem 1.4) in the fixed design setting, which states that ERM cannot be ruled out as an optimal method, and extend this result to the random design setting. We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes. Finally, we show that for non-Donsker classes there are functions close to the ERM, yet far from being almost-minimizers of the empirical loss, highlighting the somewhat irregular nature of the empirical loss landscape.
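To make the key claim concrete, here is a minimal sketch of the bias–variance decomposition the abstract refers to, written in the standard fixed-design regression model; the notation (f^*, \hat f_n, \xi_i, \|\cdot\|_n) is our own shorthand for illustration and is not necessarily the paper's.

```latex
% Sketch: fixed-design regression and the bias-variance decomposition of ERM.
% Notation (f^*, \hat f_n, \xi_i, \|\cdot\|_n) is illustrative, not the paper's.
\[
  y_i = f^*(x_i) + \xi_i, \qquad
  \hat f_n \in \operatorname*{arg\,min}_{f \in \mathcal{F}}
    \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2,
\]
\[
  \mathbb{E}\, \lVert \hat f_n - f^* \rVert_n^2
  \;=\;
  \underbrace{\lVert \mathbb{E}\, \hat f_n - f^* \rVert_n^2}_{\text{bias}}
  \;+\;
  \underbrace{\mathbb{E}\, \lVert \hat f_n - \mathbb{E}\, \hat f_n \rVert_n^2}_{\text{variance}},
  \qquad
  \lVert g \rVert_n^2 := \frac{1}{n} \sum_{i=1}^{n} g(x_i)^2.
\]
```

The paper's message, in these terms, is that the variance term on the right always converges at the minimax rate; so whenever the total risk on the left is minimax suboptimal, the bias term must be responsible.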