Precise Asymptotics of Bagging Regularized M-estimators

📅 2024-09-23
📈 Citations: 2
Influential: 0
🤖 AI Summary
This paper characterizes and estimates the squared prediction risk of subagging (subsample bootstrap aggregating) applied to regularized M-estimators. Under a proportional asymptotics regime, the authors first pin down the joint asymptotic behavior of correlations between estimators and residuals across overlapping subsamples, and derive a tractable, provably contractive system of nonlinear equations that exactly characterizes the risk. Leveraging random matrix theory, fixed-point analysis, and convergence of trace functionals, they construct a consistent risk estimator. Key theoretical findings: (i) subsampling induces implicit regularization; (ii) for the full ensemble with vanishing explicit regularization, the optimal subsample size lies in the overparameterized regime; and (iii) jointly tuning subsample size, ensemble size, and explicit regularization strength can substantially outperform tuning the regularizer alone on the full data. These results provide a rigorous theoretical foundation and a practical methodology for risk-aware design of ensemble estimators.
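To make the object of study concrete, here is a minimal Python sketch of subagging with a ridge-penalized least-squares base estimator (one admissible choice of convex loss and regularizer in the paper's framework); the held-out squared error is used only as a crude stand-in for the prediction risk, not the paper's consistent risk estimator. All names and parameter values below are illustrative, not from the paper.

```python
import numpy as np


def subagged_ridge(X, y, k, M, lam, rng):
    """Average M ridge estimators, each fit on a random size-k subsample."""
    n, p = X.shape
    beta_bar = np.zeros(p)
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)  # subsample without replacement
        Xs, ys = X[idx], y[idx]
        # ridge-regularized least squares on the subsample:
        #   argmin_beta (1/(2k)) * ||ys - Xs @ beta||^2 + (lam/2) * ||beta||^2
        beta_m = np.linalg.solve(Xs.T @ Xs / k + lam * np.eye(p), Xs.T @ ys / k)
        beta_bar += beta_m / M
    return beta_bar


# Toy data in a proportional regime (p comparable to n).
rng = np.random.default_rng(0)
n, p = 500, 400
beta_star = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta_star + rng.normal(size=n)

# Held-out squared error as a rough proxy for the squared prediction risk.
X_te = rng.normal(size=(2000, p))
y_te = X_te @ beta_star + rng.normal(size=2000)

beta_bar = subagged_ridge(X, y, k=300, M=20, lam=1e-2, rng=rng)
print(f"held-out squared error: {np.mean((y_te - X_te @ beta_bar) ** 2):.3f}")
```

Varying k, M, and lam in this sketch illustrates, at a heuristic level, the implicit-regularization and joint-tuning effects that the paper characterizes exactly.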


📝 Abstract
We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators and construct a consistent estimator for the risk. Specifically, we consider a heterogeneous collection of $M \ge 1$ regularized M-estimators, each trained with (possibly different) subsample sizes, convex differentiable losses, and convex regularizers. We operate under the proportional asymptotics regime, where the sample size $n$, feature size $p$, and subsample sizes $k_m$ for $m \in [M]$ all diverge with fixed limiting ratios $n/p$ and $k_m/n$. Key to our analysis is a new result on the joint asymptotic behavior of correlations between the estimator and residual errors on overlapping subsamples, governed through a (provably) contractible nonlinear system of equations. Of independent interest, we also establish convergence of trace functionals related to degrees of freedom in the non-ensemble setting (with $M = 1$) along the way, extending previously known cases for square loss and ridge and lasso regularizers. When specialized to homogeneous ensembles trained with a common loss, regularizer, and subsample size, the risk characterization sheds some light on the implicit regularization effect due to the ensemble and subsample sizes $(M,k)$. For any ensemble size $M$, optimally tuning subsample size yields sample-wise monotonic risk. For the full-ensemble estimator (when $M \to \infty$), the optimal subsample size $k^\star$ tends to be in the overparameterized regime $(k^\star \le \min\{n,p\})$, when explicit regularization is vanishing. Finally, joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data (without any subagging).
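For reference, one standard way to write the homogeneous subagged estimator and its squared prediction risk, consistent with the abstract's setup, is as follows (the notation here is a plausible rendering, not necessarily the paper's exact conventions):

$$
\widehat{\beta}_m \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \ \frac{1}{|I_m|} \sum_{i \in I_m} \ell\big(y_i - x_i^\top \beta\big) + g(\beta),
\qquad
\widetilde{\beta}_M = \frac{1}{M} \sum_{m=1}^{M} \widehat{\beta}_m,
$$

$$
R\big(\widetilde{\beta}_M\big) = \mathbb{E}\Big[\big(y_0 - x_0^\top \widetilde{\beta}_M\big)^2 \,\Big|\, X, y\Big],
$$

where $I_m \subseteq [n]$ is a random subsample of size $k$ (size $k_m$ in the heterogeneous case), $\ell$ is a convex differentiable loss, $g$ is a convex regularizer, and $(x_0, y_0)$ is a test point drawn independently of the training data.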
Problem

Research questions and friction points this paper is trying to address.

Characterizing prediction risk of subagging regularized M-estimators under proportional asymptotics
Analyzing ensemble regularization effects through subsample size optimization
Establishing joint asymptotic behavior of overlapping subsample correlations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Characterizing prediction risk of bagging regularized M-estimators
Analyzing joint asymptotic behavior via contractive nonlinear equations
Jointly optimizing subsample size, ensemble size, and regularization for risk minimization (see the tuning sketch below)
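Empirically, the joint optimization over $(k, M, \lambda)$ can be mimicked by a small grid search against held-out risk; the sketch below does this with scikit-learn's Ridge as the base estimator. This is only an illustrative hold-out heuristic (the paper instead tunes against its consistent risk estimator), and the function names and grid values are assumptions.

```python
import itertools

import numpy as np
from sklearn.linear_model import Ridge


def subagged_predict(X_tr, y_tr, X_new, k, M, lam, rng):
    """Average the predictions of M ridge fits, each on a random size-k subsample."""
    preds = np.zeros(len(X_new))
    for _ in range(M):
        idx = rng.choice(len(X_tr), size=k, replace=False)
        # alpha is scaled by k so that lam plays the role of a per-sample penalty level
        fit = Ridge(alpha=lam * k, fit_intercept=False).fit(X_tr[idx], y_tr[idx])
        preds += fit.predict(X_new) / M
    return preds


rng = np.random.default_rng(1)
n, p = 500, 400
beta_star = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta_star + rng.normal(size=n)
X_val = rng.normal(size=(2000, p))
y_val = X_val @ beta_star + rng.normal(size=2000)

# Grid search over (k, M, lam); held-out risk stands in for the paper's risk estimator.
grid = itertools.product([200, 300, 400, 500], [1, 5, 20], [1e-3, 1e-2, 1e-1, 1.0])
best = min(
    grid,
    key=lambda cfg: np.mean((y_val - subagged_predict(X, y, X_val, *cfg, rng)) ** 2),
)
print("selected (k, M, lam):", best)
```

Comparing the selected configuration against the full-data single fit ($k = n$, $M = 1$) gives a quick, informal check of the gains from joint tuning that the abstract describes.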
Takuya Koriyama
Booth School of Business, University of Chicago, Chicago, IL 60637, USA
Pratik Patil
Department of Statistics and Data Science, University of Texas, Austin, TX 78712, USA
Jin-Hong Du
Carnegie Mellon University
high-dimensional statistics · overparameterized learning · single-cell data analysis
Kai Tan
Department of Statistics, Stanford University, Stanford, CA 94305, USA
Pierre C. Bellec
Department of Statistics, Rutgers University