🤖 AI Summary
This paper addresses two fundamental challenges in least-squares model averaging: learning optimal combination weights and constructing the candidate set via all-subset combination. We propose a unified theoretical and algorithmic framework grounded in Mallows' $C_p$ criterion. First, we derive non-asymptotic oracle inequalities that rigorously characterize the prediction risk of model averaging estimators. Second, we introduce a dimension-adaptive $C_p$ criterion that provably attains the fundamental limit on the all-subset MA risk. Third, we uncover an implicit ensembling effect embedded in classical model selection criteria, such as AIC and BIC, revealing their intrinsic connections to model averaging. To keep exhaustive subset enumeration computationally feasible, we incorporate a pruning strategy that preserves asymptotic optimality under mild regularity conditions. Extensive numerical experiments demonstrate that our method achieves significantly higher predictive accuracy and stability than conventional model selection and state-of-the-art model averaging approaches.
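For concreteness, the classical Mallows-type criterion for least-squares MA that this framework builds on takes the following standard form; the notation ($\hat{\mu}_m$, $k_m$, $\Delta_M$) is the conventional one from the MA literature and is supplied here for exposition, not quoted from the paper:

$$
C_n(w) = \Big\| y - \sum_{m=1}^{M} w_m \hat{\mu}_m \Big\|^2 + 2\sigma^2 \sum_{m=1}^{M} w_m k_m, \qquad w \in \Delta_M = \Big\{ w : w_m \ge 0,\ \sum_{m=1}^{M} w_m = 1 \Big\},
$$

where $y$ is the response vector, $\hat{\mu}_m$ the least-squares fit of the $m$-th candidate model, $k_m$ its dimension, and $\sigma^2$ the (estimated) noise variance. The MA weights are learned by minimizing $C_n(w)$ over the simplex $\Delta_M$; the paper's dimension-adaptive criterion modifies this baseline.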
📝 Abstract
Model averaging (MA) and ensembling play a crucial role in statistical and machine learning practice. When multiple candidate models are considered, MA techniques weight and combine them, often yielding better predictive accuracy and estimation stability than model selection (MS) methods. In this paper, we address two challenges in combining least squares estimators from both theoretical and practical perspectives. We first establish several oracle inequalities for least squares MA via minimizing a Mallows' $C_p$ criterion under an arbitrary candidate model set. Compared to existing studies, these oracle inequalities yield faster excess risk rates and directly imply the asymptotic optimality of the resulting MA estimators under milder conditions. Moreover, we consider candidate model construction and investigate the problem of optimal all-subset combination for least squares estimators, an important yet rarely discussed topic in the existing literature. We show that there exists a fundamental limit to achieving the optimal all-subset MA risk. To attain this limit, we propose a novel Mallows-type MA procedure based on a dimension-adaptive $C_p$ criterion. The implicit ensembling effects of several MS procedures are also revealed and discussed. We conduct several numerical experiments to support our theoretical findings and demonstrate the effectiveness of the proposed Mallows-type MA estimator.
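To make the weight-learning step concrete, here is a minimal, self-contained sketch of classical Mallows model averaging for least squares estimators. It implements the standard $C_p$ criterion shown above, not the paper's dimension-adaptive variant; the function name `mallows_ma_weights`, the candidate-set construction, and the variance estimate taken from the largest candidate model are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def mallows_ma_weights(y, X, subsets, sigma2=None):
    """Weights minimizing the classical Mallows Cp criterion for MA.

    subsets: list of column-index lists, one per candidate model.
    """
    n = len(y)
    fits, dims = [], []
    for S in subsets:
        Xs = X[:, S]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        fits.append(Xs @ beta)          # least-squares fit of model S
        dims.append(len(S))             # model dimension k_m
    F = np.column_stack(fits)           # n x M matrix of candidate fits
    k = np.array(dims, dtype=float)
    if sigma2 is None:
        # Noise variance estimated from the largest candidate model
        # (a common convention; an assumption here, not the paper's rule).
        big = max(subsets, key=len)
        Xb = X[:, big]
        res = y - Xb @ np.linalg.lstsq(Xb, y, rcond=None)[0]
        sigma2 = res @ res / (n - len(big))

    def cp(w):
        # C_n(w) = ||y - F w||^2 + 2 sigma^2 * sum_m w_m k_m
        r = y - F @ w
        return r @ r + 2.0 * sigma2 * (k @ w)

    M = F.shape[1]
    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
    bnds = [(0.0, 1.0)] * M
    w0 = np.full(M, 1.0 / M)            # start from uniform weights
    out = minimize(cp, w0, bounds=bnds, constraints=cons, method='SLSQP')
    return out.x

# Toy usage: average all single-feature models plus the full model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.2]) + rng.normal(size=200)
subsets = [[j] for j in range(5)] + [list(range(5))]
print(np.round(mallows_ma_weights(y, X, subsets), 3))
```

Since $C_n(w)$ is a convex quadratic in $w$, the simplex-constrained minimization is a small quadratic program; a general-purpose SLSQP solver is used here purely for brevity. Note that exhaustive all-subset averaging enumerates $2^p$ candidate models, which is what motivates the pruning strategy described in the summary.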