🤖 AI Summary
To address the computational cost of hyperparameter tuning for logistic regression on high-dimensional data, this paper proposes Prevalidated Ridge Regression (PRR). PRR exploits the closed-form leave-one-out (LOO) error available for ridge regression to construct a set of prevalidated predictions, then derives a single rescaling factor for the model coefficients that minimises log-loss on those predictions—bypassing conventional cross-validation and grid search. Because the required quantities are already computed while fitting the ridge model, regularisation is selected at nominal extra cost, leaving effectively no hyperparameters to tune. Empirically, PRR closely matches finely tuned logistic regression in both classification error and log-loss, particularly on high-dimensional data, while being substantially cheaper to train. The paper presents this as a way to link ridge regression's closed-form LOO error directly to log-loss, yielding an essentially hyperparameter-free probabilistic classifier.
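The closed-form LOO identity the summary refers to is the standard ridge-regression result (sketched here in our own notation, not necessarily the paper's): with hat matrix $H = X(X^\top X + \lambda I)^{-1}X^\top$ and fitted values $\hat{y} = Hy$, the held-out prediction for observation $i$ is

$$\hat{y}_i^{(-i)} = \frac{\hat{y}_i - H_{ii}\,y_i}{1 - H_{ii}},$$

so all $n$ leave-one-out predictions follow from a single fit. This is what allows PRR to evaluate regularisation and the coefficient scaling without any refitting.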
📝 Abstract
Logistic regression is a ubiquitous method for probabilistic classification. However, the effectiveness of logistic regression depends upon careful and relatively computationally expensive tuning, particularly for the regularisation hyperparameter and in the context of high-dimensional data. We present a prevalidated ridge regression model that closely matches logistic regression in terms of classification error and log-loss, particularly for high-dimensional data, while being significantly more computationally efficient and having effectively no hyperparameters beyond regularisation. We scale the coefficients of the model so as to minimise log-loss for a set of prevalidated predictions derived from the estimated leave-one-out cross-validation error. This exploits quantities already computed in the course of fitting the ridge regression model in order to find the scaling parameter with nominal additional computational expense.