🤖 AI Summary
To address the computational cost of hyperparameter tuning for logistic regression on high-dimensional data, this paper proposes Prevalidated Ridge Regression (PRR). PRR exploits the closed-form leave-one-out (LOO) error available for ridge regression to construct a set of prevalidated predictions, then derives a single rescaling factor for the model coefficients that minimises log-loss on those predictions—bypassing conventional cross-validation and grid search. Because the required quantities are already computed while fitting the ridge model, regularisation is selected at nominal extra cost, leaving effectively no hyperparameters to tune. Empirically, PRR closely matches finely tuned logistic regression in both classification error and log-loss, particularly on high-dimensional data, while being substantially cheaper to train. The paper presents this as a way to link ridge regression's closed-form LOO error directly to log-loss, yielding an essentially hyperparameter-free probabilistic classifier.
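The closed-form LOO identity the summary refers to is the standard ridge-regression result (sketched here in our own notation, not necessarily the paper's): with hat matrix $H = X(X^\top X + \lambda I)^{-1}X^\top$ and fitted values $\hat{y} = Hy$, the held-out prediction for observation $i$ is

$$\hat{y}_i^{(-i)} = \frac{\hat{y}_i - H_{ii}\,y_i}{1 - H_{ii}},$$

so all $n$ leave-one-out predictions follow from a single fit. This is what allows PRR to evaluate regularisation and the coefficient scaling without any refitting.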
📝 Abstract
Logistic regression is a ubiquitous method for probabilistic classification. However, the effectiveness of logistic regression depends upon careful and relatively computationally expensive tuning, particularly for the regularisation hyperparameter and in the context of high-dimensional data. We present a prevalidated ridge regression model that closely matches logistic regression in terms of classification error and log-loss, particularly for high-dimensional data, while being significantly more computationally efficient and having effectively no hyperparameters beyond regularisation. We scale the coefficients of the model so as to minimise log-loss for a set of prevalidated predictions derived from the estimated leave-one-out cross-validation error. This exploits quantities already computed in the course of fitting the ridge regression model in order to find the scaling parameter with nominal additional computational expense.