Comparing Variable Selection and Model Averaging Methods for Logistic Regression

📅 2025-11-28
🤖 AI Summary
This study systematically evaluates 28 methods for quantifying model uncertainty in variable selection for logistic regression, across 11 empirical datasets and a preregistered simulation study covering both separable and nonseparable data scenarios. Evaluated approaches include Bayesian model averaging (BMA) with g-priors and EB-local priors, and penalized likelihood methods such as the LASSO. Results show that g-prior BMA with g = max(n, p^2) achieves the best overall performance under nonseparation, the LASSO is the most robust under separation, and EB-local priors deliver balanced accuracy and adaptability across both settings. The work provides the first large-scale, jointly empirical and simulation-based validation of model selection for logistic regression, delivering a reproducible, context-sensitive methodological framework and an empirically grounded benchmark for quantifying model uncertainty in binary-outcome regression.

📝 Abstract
Model uncertainty is a central challenge in statistical models for binary outcomes such as logistic regression, arising when it is unclear which predictors should be included in the model. Many methods have been proposed to address this issue for logistic regression, but their relative performance under realistic conditions remains poorly understood. We therefore conducted a preregistered, simulation-based comparison of 28 established methods for variable selection and inference under model uncertainty, using 11 empirical datasets spanning a range of sample sizes and numbers of predictors, in cases both with and without separation. We found that Bayesian model averaging methods based on g-priors, particularly with g = max(n, p^2), show the strongest overall performance when separation is absent. When separation occurs, penalized likelihood approaches, especially the LASSO, provide the most stable results, while Bayesian model averaging with the local empirical Bayes (EB-local) prior is competitive in both situations. These findings offer practical guidance for applied researchers on how to effectively address model uncertainty in logistic regression in modern empirical and machine learning research.
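The separation problem the abstract refers to can be illustrated directly: when the data are perfectly separated, the logistic maximum likelihood estimate diverges, while an L1 (LASSO) penalty keeps it finite. The sketch below is a minimal numpy illustration, not the paper's setup: the toy data, penalty weight, and proximal-gradient fitter are all invented for demonstration.

```python
import numpy as np

# Toy perfectly separated data: x < 0 always has y = 0, x > 0 always has y = 1.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def grad_loglik(beta):
    # Gradient of the logistic log-likelihood for a single slope (no intercept).
    p = 1.0 / (1.0 + np.exp(-beta * x))
    return np.sum((y - p) * x)

def fit(lam, steps=5000, lr=0.1):
    # Proximal gradient ascent: lam = 0 gives the MLE, lam > 0 the LASSO.
    beta = 0.0
    for _ in range(steps):
        beta += lr * grad_loglik(beta)
        # Soft-thresholding step applies the L1 penalty (a no-op when lam = 0).
        beta = np.sign(beta) * max(abs(beta) - lr * lam, 0.0)
    return beta

# Under separation the unpenalized estimate keeps growing with more iterations,
# while the penalized estimate settles at a finite value.
beta_mle = fit(lam=0.0)
beta_lasso = fit(lam=0.5)
print(beta_mle, beta_lasso)
```

Running `fit` with more steps pushes the unpenalized slope ever higher, which is exactly why penalized approaches such as the LASSO remain stable in the separated scenarios the study examines.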
Problem

Research questions and friction points this paper is trying to address.

Comparing variable selection and model averaging for logistic regression
Addressing model uncertainty in binary outcome statistical models
Evaluating method performance under realistic conditions with separation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian model averaging with g-priors
Penalized likelihood approaches like LASSO
Local empirical Bayes prior methods
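To make the model-averaging idea behind these methods concrete, the sketch below enumerates all predictor subsets and combines them using a BIC approximation to the marginal likelihood, yielding posterior inclusion probabilities per predictor. This is a simplified stand-in, not the paper's g-prior or EB-local machinery; the simulated data, seed, and Newton-Raphson fitter are illustrative assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: only the first of three predictors affects the outcome.
n, p = 200, 3
X = rng.normal(size=(n, p))
logits = 1.5 * X[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

def max_loglik(Z):
    # Newton-Raphson MLE for logistic regression; returns the maximized log-likelihood.
    beta = np.zeros(Z.shape[1])
    for _ in range(50):
        mu = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (y - mu)
        hess = (Z * (mu * (1.0 - mu))[:, None]).T @ Z
        beta += np.linalg.solve(hess + 1e-8 * np.eye(Z.shape[1]), grad)
    mu = 1.0 / (1.0 + np.exp(-Z @ beta))
    return np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))

# Score every predictor subset (intercept always included) with a
# BIC approximation to the log marginal likelihood.
scores = {}
for subset in itertools.product([0, 1], repeat=p):
    cols = [j for j in range(p) if subset[j]]
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    scores[subset] = max_loglik(Z) - 0.5 * Z.shape[1] * np.log(n)

# Posterior model probabilities (uniform model prior) and
# model-averaged posterior inclusion probabilities.
vals = np.array(list(scores.values()))
probs = np.exp(vals - vals.max())
probs /= probs.sum()
incl = np.array([sum(pr for s, pr in zip(scores, probs) if s[j]) for j in range(p)])
print("posterior inclusion probabilities:", np.round(incl, 3))
```

With only the first predictor truly active, its inclusion probability should be near one and the noise predictors' probabilities should be small; the g-prior and EB-local BMA variants compared in the paper replace the BIC score with proper marginal likelihoods under their respective priors.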
Nikola Sekulovski
Department of Psychology, University of Amsterdam, The Netherlands
František Bartoš
Department of Psychology, University of Amsterdam, The Netherlands
Don van den Bergh
Department of Psychology, University of Amsterdam, The Netherlands
Giuseppe Arena
Department of Psychology, University of Amsterdam, The Netherlands
Henrik R. Godmann
Department of Psychology, University of Amsterdam, The Netherlands
Vipasha Goyal
Department of Psychology, University of Amsterdam, The Netherlands
Julius M. Pfadt
Department of Psychology, University of Amsterdam, The Netherlands
Maarten Marsman
Department of Psychology, University of Amsterdam, The Netherlands
Adrian E. Raftery
Professor Emeritus of Statistics and Sociology, University of Washington
Bayesian statistics · Cluster analysis · Demography · Atmospheric sciences · Statistical demography