🤖 AI Summary
Regression modeling faces a persistent trade-off between capturing nonlinear relationships and keeping models interpretable. This paper proposes an adaptive stepwise modeling framework driven by AIC or BIC: during stepwise regression, numeric variables are converted into interpretable threshold-based dummy variables via shallow decision trees, but *only* when the transformation improves the model's information criterion (AIC or BIC). Embedding a formal model selection criterion directly in the stepwise procedure gives a principled, data-adaptive way to detect nonlinearity while preserving the transparency and additivity of linear models. Experiments on synthetic benchmarks and real-world datasets indicate that the resulting models are more parsimonious, achieve higher predictive accuracy, and generalize better than conventional stepwise regression and penalized methods (e.g., Lasso, Ridge).
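For reference, the two criteria named in the summary have the standard definitions below. The symbols are chosen here for illustration and are not taken from the paper: $\hat{L}$ is the maximized likelihood of the fitted model, $k$ the number of estimated parameters, and $n$ the number of observations.

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
$$

Lower values are better under both criteria. BIC charges $\ln n$ per parameter rather than 2, so it penalizes added terms more heavily than AIC whenever $n > e^2 \approx 7.4$.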
📝 Abstract
Capturing nonlinear relationships without sacrificing interpretability remains a persistent challenge in regression modeling. We introduce SplitWise, a novel framework that enhances stepwise regression. It adaptively transforms numeric predictors into threshold-based binary features using shallow decision trees, but only when such transformations improve model fit, as assessed by the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC). This approach preserves the transparency of linear models while flexibly capturing nonlinear effects. Implemented as a user-friendly R package, SplitWise is evaluated on both synthetic and real-world datasets. The results show that it consistently produces more parsimonious and generalizable models than traditional stepwise and penalized regression techniques.
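The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the SplitWise R package or its API. It assumes a single numeric predictor, a depth-1 split (a stump) in place of a shallow tree, Gaussian errors, and AIC as the criterion. All function names (`ols_rss`, `gaussian_aic`, `best_stump_threshold`, `choose_form`) are hypothetical.

```python
import math
import random


def ols_rss(x, y):
    """Residual sum of squares of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def gaussian_aic(rss, n, k):
    """AIC of a Gaussian linear model, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


def best_stump_threshold(x, y):
    """Depth-1 split: the threshold whose dummy (x > t) gives the lowest RSS."""
    xs = sorted(set(x))
    best_t, best_rss = None, math.inf
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # midpoint, so both sides of the split are non-empty
        dummy = [1.0 if xi > t else 0.0 for xi in x]
        rss = ols_rss(dummy, y)
        if rss < best_rss:
            best_t, best_rss = t, rss
    return best_t, best_rss


def choose_form(x, y):
    """Pick 'omit', 'linear', or 'dummy' for one predictor by lowest AIC."""
    n = len(y)
    my = sum(y) / n
    rss_omit = sum((yi - my) ** 2 for yi in y)
    threshold, rss_dummy = best_stump_threshold(x, y)
    # Parameter counts: intercept + error variance (+ one slope if x enters).
    # Assumption: the fitted threshold is not counted as an extra parameter.
    aic = {
        "omit": gaussian_aic(rss_omit, n, 2),
        "linear": gaussian_aic(ols_rss(x, y), n, 3),
        "dummy": gaussian_aic(rss_dummy, n, 3),
    }
    form = min(aic, key=aic.get)
    return form, (threshold if form == "dummy" else None), aic


random.seed(0)
x = [random.uniform(0, 10) for _ in range(200)]
y_step = [3.0 * (xi > 6.0) + random.gauss(0, 0.3) for xi in x]  # threshold effect
y_line = [0.5 * xi + random.gauss(0, 0.3) for xi in x]          # linear effect

form_step, thr_step, _ = choose_form(x, y_step)
form_line, thr_line, _ = choose_form(x, y_line)
print(form_step, thr_step)
print(form_line, thr_line)
```

On the simulated threshold effect the dummy form wins, with a cut point near the true value of 6. On the linear effect the original numeric term is kept. Whether the fitted threshold should count as an estimated parameter is a design choice that this sketch sidesteps. Counting it would make the criterion more conservative about binarizing.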