🤖 AI Summary
Sparse-group lasso (SGL) and its adaptive variant are widely used in high-dimensional genetic data analysis, yet their complex shrinkage structure and additional hyperparameters incur substantial computational overhead and make tuning difficult. To address this, the authors propose Dual Feature Reduction (DFR), the first unified, lossless strong screening framework for SGL and adaptive SGL, grounded in dual-norm theory. DFR combines dual optimization with theoretically guaranteed strong screening rules, supporting both sparse-group structures and adaptive penalty designs. Extensive experiments on diverse synthetic and real-world datasets show that DFR discards no active features (zero false negatives) and incurs no accuracy loss, while cutting computation time by several-fold to over an order of magnitude, substantially improving the scalability and practicality of large-scale SGL modeling.
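For reference, a standard formulation of the sparse-group lasso objective (the group weights $\sqrt{p_g}$ below follow a common convention and are an assumption, not stated above):

$$
\hat{\beta} = \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2 \;+\; \lambda\alpha\lVert \beta \rVert_1 \;+\; \lambda(1-\alpha)\sum_{g=1}^{G}\sqrt{p_g}\,\lVert \beta^{(g)} \rVert_2,
$$

where $\alpha \in [0,1]$ trades off the lasso and group-lasso penalties, $\beta^{(g)}$ is the coefficient sub-vector for group $g$, and $p_g$ is its size. Screening rules aim to identify, before fitting, the variables and groups that this penalty will set exactly to zero.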
📝 Abstract
The sparse-group lasso performs both variable and group selection, making simultaneous use of the strengths of the lasso and group lasso. Thanks to its sparse-group penalty, which allows it to exploit grouping information, it has found widespread use in genetics, a field that regularly involves the analysis of high-dimensional data. However, the sparse-group lasso can be computationally more expensive than both the lasso and the group lasso, due to its added shrinkage complexity and the additional hyperparameter that needs tuning. In this paper, a novel dual feature reduction method, Dual Feature Reduction (DFR), is presented that uses strong screening rules for the sparse-group lasso and the adaptive sparse-group lasso to reduce their input space before optimization. DFR applies two layers of screening and is based on the dual norms of the sparse-group lasso and adaptive sparse-group lasso. Through synthetic and real numerical studies, the proposed feature reduction approach is shown to drastically reduce computational cost across many different scenarios.
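To make the screening idea concrete, here is a minimal Python sketch of the sequential strong rule for the plain lasso from Tibshirani et al. (2012), the simplest member of the rule family that DFR extends. It is an analogy only: DFR's own rules are derived from the sparse-group dual norm, and the function name and interface below are hypothetical.

```python
import numpy as np

def strong_rule_screen(X, residual, lam_prev, lam_next):
    """Sequential strong rule for the plain lasso (Tibshirani et al., 2012),
    for the objective (1/2n)||y - X b||^2 + lam * ||b||_1.

    Keeps feature j at the new penalty lam_next only if
        |x_j' r| / n >= 2 * lam_next - lam_prev,
    where r is the residual at the previous penalty lam_prev. A KKT check
    on the fitted solution is still required to make the rule lossless.
    """
    n = X.shape[0]
    corr = np.abs(X.T @ residual) / n  # gradient magnitude per feature
    return np.flatnonzero(corr >= 2 * lam_next - lam_prev)

# Toy usage: at lam_max no feature is active, so the residual is y itself.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))
y = X[:, :5] @ np.ones(5) + 0.1 * rng.standard_normal(100)
lam_max = np.max(np.abs(X.T @ y)) / 100
survivors = strong_rule_screen(X, y, lam_max, 0.9 * lam_max)
print(f"{survivors.size} of {X.shape[1]} features survive screening")
```

The surviving features form the reduced input space passed to the solver. DFR's two layers apply this pattern in the sparse-group setting, screening at both the group and the individual-variable level via the corresponding dual norms, which is what the abstract refers to as dual feature reduction.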