AI Summary
To address the poor stability, weak interpretability, and stringent theoretical assumptions (e.g., the incoherence or irrepresentability condition) inherent in the Lasso for sparse regression, this paper proposes UniLasso, a two-stage sparse regression method. Its core innovation is that it is the first method to jointly incorporate both the signs and the magnitudes of the univariate regression coefficients into the model: Stage 1 fits a separate univariate regression for each feature, and Stage 2 fits a sparse model constrained so that each final coefficient keeps the sign of its univariate estimate. This design circumvents reliance on the irrepresentability condition, substantially improving support-recovery accuracy and prediction-error consistency. Theoretical analysis establishes statistical consistency under high-dimensional asymptotics, and the framework extends naturally to generalized linear models and the Cox proportional hazards model. Extensive simulations and real-data experiments demonstrate that UniLasso consistently outperforms the standard Lasso in both sparsity-identification accuracy and model interpretability.
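The two-stage idea can be sketched numerically. The following is a minimal illustration, not the paper's exact algorithm: each feature is rescaled by its univariate least-squares slope, and a lasso with non-negativity constraints is then fit to the rescaled features, so every selected coefficient automatically keeps the sign of its univariate estimate. The function names, the coordinate-descent solver, and the penalty parameterization are illustrative assumptions.

```python
import numpy as np

def nonneg_lasso(Z, y, alpha, n_iter=500):
    """Coordinate descent for min_g (1/2n)||y - Z g||^2 + alpha * sum(g), g >= 0."""
    n, p = Z.shape
    g = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n   # per-coordinate curvature
    r = y - Z @ g                       # current residual
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:          # skip degenerate (constant) columns
                continue
            # partial correlation of column j with the residual, j excluded
            rho = Z[:, j] @ r / n + col_sq[j] * g[j]
            new = max(0.0, (rho - alpha) / col_sq[j])  # soft-threshold, clipped at 0
            r += Z[:, j] * (g[j] - new)
            g[j] = new
    return g

def unilasso_sketch(X, y, alpha=0.1):
    """Two-stage sketch: univariate slopes, then a sign-constrained sparse fit."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Stage 1: univariate OLS slope beta_j = <x_j, y> / <x_j, x_j>
    beta_uni = (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)
    # Stage 2: non-negative lasso on rescaled features; gamma_j >= 0 means the
    # final coefficient beta_uni[j] * gamma_j keeps the univariate sign.
    Z = Xc * beta_uni
    gamma = nonneg_lasso(Z, yc, alpha)
    coef = beta_uni * gamma
    intercept = y.mean() - X.mean(axis=0) @ coef
    return coef, intercept
```

By construction, a nonzero final coefficient can only arise as a positive multiple of the corresponding univariate slope, which is one way to realize the sign constraint described above.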
Abstract
In this paper, we introduce ``UniLasso'' -- a novel statistical method for sparse regression. This two-stage approach preserves the signs of the univariate coefficients and leverages their magnitudes. Both of these properties are attractive for stability and interpretation of the model. Through comprehensive simulations and applications to real-world datasets, we demonstrate that UniLasso outperforms Lasso in various settings, particularly in terms of sparsity and model interpretability. We prove asymptotic support recovery and mean-squared error consistency under a set of conditions different from the well-known irrepresentability conditions for the Lasso. Extensions to generalized linear models (GLMs) and Cox regression are also discussed.