🤖 AI Summary
Problem: Variable selection for high-dimensional covariates in clustered failure-time data under stratified sampling remains challenging. Existing methods do not simultaneously correct for sampling bias, model within-cluster dependence, and support reliable inference when the proportional hazards (PH) assumption is violated.
Method: We propose a regularized Buckley–James estimation framework for semiparametric accelerated failure time (AFT) models under stratified sampling designs. The approach combines generalized estimating equations (GEE) with L₁ penalization and uses iterative weighted least squares to incorporate sampling weights stably and account for intra-cluster correlation.
Contribution/Results: We establish consistency, asymptotic normality, and the oracle property of the proposed estimators. Simulations show that the method outperforms existing approaches that ignore either the sampling design or the clustering structure, and that it selects relevant variables effectively even with moderate sample sizes. The method is illustrated on a dental study.
📝 Abstract
In large-scale epidemiological studies, statistical inference for failure times is often complicated by high-dimensional covariates collected under stratified sampling designs. Variable selection methods developed for full cohort data do not extend naturally to stratified sampling designs, and appropriate adjustments for the sampling scheme are necessary. Further challenges arise when the failure times are clustered and exhibit within-cluster dependence. As an alternative to the Cox proportional hazards (PH) model when the PH assumption does not hold, the penalized Buckley-James (BJ) estimating method for accelerated failure time (AFT) models can potentially handle within-cluster correlation in such settings by incorporating generalized estimating equation (GEE) techniques, though its practical implementation remains hindered by computational instability. We propose a regularized estimating method within the GEE framework for stratified sampling designs, in the spirit of the penalized BJ method but with a reliable inference procedure. We establish the consistency and asymptotic normality of the proposed estimators and show that they achieve the oracle property. Extensive simulation studies demonstrate that our method outperforms existing methods that ignore sampling bias or within-cluster dependence. Moreover, the regularization scheme effectively selects relevant variables even with moderate sample sizes. The proposed methodology is illustrated through an application to a dental study.
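
To make the ingredients named in the abstract concrete, the following is a minimal sketch of a weighted, L1-penalized Buckley-James iteration for an AFT model. It is not the authors' estimator: it assumes a working-independence correlation structure (so within-cluster dependence is ignored), a plain lasso penalty solved by coordinate descent, and inverse-probability sampling weights supplied by the user. The function names `bj_impute`, `weighted_lasso`, and `penalized_bj`, the tuning parameter `lam`, and the fixed iteration counts are all hypothetical choices made for illustration.

```python
import numpy as np


def bj_impute(y, X, delta, w, beta):
    """Buckley-James step: replace censored log-times by conditional means.

    y: observed log-times, delta: 1 = event, 0 = censored,
    w: positive sampling weights (assumed inverse inclusion probabilities).
    """
    resid = y - X @ beta
    order = np.argsort(resid, kind="stable")
    e, d, ww = resid[order], delta[order].astype(float), w[order]
    d[-1] = 1.0  # common convention: treat the largest residual as an event
    at_risk = np.cumsum(ww[::-1])[::-1]  # weighted size of the risk set
    hazard = d * ww / at_risk
    surv_before = np.concatenate(([1.0], np.cumprod(1.0 - hazard)[:-1]))
    mass = surv_before * hazard  # weighted Kaplan-Meier jump sizes
    # Sums over residuals strictly after each sorted position.
    tail_mass = np.cumsum(mass[::-1])[::-1] - mass
    tail_em = np.cumsum((e * mass)[::-1])[::-1] - e * mass
    cond_mean = np.where(tail_mass > 1e-12,
                         tail_em / np.maximum(tail_mass, 1e-12), e)
    e_star = np.where(d == 1, e, cond_mean)
    y_star = np.empty(len(y), dtype=float)
    y_star[order] = X[order] @ beta + e_star
    return y_star


def weighted_lasso(X, y, w, lam, beta0, n_sweeps=100):
    """Coordinate descent for 0.5 * sum(wn * (y - X b)^2) + lam * ||b||_1."""
    beta = beta0.astype(float).copy()
    wn = w / w.sum()
    col_norm = (wn[:, None] * X ** 2).sum(axis=0)
    r = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r += X[:, j] * beta[j]  # partial residual without coordinate j
            rho = np.sum(wn * X[:, j] * r)
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_norm[j]
            r -= X[:, j] * beta[j]
    return beta


def penalized_bj(y, X, delta, w, lam, n_iter=30):
    """Alternate imputation and weighted lasso for a fixed number of rounds."""
    wn = w / w.sum()
    Xc = X - wn @ X  # weighted centering; the intercept is left unpenalized
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        y_star = bj_impute(y, Xc, delta, w, beta)
        beta = weighted_lasso(Xc, y_star - wn @ y_star, w, lam, beta)
    return beta
```

Calling `penalized_bj(y, X, delta, w, lam=0.05)` returns the slope estimates, with coefficients shrunk exactly to zero treated as unselected. The loop runs for a fixed number of rounds because Buckley-James iterations can oscillate rather than converge, which is one form of the computational instability the abstract mentions. A cluster-aware version would replace the independence working correlation with a GEE working correlation matrix, which this sketch does not attempt.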