🤖 AI Summary
To address unobserved confounding and measurement error in intermediate covariates with right-censored survival data, this paper introduces a novel extension of two-stage least squares (TSLS) instrumental variable estimation to the semiparametric accelerated failure time (AFT) model. Methodologically, it combines Leurgans' synthetic variable approach with an iterative reweighted generalized estimating equation (GEE) procedure to estimate causal effects, and it establishes the asymptotic properties of the resulting estimator along with a consistent variance estimator, which enables valid causal inference. Simulation studies show that the proposed method outperforms the naive unweighted GEE method, a parametric IV approach, and a one-stage estimator without IV. The method also scales to large datasets, running 300 to 1500 times faster than a Bayesian parametric IV approach in both the simulations and the real-data example, and its use is illustrated with an application to UK Biobank data.
📝 Abstract
Instrumental variable (IV) analysis is widely used in fields such as economics and epidemiology to address unobserved confounding and measurement error when estimating the causal effects of intermediate covariates on outcomes. However, extending the commonly used two-stage least squares (TSLS) approach to survival settings is nontrivial due to censoring. This paper introduces a novel extension of TSLS to the semiparametric accelerated failure time (AFT) model with right-censored data, supported by rigorous theoretical justification. Specifically, we propose an iterative reweighted generalized estimating equation (GEE) approach that incorporates Leurgans' synthetic variable method, establish the asymptotic properties of the resulting estimator, and derive a consistent variance estimator, enabling valid causal inference. Simulation studies are conducted to evaluate the finite-sample performance of the proposed method across different scenarios. The results show that it outperforms the naive unweighted GEE method, a parametric IV approach, and a one-stage estimator without IV. The proposed method is also highly scalable to large datasets, achieving a 300- to 1500-fold speedup relative to a Bayesian parametric IV approach in both simulations and the real-data example. We further illustrate the utility of the proposed method through a real-data application using the UK Biobank data.
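To make the combination of TSLS and a synthetic variable concrete, the sketch below replaces the censored log time with one common integral form of a Leurgans-type synthetic response and then runs ordinary TSLS on it. This is a simplified illustration, not the paper's iterative reweighted GEE estimator. It assumes a single instrument and a single exposure, censoring that is independent of the event time, exposure, and instrument, and a Kaplan-Meier estimate of the censoring distribution. The function names (`leurgans_synthetic`, `ols_fit`, `tsls`, `simulate`) and all simulation settings are hypothetical choices made for this example.

```python
import numpy as np


def leurgans_synthetic(y, delta):
    """Synthetic response y* = L + integral_L^y dt / G_hat(t), with L = min(y).

    y     : observed log follow-up time, min(log T, log C)
    delta : 1 if the event was observed, 0 if censored
    G_hat : Kaplan-Meier estimate of the censoring survival function
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = len(y)
    order = np.argsort(y, kind="stable")
    y_sorted = y[order]
    censored = 1.0 - delta[order]                 # censoring is the "event" here
    at_risk = n - np.arange(n)                    # n, n-1, ..., 1
    g_hat = np.cumprod(1.0 - censored / at_risk)  # G_hat just after each sorted point
    # G_hat is a step function, so the integral is a sum of width / height terms.
    steps = np.diff(y_sorted) / g_hat[:-1]
    y_star_sorted = y_sorted[0] + np.concatenate(([0.0], np.cumsum(steps)))
    y_star = np.empty(n)
    y_star[order] = y_star_sorted                 # undo the sort
    return y_star


def ols_fit(response, regressor):
    """Least-squares fit of response on [1, regressor]; returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(regressor), regressor])
    return np.linalg.lstsq(design, response, rcond=None)[0]


def tsls(response, x, z):
    """Two-stage least squares with one exposure x and one instrument z."""
    a0, a1 = ols_fit(x, z)          # stage 1: exposure on instrument
    x_hat = a0 + a1 * z
    return ols_fit(response, x_hat)  # stage 2: response on fitted exposure


def simulate(n, seed=0, beta=0.5):
    """Toy AFT data with an unobserved confounder u (illustrative settings only)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                           # instrument
    u = rng.normal(size=n)                           # unobserved confounder
    x = 0.8 * z + u + rng.normal(size=n)             # exposure
    log_t = 1.0 + beta * x - u + rng.normal(size=n)  # log event time
    log_c = rng.normal(loc=3.0, scale=1.0, size=n)   # independent log censoring time
    y = np.minimum(log_t, log_c)
    delta = (log_t <= log_c).astype(float)
    return y, delta, x, z


if __name__ == "__main__":
    y, delta, x, z = simulate(n=20000)
    y_star = leurgans_synthetic(y, delta)
    print("censoring rate:", 1.0 - delta.mean())
    print("TSLS slope on synthetic response:", tsls(y_star, x, z)[1])
    print("OLS slope ignoring confounding and censoring:", ols_fit(y, x)[1])
```

Under independent censoring the synthetic response has the same conditional mean as the uncensored log time, which is why a least-squares second stage can be applied to it directly. The price is a heavier right tail, since late observations are divided by small values of the estimated censoring survival function; that inflated variance is one reason an unweighted fit of this kind may be inefficient.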