🤖 AI Summary
This paper addresses the optimization of multi-stage dynamic treatment regimes (DTRs) in longitudinal randomized clinical trials with right-censored survival outcomes, without assuming proportional hazards. Methodologically, it proposes a counterfactual Q-learning framework that combines an accelerated failure time (AFT) model with Buckley–James imputation of censored outcomes. The model is fitted by iterative boosting (componentwise least squares and regression trees) to obtain unbiased stage-specific Q-functions within the potential outcomes framework. Under standard causal assumptions, this design ensures identifiability of the optimal DTR. Because it avoids hazard-model misspecification, it also limits the bias that can accumulate across decision stages. Simulation studies and an analysis of the ACTG175 HIV clinical trial show that the proposed method achieves higher treatment-decision accuracy than Cox-based Q-learning, especially in multistage settings.
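The Buckley–James step replaces each censored log survival time with an estimate of its conditional expectation, given that the event occurs after the censoring time. The following is a minimal NumPy sketch for a single-stage linear AFT model. It assumes a Kaplan–Meier estimate of the residual distribution and uses plain least squares in place of the paper's boosting learners. The function names (`km_jumps`, `bj_impute`, `bj_least_squares`) and the simulated data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def km_jumps(resid, delta):
    """Kaplan-Meier probability mass at each sorted residual (zero at censored ones)."""
    # Sort by residual; on ties, events (delta == 1) are placed before censorings.
    order = np.lexsort((1 - delta, resid))
    r, d = resid[order], delta[order]
    n = len(r)
    surv, mass = 1.0, np.zeros(n)
    for k in range(n):
        if d[k] == 1:
            new_surv = surv * (1.0 - 1.0 / (n - k))  # n - k subjects still at risk
            mass[k] = surv - new_surv
            surv = new_surv
    return r, mass


def bj_impute(y, delta, X, beta):
    """Buckley-James step: censored log-time -> x'beta + E[eps | eps > e_i]."""
    delta = delta.astype(int).copy()
    fitted = X @ beta
    resid = y - fitted
    # Common convention: treat the largest residual as an event so the KM mass sums to 1.
    delta[np.argmax(resid)] = 1
    r, mass = km_jumps(resid, delta)
    y_star = y.astype(float).copy()
    for i in np.flatnonzero(delta == 0):
        above = r > resid[i]
        denom = mass[above].sum()
        if denom > 0:
            y_star[i] = fitted[i] + (r[above] * mass[above]).sum() / denom
    return y_star


def bj_least_squares(y, delta, X, n_iter=20):
    """Alternate Buckley-James imputation and least squares for a fixed number of rounds."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # naive start that ignores censoring
    y_star = y.astype(float).copy()
    for _ in range(n_iter):
        y_star = bj_impute(y, delta, X, beta)
        beta = np.linalg.lstsq(X, y_star, rcond=None)[0]
    return beta, y_star


# Hypothetical simulated data: log T = 1 + 0.5 x + noise, independent censoring.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
log_t = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=n)
log_c = rng.uniform(0.5, 3.0, size=n)
y = np.minimum(log_t, log_c)          # observed log-time
delta = (log_t <= log_c).astype(int)  # 1 = event observed, 0 = censored
beta, y_star = bj_least_squares(y, delta, X)
```

The iteration count is capped rather than run to convergence because Buckley–James estimates can oscillate instead of settling on a single fixed point.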
📝 Abstract
We propose a Buckley–James (BJ) Boost Q-learning framework for estimating optimal dynamic treatment regimes from right-censored survival data, tailored to longitudinal randomized clinical trial settings. The method integrates accelerated failure time models with iterative boosting techniques, including componentwise least squares and regression trees, within a counterfactual Q-learning framework. By directly modeling conditional survival time, BJ Boost Q-learning avoids the restrictive proportional hazards assumption and enables unbiased estimation of stage-specific Q-functions. Grounded in potential outcomes, the framework ensures identifiability of the optimal treatment regime under standard causal assumptions. Compared with Cox-based Q-learning, which relies on hazard modeling and may suffer from bias under misspecification, our approach provides robust and flexible estimation. Simulation studies and an analysis of the ACTG175 HIV trial demonstrate that BJ Boost Q-learning yields higher accuracy in treatment decision making, especially in multistage settings where bias can accumulate.
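The sketch below shows how componentwise least-squares boosting might serve as the Q-function learner. It fits one stage of Q-learning on log survival times, then reads off a rule that chooses the treatment with the larger predicted log survival time. It assumes the responses have already been Buckley–James imputed (simulated uncensored data stand in for them), a binary randomized treatment, and a linear Q-function with treatment-by-covariate interactions. The function names, step size `nu`, and number of steps are hypothetical choices. The multistage backward recursion and the regression-tree learner are omitted.

```python
import numpy as np


def componentwise_l2_boost(Z, y, n_steps=300, nu=0.1):
    """Componentwise least-squares boosting: each step updates only the best single column."""
    mean_z = Z.mean(axis=0)
    Zc = Z - mean_z  # centered columns, so each base learner is a slope through the origin
    intercept = y.mean()
    coef = np.zeros(Z.shape[1])
    fit = np.full(len(y), intercept)
    ss = (Zc ** 2).sum(axis=0)  # assumes no constant columns (ss > 0)
    for _ in range(n_steps):
        u = y - fit                                        # current residuals
        slopes = Zc.T @ u / ss                             # univariate LS slope per column
        rss = ((u[:, None] - Zc * slopes) ** 2).sum(axis=0)
        j = np.argmin(rss)                                 # column that best fits residuals
        coef[j] += nu * slopes[j]                          # shrunken update
        fit += nu * slopes[j] * Zc[:, j]
    # Convert back to an intercept for uncentered features.
    return intercept - mean_z @ coef, coef


def design(x, a):
    """Hypothetical Q-function features: covariates, treatment, treatment-by-covariate terms."""
    a = np.asarray(a, dtype=float)[:, None]
    return np.hstack([x, a, a * x])


def q_learning_rule(x, a, y_star, **boost_kw):
    """Fit Q(x, a) by boosting, then return the rule argmax_a Q(x, a) for a in {0, 1}."""
    b0, coef = componentwise_l2_boost(design(x, a), y_star, **boost_kw)

    def rule(x_new):
        q0 = b0 + design(x_new, np.zeros(len(x_new))) @ coef
        q1 = b0 + design(x_new, np.ones(len(x_new))) @ coef
        return (q1 > q0).astype(int)  # treat when predicted log survival time is longer

    return rule


# Hypothetical data: treatment helps only when the second covariate is positive.
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=(n, 2))
a = rng.integers(0, 2, size=n)  # randomized treatment assignment
y_star = 1.0 + 0.5 * x[:, 0] + a * x[:, 1] + rng.normal(scale=0.3, size=n)
rule = q_learning_rule(x, a, y_star)
x_new = rng.normal(size=(500, 2))
agreement = np.mean(rule(x_new) == (x_new[:, 1] > 0))
```

Here `agreement` measures how often the fitted rule matches the true optimal rule, which is to treat when the second covariate is positive. Because each step moves only the coefficient that most reduces the residual sum of squares, the shrunken updates act as implicit variable selection. Using regression trees as the base learner instead would allow nonlinear Q-functions.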