Counterfactual Survival Q-Learning for Longitudinal Randomized Trials via Buckley-James Boosting

📅 2025-08-14
🤖 AI Summary
This paper addresses the estimation of optimal multi-stage dynamic treatment regimes (DTRs) in longitudinal randomized clinical trials with right-censored survival outcomes, without assuming proportional hazards. It proposes a counterfactual Q-learning framework that combines an accelerated failure time (AFT) model with Buckley–James boosting, using componentwise least squares and regression trees as base learners to obtain unbiased estimates of stage-specific Q-functions within the potential outcomes framework. Under standard causal assumptions, this design ensures identifiability of the optimal DTR and is intended to limit bias accumulation across decision stages. Simulation studies and an analysis of the ACTG175 HIV clinical trial data indicate that the method makes more accurate treatment decisions than Cox-based Q-learning, particularly in multistage settings and when the proportional hazards assumption is violated.

📝 Abstract
We propose a Buckley-James (BJ) Boost Q-learning framework for estimating optimal dynamic treatment regimes under right-censored survival data, tailored for longitudinal randomized clinical trial settings. The method integrates accelerated failure time models with iterative boosting techniques, including componentwise least squares and regression trees, within a counterfactual Q-learning framework. By directly modeling conditional survival time, BJ Boost Q-learning avoids the restrictive proportional hazards assumption and enables unbiased estimation of stage-specific Q-functions. Grounded in potential outcomes, this framework ensures identifiability of the optimal treatment regime under standard causal assumptions. Compared to Cox-based Q-learning, which relies on hazard modeling and may suffer from bias under misspecification, our approach provides robust and flexible estimation. Simulation studies and analysis of the ACTG175 HIV trial demonstrate that BJ Boost Q-learning yields higher accuracy in treatment decision-making, especially in multistage settings where bias can accumulate.
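The Buckley-James step referred to in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes log-scale observed times `y`, an event indicator `delta` (1 = event, 0 = censored), and a current AFT prediction `pred`; the function names `km_jumps` and `bj_impute` are hypothetical. Each censored response is replaced by its conditional expectation under the Kaplan-Meier distribution of the residuals, and the largest residual is treated as an event, which is a common convention rather than something stated in the source.

```python
import numpy as np

def km_jumps(resid, delta):
    """Kaplan-Meier probability mass at each residual (0 at censored ones).

    Assumes no tied residuals. The largest residual is treated as an event
    so that the masses sum to one (a convention assumed here).
    """
    order = np.argsort(resid, kind="stable")
    d = delta[order].astype(float)      # astype returns a copy
    d[-1] = 1.0
    n = len(resid)
    at_risk = n - np.arange(n)          # n, n-1, ..., 1
    surv = np.cumprod(1.0 - d / at_risk)
    surv_before = np.concatenate(([1.0], surv[:-1]))
    mass = np.empty(n)
    mass[order] = surv_before - surv    # jump of the survival curve
    return mass

def bj_impute(y, delta, pred):
    """Replace censored log-times by pred + E[residual | residual > observed]."""
    resid = y - pred
    mass = km_jumps(resid, delta)
    y_star = np.asarray(y, dtype=float).copy()
    for i in np.where(delta == 0)[0]:
        later = resid > resid[i]
        tail = mass[later].sum()
        if tail > 0:                    # otherwise keep the observed value
            y_star[i] = pred[i] + (mass[later] * resid[later]).sum() / tail
    return y_star

# Toy usage: the middle observation is censored at 2.0 and is imputed
# from the only later event.
y = np.array([1.0, 2.0, 3.0])
delta = np.array([1, 0, 1])
y_star = bj_impute(y, delta, pred=np.zeros(3))
```

In an iterative scheme, the imputed responses would be refit and re-imputed until the predictions stabilize; uncensored responses are never changed.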
Problem

Research questions and friction points this paper is trying to address.

Estimates optimal dynamic treatment regimes from right-censored survival data
Avoids restrictive proportional hazards assumption
Improves accuracy in multistage treatment decisions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates accelerated failure time models with boosting
Avoids the proportional hazards assumption by directly modeling conditional survival time
Uses componentwise least squares and regression trees
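
As a rough illustration of the componentwise least squares learner listed above, the following Python sketch runs L2 boosting in which each step fits every covariate separately to the current residual and updates only the best one. It is a generic sketch, not the paper's code: the function name `componentwise_l2_boost`, the step count `n_steps`, and the shrinkage `nu` are assumptions, and the response `y` stands in for imputed log survival times at a given stage.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise least squares boosting for a linear predictor.

    Assumes every column of X is non-constant. Returns (intercept, coef).
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centered covariates
    norms = (Xc ** 2).sum(axis=0)
    intercept = y.mean()
    coef = np.zeros(X.shape[1])
    resid = y - intercept                # new array, safe to update in place
    for _ in range(n_steps):
        betas = Xc.T @ resid / norms     # one-covariate least squares slopes
        gains = betas ** 2 * norms       # drop in residual sum of squares
        j = int(np.argmax(gains))        # best single covariate this step
        coef[j] += nu * betas[j]
        resid -= nu * betas[j] * Xc[:, j]
    # Undo the centering so predictions are intercept + X @ coef.
    return intercept - x_mean @ coef, coef

# Toy usage: the response depends on the first covariate only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0]
intercept, coef = componentwise_l2_boost(X, y)
```

Because only one coefficient moves per step, stopping early leaves irrelevant covariates at zero, which is one reason this learner is often paired with high-dimensional or stage-specific covariate sets.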