🤖 AI Summary
This paper addresses longitudinal survival data with right-censoring and missing-at-random (MAR) covariates, as encountered in clinical practice. Method: The authors propose a dynamic treatment regime (DTR) optimization framework, Counterfactual Buckley–James Q-Learning, that integrates Buckley–James imputation with counterfactual Q-learning. The Buckley–James estimator replaces each censored survival time with its conditional expectation given the observed data, and these imputed values are used as outcomes within a counterfactual Q-learning framework that jointly models multi-stage treatment decisions and potential survival outcomes. Contribution/Results: To the authors' knowledge, this is the first method to couple Buckley–James imputation directly with Q-learning, with the aim of balancing statistical robustness, decision interpretability, and personalized DTR estimation. In simulation studies and an application to real clinical trial data, the method is reported to outperform existing benchmarks, yielding longer average survival times and higher accuracy in identifying the optimal regime, which supports the reliability and clinical applicability of its dynamic treatment recommendations.
📝 Abstract
Treatment strategies are critical in healthcare, particularly when outcomes are subject to censoring. This study introduces the Counterfactual Buckley-James Q-Learning framework, which integrates the Buckley-James method with reinforcement learning to address challenges posed by censored survival data. The Buckley-James method imputes censored survival times via conditional expectations based on observed data, offering a robust mechanism for handling incomplete outcomes. By incorporating these imputed values into a counterfactual Q-learning framework, the proposed method enables the estimation and comparison of potential outcomes under different treatment strategies. This facilitates the identification of optimal dynamic treatment regimes that maximize expected survival time. Through extensive simulation studies, the method demonstrates robust performance across various sample sizes and censoring scenarios, including right censoring and missing at random (MAR). Application to real-world clinical trial data further highlights the utility of this approach in informing personalized treatment decisions, providing an interpretable and reliable tool for optimizing survival outcomes in complex clinical settings.
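The imputation step described in the abstract can be illustrated with a minimal single-stage sketch. This is not the authors' implementation: it assumes a linear model for log survival time, a Kaplan-Meier estimate of the residual distribution, a fixed number of Buckley-James iterations, and a single binary treatment. The names `km_masses`, `tail_sum`, `bj_impute`, and the simulated data are hypothetical and introduced only for illustration. The multi-stage backward recursion and the MAR handling are not shown.

```python
import numpy as np

def km_masses(resid, event):
    """Kaplan-Meier probability mass at each residual, in sorted order.
    Assumption: the largest residual is treated as an event so masses sum to 1."""
    order = np.argsort(resid, kind="stable")
    d = event[order].astype(float)
    d[-1] = 1.0
    n = len(resid)
    at_risk = n - np.arange(n)                     # n, n-1, ..., 1
    surv_before = np.concatenate(([1.0], np.cumprod(1.0 - d / at_risk)[:-1]))
    return order, surv_before * d / at_risk

def tail_sum(v):
    """For each position j, the sum of v[k] over k > j."""
    c = np.cumsum(v[::-1])[::-1]
    return np.append(c[1:], 0.0)

def bj_impute(y, event, X, n_iter=20):
    """Buckley-James style imputation for a linear model of log time.
    y: log observed time, event: 1 if the event was observed, 0 if censored,
    X: design matrix including an intercept column.
    Returns the fitted coefficients and the imputed outcomes."""
    # Start from a least-squares fit on the uncensored rows only.
    beta = np.linalg.lstsq(X[event == 1], y[event == 1], rcond=None)[0]
    for _ in range(n_iter):                        # fixed count; BJ iterations may oscillate
        fitted = X @ beta
        resid = y - fitted
        order, mass = km_masses(resid, event)
        e_sorted = resid[order]
        t_mass = tail_sum(mass)                    # estimated P(residual > e_j)
        t_mean = tail_sum(mass * e_sorted)
        safe = np.where(t_mass > 0, t_mass, 1.0)
        cond_sorted = np.where(t_mass > 0, t_mean / safe, e_sorted)
        cond = np.empty_like(resid)
        cond[order] = cond_sorted                  # E[residual | residual > e_i]
        # Observed events keep their value; censored rows get the conditional mean.
        y_star = np.where(event == 1, y, fitted + cond)
        beta = np.linalg.lstsq(X, y_star, rcond=None)[0]
    return beta, y_star

# Simulated single-stage example (hypothetical data-generating model).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
a = rng.integers(0, 2, size=n)                     # binary treatment
log_t = 1.0 + 0.5 * x + a * (0.8 * x) + rng.normal(scale=0.5, size=n)
log_c = rng.normal(loc=2.0, scale=1.0, size=n)     # censoring time on the log scale
y = np.minimum(log_t, log_c)
event = (log_t <= log_c).astype(int)

# Q-function with a treatment-by-covariate interaction, fitted on imputed outcomes.
X = np.column_stack([np.ones(n), x, a, a * x])
beta, y_star = bj_impute(y, event, X)

# Estimated rule: treat when the fitted treatment contrast is positive.
recommend = (beta[2] + beta[3] * x > 0).astype(int)
```

In this sketch the imputed outcome for a censored patient is never smaller than the censored value, because the conditional mean is taken over residuals larger than the patient's own. Extending it to several stages would mean applying the same fit backward in time, with the predicted optimal value from the later stage folded into the earlier stage's outcome; that construction is an assumption here, since the abstract does not spell it out.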