🤖 AI Summary
In clinical practice, dynamic treatment regimes (DTRs) must balance multiple competing outcomes (e.g., efficacy and adverse effects) while accounting for individual patient preferences. Existing estimation methods mostly assume a single outcome, and the few that handle composite outcomes are restricted to a single time point and two outcomes, cannot incorporate self-reported preferences, and offer limited theoretical guarantees. To address this, the authors propose Latent Utility Q-Learning (LUQ-Learning), a framework for personalized utility modeling in DTRs with an arbitrary number of outcomes and time points. LUQ-Learning extends Q-learning with a latent model that uses patients' stated preferences to adapt the trade-off between outcomes to each patient. It achieves strong asymptotic performance under realistic assumptions on the data. Simulations based on an ongoing trial for low back pain and a completed trial for schizophrenia show that LUQ-Learning is highly competitive with several alternative baselines, supporting its use in multi-objective clinical decision-making.
📝 Abstract
In real-world healthcare problems, there are often multiple competing outcomes of interest, such as treatment efficacy and side effect severity. However, statistical methods for estimating dynamic treatment regimes (DTRs) usually assume a single outcome of interest, and the few methods that deal with composite outcomes suffer from important limitations. These include restriction to a single time point and two outcomes, an inability to incorporate self-reported patient preferences, and limited theoretical guarantees. To address these limitations, we propose a new method, which we dub Latent Utility Q-Learning (LUQ-Learning). LUQ-Learning uses a latent model approach to naturally extend Q-learning to the composite outcome setting and to adapt the ideal trade-off between outcomes to each patient. Unlike previous approaches, our framework allows for an arbitrary number of time points and outcomes, incorporates stated preferences, and achieves strong asymptotic performance under realistic assumptions on the data. We conduct simulation experiments based on an ongoing trial for low back pain as well as a well-known completed trial for schizophrenia. In all experiments, our method achieves highly competitive empirical performance compared to several alternative baselines.
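To make the idea of Q-learning on a composite outcome concrete, the following is a minimal sketch and not the authors' estimator. It assumes two stages, binary treatments, linear Q-functions fitted by least squares, and, as a strong simplification, a preference weight that is observed for each patient rather than latent. All function names, the utility form `weight * efficacy - (1 - weight) * side_effect`, and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_linear(features, target):
    """Ordinary least squares; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef


def q_features(state, weight, action):
    """Linear Q-function features: main effects plus action interactions."""
    base = np.column_stack([np.ones_like(state), state, weight])
    return np.hstack([base, base * action[:, None]])


def best_action_value(coef, state, weight):
    """Maximise a fitted Q-function over the two actions {0, 1}."""
    q0 = q_features(state, weight, np.zeros_like(state)) @ coef
    q1 = q_features(state, weight, np.ones_like(state)) @ coef
    return np.where(q1 > q0, 1.0, 0.0), np.maximum(q0, q1)


def two_stage_q_learning(s1, a1, s2, a2, weight, efficacy, side_effect):
    """Backward induction on a patient-specific composite utility."""
    # Assumed utility: the patient's weight trades efficacy against side effects.
    utility = weight * efficacy - (1.0 - weight) * side_effect
    # Stage 2: regress the utility on stage-2 state, weight, and action.
    coef2 = fit_linear(q_features(s2, weight, a2), utility)
    # Stage 1: regress the best achievable stage-2 value on stage-1 inputs.
    _, value2 = best_action_value(coef2, s2, weight)
    coef1 = fit_linear(q_features(s1, weight, a1), value2)
    return coef1, coef2


# Simulated data (hypothetical): treatment at stage 2 raises both efficacy
# and side effects by one unit, so it is worthwhile only when weight > 0.5.
rng = np.random.default_rng(0)
n = 2000
weight = rng.uniform(0.0, 1.0, n)
s1 = rng.normal(size=n)
a1 = rng.integers(0, 2, n).astype(float)
s2 = 0.5 * s1 + rng.normal(size=n)
a2 = rng.integers(0, 2, n).astype(float)
efficacy = a2 + rng.normal(scale=0.1, size=n)
side_effect = a2 + rng.normal(scale=0.1, size=n)

coef1, coef2 = two_stage_q_learning(s1, a1, s2, a2, weight, efficacy, side_effect)

# Recommended stage-2 actions for an efficacy-focused patient (weight 0.9)
# and a side-effect-averse patient (weight 0.1), both at state 0.
actions, _ = best_action_value(coef2, np.array([0.0, 0.0]), np.array([0.9, 0.1]))
print(actions)
```

In this toy setup the same fitted Q-function can recommend different treatments to patients who share a clinical state but differ in how they weigh efficacy against side effects. The method described in the abstract goes further by treating the utility as latent and informing it with stated preferences, which this sketch does not attempt.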