A Bayesian approach for fitting semi-Markov mixture models of cancer latency to individual-level data

📅 2024-08-26
🤖 AI Summary
This study addresses the difficulty of calibrating non-Markovian models of cancer natural history against individual-level screening and diagnosis histories. The authors propose a fully Bayesian framework built on a family of semi-Markov mixture models, with a data-augmented MCMC algorithm for efficient posterior sampling and leave-one-out (LOO) cross-validation for model selection. The framework also enables estimation of screening-related overdiagnosis rates. Experiments on synthetic data show that the sampler efficiently explores the joint posterior distribution of model parameters and latent variables, and an application to data from the US Breast Cancer Surveillance Consortium (BCSC) estimates the extent of breast cancer overdiagnosis associated with mammography screening. The sampler and model comparison method are available in the R package *baclava*.

📝 Abstract
Multi-state models of cancer natural history are widely used for designing and evaluating cancer early detection strategies. Calibrating such models against longitudinal data from screened cohorts is challenging, especially when fitting non-Markovian mixture models against individual-level data. Here, we consider a family of semi-Markov mixture models of cancer natural history and introduce an efficient data-augmented Markov chain Monte Carlo sampling algorithm for fitting these models to individual-level screening and cancer diagnosis histories. Our fully Bayesian approach supports rigorous uncertainty quantification and model selection through leave-one-out cross-validation, and it enables the estimation of screening-related overdiagnosis rates. We demonstrate the effectiveness of our approach using synthetic data, showing that the sampling algorithm efficiently explores the joint posterior distribution of model parameters and latent variables. Finally, we apply our method to data from the US Breast Cancer Surveillance Consortium and estimate the extent of breast cancer overdiagnosis associated with mammography screening. The sampler and model comparison method are available in the R package baclava.
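
To make the model class in the abstract concrete, here is a minimal simulation sketch of one possible semi-Markov mixture model of cancer natural history. It is an illustration under stated assumptions, not the paper's specification and not the baclava API: the Weibull onset and sojourn distributions, the indolent fraction `psi`, the screening sensitivity, the fixed end of follow-up `end_age`, and every function and parameter name are hypothetical choices made for this example.

```python
import random

def simulate_history(rng, psi, onset, sojourn, screen_ages, sensitivity, end_age):
    """Simulate one person's screening and diagnosis history.

    Assumed (hypothetical) structure: healthy -> preclinical -> clinical,
    where a fraction psi of preclinical tumors is indolent and never
    becomes clinical. onset and sojourn are (scale, shape) Weibull pairs.
    """
    # Age at preclinical onset. A Weibull shape != 1 makes the waiting
    # time non-memoryless, which is what makes the model semi-Markov.
    onset_age = rng.weibullvariate(*onset)
    # Mixture component: indolent tumors have an infinite sojourn time.
    indolent = rng.random() < psi
    clinical_age = float("inf") if indolent else onset_age + rng.weibullvariate(*sojourn)

    for age in screen_ages:
        # Stop screening once the person is clinically diagnosed or follow-up ends.
        if age >= min(clinical_age, end_age):
            break
        # A screen can only detect a tumor that is already preclinical.
        if age >= onset_age and rng.random() < sensitivity:
            # Overdiagnosed here means: would not have surfaced clinically
            # before end_age (a stand-in for the end of life or follow-up).
            return {"mode": "screen", "age": age, "overdiagnosed": clinical_age >= end_age}
    if clinical_age < end_age:
        return {"mode": "clinical", "age": clinical_age, "overdiagnosed": False}
    return {"mode": "censored", "age": end_age, "overdiagnosed": False}

def overdiagnosis_fraction(histories):
    """Share of screen-detected cancers that are overdiagnosed."""
    screen = [h for h in histories if h["mode"] == "screen"]
    if not screen:
        return float("nan")
    return sum(h["overdiagnosed"] for h in screen) / len(screen)

# Illustrative parameter values only; they are not estimates from the paper.
rng = random.Random(1)
histories = [
    simulate_history(rng, psi=0.2, onset=(80.0, 3.0), sojourn=(5.0, 1.5),
                     screen_ages=range(50, 75, 2), sensitivity=0.8, end_age=85.0)
    for _ in range(20000)
]
print(overdiagnosis_fraction(histories))
```

In a simulation the latent onset age and indolent status are known, so the overdiagnosis fraction can be read off directly. With real screening histories those quantities are unobserved, which is why a data-augmented sampler that imputes them alongside the model parameters is a natural fit.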
Problem

Research questions and friction points this paper is trying to address.

Fitting semi-Markov mixture models to cancer latency data
Estimating screening-related overdiagnosis rates rigorously
Calibrating models against individual-level screening histories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian semi-Markov mixture models for cancer latency
Data-augmented MCMC sampling for individual-level data
Bayesian uncertainty quantification with cross-validation
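
The cross-validation item above can be illustrated with a short sketch of leave-one-out model comparison from posterior draws. It uses the plain importance-sampling estimator of the expected log predictive density (elpd), which is an assumption made here for brevity: the source does not say which LOO estimator is used, and practical implementations often apply a stabilized variant such as Pareto-smoothed importance sampling. The function names and the input layout are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def elpd_loo(loglik):
    """Importance-sampling estimate of the LOO elpd.

    loglik[s][i] is log p(y_i | theta_s) for posterior draw s and
    individual i. For each individual, the LOO predictive density is
    estimated by the harmonic mean of the likelihood over the draws:
        elpd_i = -log( (1/S) * sum_s 1 / p(y_i | theta_s) )
    """
    n_draws = len(loglik)
    n_obs = len(loglik[0])
    total = 0.0
    for i in range(n_obs):
        neg = [-loglik[s][i] for s in range(n_draws)]
        total += math.log(n_draws) - logsumexp(neg)
    return total

# Toy pointwise log-likelihoods for two hypothetical candidate models
# (3 posterior draws, 2 individuals each); the numbers are made up.
model_a = [[-1.0, -2.1], [-1.2, -1.9], [-0.9, -2.0]]
model_b = [[-1.6, -2.8], [-1.4, -3.1], [-1.5, -2.9]]
scores = {"model_a": elpd_loo(model_a), "model_b": elpd_loo(model_b)}
print(max(scores, key=scores.get))  # the model with the higher elpd is preferred
```

The harmonic-mean form can have high variance when a few draws assign very low likelihood to an individual, so this sketch is best read as a statement of what LOO estimates and not as a recommended production estimator.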
Raphaël N. Morsomme
Department of Statistical Science, Duke University, North Carolina, United States
Shannon T. Holloway
Department of Population Health Sciences, Duke University, North Carolina, United States
Marc D. Ryser
Departments of Population Health Sciences and Mathematics, Duke University
Cancer, mathematical modeling, computational biology
Jason Xu
Department of Statistical Science, Duke University, North Carolina, United States