AI Summary
This study addresses the challenge of fitting non-Markovian cancer natural history models to individual-level screening and diagnosis histories. We propose a fully Bayesian inference framework built on a family of semi-Markov mixture models of latent multi-state progression, with a data-augmented MCMC algorithm for efficient posterior sampling and leave-one-out (LOO) cross-validation for model selection. The framework supports rigorous uncertainty quantification and enables the estimation of screening-related overdiagnosis rates. We assess the sampler on synthetic data, showing that it efficiently explores the joint posterior distribution of model parameters and latent variables, and we apply the framework to data from the US Breast Cancer Surveillance Consortium (BCSC) to estimate the extent of overdiagnosis associated with mammography screening. The sampler and model comparison method are publicly available in the R package *baclava*.
Abstract
Multi-state models of cancer natural history are widely used for designing and evaluating cancer early detection strategies. Calibrating such models against longitudinal data from screened cohorts is challenging, especially when fitting non-Markovian mixture models against individual-level data. Here, we consider a family of semi-Markov mixture models of cancer natural history and introduce an efficient data-augmented Markov chain Monte Carlo sampling algorithm for fitting these models to individual-level screening and cancer diagnosis histories. Our fully Bayesian approach supports rigorous uncertainty quantification and model selection through leave-one-out cross-validation, and it enables the estimation of screening-related overdiagnosis rates. We demonstrate the effectiveness of our approach using synthetic data, showing that the sampling algorithm efficiently explores the joint posterior distribution of model parameters and latent variables. Finally, we apply our method to data from the US Breast Cancer Surveillance Consortium and estimate the extent of breast cancer overdiagnosis associated with mammography screening. The sampler and model comparison method are available in the R package baclava.
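
To make the model family more concrete, the following is a minimal forward-simulation sketch of a semi-Markov mixture natural history with periodic screening. It is not the baclava implementation or its API. Every name, distribution, and parameter value here is an illustrative assumption: exponential time to preclinical onset, a Weibull sojourn time in the preclinical state (the duration-dependent hazard is what makes the process semi-Markov), a mixture in which a fraction of tumors is indolent and never progresses, and a fixed screening sensitivity. Overdiagnosis is simplified to "screen-detected and indolent", which ignores death from other causes.

```python
import random


def simulate_person(rng, indolent_prob, onset_rate=0.002, sojourn_scale=3.0,
                    sojourn_shape=2.0, sensitivity=0.8,
                    screen_ages=(50.0, 52.0, 54.0),
                    start_age=40.0, end_age=60.0):
    """Simulate one latent history and return (mode, age, indolent).

    All parameter values are hypothetical and chosen only for illustration.
    """
    # Healthy -> preclinical: exponential waiting time from start_age.
    onset_age = start_age + rng.expovariate(onset_rate)
    # Mixture component: indolent tumors never reach the clinical state.
    indolent = rng.random() < indolent_prob
    if indolent:
        clinical_age = float("inf")
    else:
        # Preclinical -> clinical: Weibull sojourn time
        # (random.weibullvariate takes scale first, then shape).
        clinical_age = onset_age + rng.weibullvariate(sojourn_scale, sojourn_shape)
    # A screen can detect the tumor only while it is preclinical.
    for age in screen_ages:
        in_preclinical = onset_age <= age < clinical_age
        if in_preclinical and rng.random() < sensitivity:
            return "screen", age, indolent
    if clinical_age <= end_age:
        return "clinical", clinical_age, indolent
    return "censored", end_age, indolent


def overdiagnosis_fraction(n, indolent_prob, seed=0):
    """Share of screen-detected cases that are indolent in a simulated cohort."""
    rng = random.Random(seed)
    outcomes = [simulate_person(rng, indolent_prob) for _ in range(n)]
    screen_detected = [ind for mode, _, ind in outcomes if mode == "screen"]
    if not screen_detected:
        return float("nan")
    return sum(screen_detected) / len(screen_detected)
```

In a real analysis the onset age, the indolent indicator, and the sojourn time are unobserved. Only the screening results and the diagnosis mode and age are seen, which is why a data-augmented sampler treats the latent quantities as additional unknowns alongside the model parameters.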