A Bayesian approach for fitting semi-Markov mixture models of cancer latency to individual-level data

📅 2024-08-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the difficulty of calibrating non-Markovian cancer natural history models against individual-level screening and diagnosis histories. The authors propose a fully Bayesian framework built on a family of semi-Markov mixture models, with a data-augmented MCMC algorithm for posterior sampling and leave-one-out (LOO) cross-validation for model selection. The framework also enables estimation of screening-related overdiagnosis rates. On synthetic data, the sampler efficiently explores the joint posterior distribution of model parameters and latent variables. Applied to data from the US Breast Cancer Surveillance Consortium (BCSC), the method estimates the extent of breast cancer overdiagnosis associated with mammography screening. The sampler and model comparison method are available in the R package *baclava*.

📝 Abstract

Multi-state models of cancer natural history are widely used for designing and evaluating cancer early detection strategies. Calibrating such models against longitudinal data from screened cohorts is challenging, especially when fitting non-Markovian mixture models against individual-level data. Here, we consider a family of semi-Markov mixture models of cancer natural history and introduce an efficient data-augmented Markov chain Monte Carlo sampling algorithm for fitting these models to individual-level screening and cancer diagnosis histories. Our fully Bayesian approach supports rigorous uncertainty quantification and model selection through leave-one-out cross-validation, and it enables the estimation of screening-related overdiagnosis rates. We demonstrate the effectiveness of our approach using synthetic data, showing that the sampling algorithm efficiently explores the joint posterior distribution of model parameters and latent variables. Finally, we apply our method to data from the US Breast Cancer Surveillance Consortium and estimate the extent of breast cancer overdiagnosis associated with mammography screening. The sampler and model comparison method are available in the R package baclava.
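
To make the model class concrete, the following is a minimal Python sketch of a forward simulator for one possible semi-Markov mixture model of cancer natural history. It is not the baclava implementation and does not reproduce the paper's parameterization. Every name and default value (`simulate_person`, `psi`, `onset_rate`, the Weibull `shape` and `scale`, the screening ages) is a hypothetical choice made for illustration. The sketch assumes an exponential waiting time to preclinical onset, a Weibull sojourn time for progressive tumors (the duration-dependent hazard is what makes the process semi-Markov), and a fraction `psi` of indolent tumors that never progress (the mixture component). Overdiagnosis is simplified here to mean screen detection of an indolent tumor.

```python
import random


def simulate_person(rng, psi=0.2, onset_rate=0.01, shape=2.0, scale=5.0,
                    sensitivity=0.8, screen_ages=(50, 52, 54, 56, 58),
                    start_age=40.0, censor_age=80.0):
    """Simulate one diagnosis history under a toy semi-Markov mixture model.

    All parameters are illustrative assumptions, not values from the paper.
    """
    # Healthy -> preclinical: exponential waiting time from start_age.
    onset = start_age + rng.expovariate(onset_rate)
    # Mixture component: with probability psi the tumor is indolent.
    indolent = rng.random() < psi
    # Preclinical -> clinical: Weibull sojourn time for progressive tumors;
    # indolent tumors never reach the clinical state.
    # Note: random.weibullvariate takes (scale, shape) in that order.
    if indolent:
        clinical = float("inf")
    else:
        clinical = onset + rng.weibullvariate(scale, shape)
    # A screen can detect the tumor only while it is preclinical.
    for age in screen_ages:
        preclinical = onset <= age < clinical
        if preclinical and age < censor_age and rng.random() < sensitivity:
            return {"mode": "screen", "age": age, "indolent": indolent}
    if clinical < censor_age:
        return {"mode": "clinical", "age": clinical, "indolent": indolent}
    return {"mode": "censored", "age": censor_age, "indolent": indolent}


def overdiagnosis_fraction(histories):
    """Share of screen-detected cases that are indolent (simplified definition)."""
    screen = [h for h in histories if h["mode"] == "screen"]
    if not screen:
        return 0.0
    return sum(h["indolent"] for h in screen) / len(screen)


if __name__ == "__main__":
    rng = random.Random(1)
    cohort = [simulate_person(rng) for _ in range(5000)]
    print("simulated overdiagnosis fraction:", overdiagnosis_fraction(cohort))
```

In a real data set the onset time and the indolent indicator are unobserved. Treating them as latent variables to be sampled alongside the model parameters is the general idea behind data augmentation, which this forward simulator does not attempt.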

Problem

Research questions and friction points this paper is trying to address.

Fitting semi-Markov mixture models to cancer latency data

Estimating screening-related overdiagnosis rates rigorously

Calibrating models against individual-level screening histories

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian semi-Markov mixture models for cancer latency

Data-augmented MCMC sampling for individual-level data

Bayesian uncertainty quantification with cross-validation
