🤖 AI Summary
Existing coalescent-based methods struggle to accurately infer transmission dynamics, such as the number of infected individuals and the effective reproduction number, from molecular sequence data for infectious diseases with a latent period. To address this, we propose an interpretable two-compartment coalescent model that incorporates a two-stage (exposed–infectious) population genetic structure and uses a phase-type distribution to characterize the latent period. We develop a data-augmented Bayesian inference framework that uses Markov chain Monte Carlo (MCMC) to jointly sample genealogies and coalescent parameters, strengthening the mapping between coalescent parameters and epidemiological dynamics. Simulation studies are used to assess the accuracy of parameter estimation. In a reanalysis of the 2014 Ebola virus disease outbreak in Liberia, the method reconstructs the temporal trajectories of the infection burden and the effective reproduction number. The approach provides a framework for molecular epidemiological modeling of pathogens with a latent period.
📝 Abstract
Coalescent models are used to study the transmission dynamics of rapidly evolving pathogens from molecular sequence data obtained from infected individuals. However, coalescent parameters, such as the effective population size, offer limited interpretability for transmission dynamics. In this work, we derive a coalescent model for exposed-infected population dynamics that allows us to infer the number of infected individuals and the effective reproduction number over time from the sample genealogy. The model can be interpreted as a two-deme model in which coalescence is restricted to pairs of individuals from different demes (one exposed, one infected). We propose a new data-augmentation framework based on phase-type distributions for Bayesian inference of epidemiological parameters. We study the performance of our approach on simulations and apply it to re-analyze the 2014 Ebola outbreak in Liberia.