🤖 AI Summary
This study investigates how inferred infectious disease transmission networks depend on which types of outbreak data are available, such as infection and sampling times, host locations, and viral genetic sequences. We combine generative models that jointly determine the number of infected hosts, the probability of infection between hosts given their locations and genetic information, and their infection and sampling times, which together assign a probability to each transmission tree. A Markov chain Monte Carlo (MCMC) method, shown to sample trees according to that probability, allows statistics over the sampled trees to estimate network properties and latent quantities such as the number of unobserved hosts and the depth of the infection tree. We validate the approach by comparing numerical results with analytically solvable examples and then apply it to COVID-19 data from Australia. Network properties that matter for outbreak management turn out to depend sensitively on the type of data used in the inference.
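The following sketch shows concretely how different data types can be combined. Timing, location, and genetic information each contribute one factor to a pairwise infection weight. The kernels (an exponential generation interval, an exponential spatial decay, and a Poisson count of accumulated SNP differences), their parameters (`mean_gen`, `scale`, `mu`), the host records, and all function names are hypothetical choices made for illustration. They are not the model used in the study.

```python
import math

# All kernels and parameter values below are hypothetical illustrations.

def temporal_kernel(dt, mean_gen=5.0):
    """Exponential generation-interval density for an infector infected dt days earlier."""
    return math.exp(-dt / mean_gen) / mean_gen

def spatial_kernel(dist, scale=10.0):
    """Exponential decay of transmission with distance (km)."""
    return math.exp(-dist / scale)

def genetic_kernel(snp_diff, dt, mu=0.1):
    """Poisson probability of snp_diff mutations accumulating over dt days at rate mu per day."""
    lam = mu * dt
    return math.exp(-lam) * lam ** snp_diff / math.factorial(snp_diff)

def infection_weight(infector, infectee):
    """Unnormalised plausibility that `infector` infected `infectee`."""
    dt = infectee["t"] - infector["t"]
    if dt <= 0:  # an infector must have been infected earlier
        return 0.0
    dist = math.dist(infector["xy"], infectee["xy"])
    snp_diff = len(set(infector["snps"]) ^ set(infectee["snps"]))
    return temporal_kernel(dt) * spatial_kernel(dist) * genetic_kernel(snp_diff, dt)

def infector_probabilities(hosts, i):
    """Normalise the weights of all candidate infectors of host i."""
    weights = {j: infection_weight(hosts[j], hosts[i]) for j in hosts if j != i}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()} if total > 0 else {}

# Toy data: infection time (days), location (km), SNPs relative to a reference.
hosts = {
    "A": {"t": 0.0, "xy": (0.0, 0.0), "snps": []},
    "B": {"t": 3.0, "xy": (1.0, 0.0), "snps": [101]},
    "C": {"t": 6.0, "xy": (2.0, 0.0), "snps": [101, 202]},
}
print(infector_probabilities(hosts, "C"))
```

Dropping one factor, for example by replacing `genetic_kernel` with a constant, mimics inference without sequence data and can change the resulting infector probabilities.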
📝 Abstract
We investigate how the properties of epidemic networks change depending on the availability of different types of data on a disease outbreak. To do so, we introduce mathematical and computational methods that estimate the probability of transmission trees by combining generative models that jointly determine the number of infected hosts, the probability of infection between them depending on location and genetic information, and their times of infection and sampling. We introduce a suitable Markov chain Monte Carlo method and show that it samples trees according to their probability. Statistics computed over the sampled trees yield probabilistic estimates of network properties and other quantities of interest, such as the number of unobserved hosts and the depth of the infection tree. We confirm the validity of our approach by comparing the numerical results with analytically solvable examples. Finally, we apply our methodology to COVID-19 data from Australia. We find that network properties important for managing the outbreak depend sensitively on the type of data used in the inference.
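A toy example can illustrate both the MCMC step and its check against an analytically solvable case. The full method includes unobserved hosts and unknown timings. This sketch makes simpler assumptions: every host is observed, infection times are known and distinct, and a tree's probability is proportional to the product of hypothetical pairwise weights `W`. A Metropolis-Hastings sampler re-draws one host's infector at a time. Its estimate of the expected tree depth is then compared with exact enumeration over all trees. The names `W`, `HOSTS`, and the helper functions are hypothetical.

```python
import itertools
import math
import random

# Hypothetical weights W[(j, i)]: relative plausibility that host j infected host i.
# Hosts are ordered by (assumed distinct) infection time; host 0 is the index case and
# each later host may only be infected by an earlier one, so every assignment is a tree.
HOSTS = [0, 1, 2, 3]
W = {(0, 1): 1.0,
     (0, 2): 0.5, (1, 2): 2.0,
     (0, 3): 0.2, (1, 3): 1.0, (2, 3): 3.0}

def candidates(i):
    """Earlier hosts that could have infected host i."""
    return [j for j in HOSTS if (j, i) in W]

INFECTEES = [i for i in HOSTS if candidates(i)]

def tree_weight(parents):
    """Unnormalised tree probability: product of pairwise weights over all edges."""
    return math.prod(W[(p, i)] for i, p in parents.items())

def depth(parents):
    """Number of generations from the index case to the deepest host."""
    def gen(i):
        return 0 if i not in parents else 1 + gen(parents[i])
    return max(gen(i) for i in HOSTS)

def exact_expected_depth():
    """Enumerate every valid tree; feasible only for tiny examples."""
    num = den = 0.0
    for choice in itertools.product(*(candidates(i) for i in INFECTEES)):
        parents = dict(zip(INFECTEES, choice))
        w = tree_weight(parents)
        num += w * depth(parents)
        den += w
    return num / den

def mcmc_expected_depth(n_steps=200_000, burn_in=10_000, seed=1):
    """Metropolis-Hastings over trees, averaging depth after burn-in."""
    rng = random.Random(seed)
    parents = {i: candidates(i)[0] for i in INFECTEES}
    total = 0
    for step in range(n_steps):
        i = rng.choice(INFECTEES)
        new = rng.choice(candidates(i))
        # A uniform proposal over a fixed candidate set is symmetric, so the
        # acceptance ratio reduces to the ratio of the one changed edge weight.
        if rng.random() < W[(new, i)] / W[(parents[i], i)]:
            parents[i] = new
        if step >= burn_in:
            total += depth(parents)
    return total / (n_steps - burn_in)

print(exact_expected_depth(), mcmc_expected_depth())
```

For these weights, exact enumeration gives an expected depth of 26.9 / 10.5 ≈ 2.562, and the sampled estimate should agree up to Monte Carlo error. The number of trees grows as the product of the candidate-set sizes, which is why sampling rather than enumeration is needed for realistic outbreaks.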