🤖 AI Summary
This study addresses the high heterogeneity of chronic disease progression trajectories, which makes it difficult for traditional mixed-effects models to identify clinically meaningful subtypes. The authors propose a probabilistic model that embeds a mixture structure directly at the level of the latent individual parameters, so that progression subtypes are identified jointly with parameter estimation, avoiding the bias of post-hoc classification. The method combines probabilistic mixture modeling, mixed-effects modeling, Bayesian inference, and longitudinal data analysis. In simulations, classification accuracy exceeded 90% in simpler scenarios and remained above 80% in the most complex one. Applied to a CADASIL cohort, the approach identified two patient subgroups, “early-onset rapid progressors” and “late-onset slow progressors,” with distinct patterns across motor and memory scores.
📝 Abstract
The progression of chronic diseases often follows highly variable trajectories, and the underlying factors remain poorly understood. Standard mixed-effects models typically represent inter-patient differences as random deviations around a common reference, which may obscure meaningful subgroups. We propose a probabilistic mixture extension of a mixed-effects model, the Disease Course Mapping model, to identify distinct disease progression subtypes within a population. The mixture structure is introduced at the level of the latent individual parameters, enabling clustering based on both temporal and spatial variability in disease trajectories. We evaluated the model through simulation studies to assess classification performance and parameter recovery. Classification accuracy exceeded 90% in simpler scenarios and remained above 80% in the most complex case, with particularly high recall and precision for fast-progressing clusters. Compared to a post hoc classification approach, the proposed model yielded more accurate parameter estimates, smaller biases, lower root mean squared errors, and reduced uncertainty. It also correctly recovered the true three-cluster structure in 93% of the simulations. Finally, we applied the model to a longitudinal cohort of CADASIL patients, identifying two clinically meaningful clusters that differentiate patients with early versus late onset and fast versus slow progression, with clear spatial patterns across motor and memory scores. Overall, this probabilistic mixture framework offers a robust, interpretable approach for clustering patients based on spatiotemporal disease dynamics.
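To make the idea of a mixture placed on the latent individual parameters more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each patient is summarized by a small vector of latent parameters (here a hypothetical onset age `tau` and log-acceleration `xi`), that each cluster has its own independent Gaussian distribution over those parameters, and it computes the posterior probability of each cluster with Bayes' rule. The function name, parameter names, and all numeric values are illustrative assumptions.

```python
import numpy as np


def cluster_responsibilities(latents, weights, means, stds):
    """Posterior cluster probabilities under a Gaussian mixture on latent parameters.

    latents: (n_patients, n_params) latent individual parameters
    weights: (n_clusters,) mixture proportions, summing to 1
    means, stds: (n_clusters, n_params) per-cluster Gaussian means and standard deviations
    Returns an (n_patients, n_clusters) array whose rows sum to 1.
    """
    x = np.asarray(latents, dtype=float)[:, None, :]   # (n, 1, p)
    mu = np.asarray(means, dtype=float)[None, :, :]    # (1, k, p)
    sd = np.asarray(stds, dtype=float)[None, :, :]     # (1, k, p)
    w = np.asarray(weights, dtype=float)               # (k,)

    # Log-density of independent Gaussians, summed over the latent parameters.
    log_pdf = -0.5 * (((x - mu) / sd) ** 2 + np.log(2.0 * np.pi * sd ** 2))
    log_joint = np.log(w)[None, :] + log_pdf.sum(axis=2)

    # Normalize in log-space for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)


# Hypothetical two-cluster setting: early/fast versus late/slow progressors.
weights = [0.5, 0.5]
means = [[50.0, 0.5], [65.0, -0.5]]   # assumed (tau, xi) means per cluster
stds = [[5.0, 0.3], [5.0, 0.3]]       # assumed per-cluster standard deviations
patients = [[50.0, 0.5], [65.0, -0.5]]

resp = cluster_responsibilities(patients, weights, means, stds)
labels = resp.argmax(axis=1)  # most probable cluster for each patient
```

In the model described in the abstract, cluster membership and the latent parameters are estimated jointly from the longitudinal observations; this sketch only illustrates the final assignment step for latent parameters that are treated as known.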