Bayesian Profile Regression using Variational Inference to Identify Clusters of Multiple Long-Term Conditions Conditioning on Mortality in Population-Scale Data

📅 2026-02-27
🤖 AI Summary
This study addresses the identification of multimorbidity clusters associated with mortality in large-scale electronic health records. To this end, we propose a Bayesian profile regression model conditioned on covariates and a mortality outcome, built on a Dirichlet process mixture model so that the number of clusters is inferred from the data. Methodologically, we introduce full-rank stochastic variational inference (SVI) into this framework for the first time, achieving substantially greater computational efficiency than the No-U-Turn Sampler (NUTS) while maintaining comparable accuracy. Applied to real-world data from 1,296,463 individuals, the model identified 33 distinct disease clusters; clusters featuring metastatic cancer and cardiovascular diseases such as congestive heart failure showed the strongest associations with mortality, demonstrating the method's validity and scalability.
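To make the model structure concrete, here is a minimal sketch of a generic profile-regression likelihood. It assumes a truncated stick-breaking approximation to the Dirichlet process, individuals described by binary condition indicators with Bernoulli cluster profiles, and a logistic mortality model with a cluster-specific intercept plus shared covariate effects (for example age, deprivation and sex). The function and parameter names (`stick_breaking_weights`, `individual_log_lik`, `v`, `phi`, `theta`, `beta`) and this exact specification are illustrative assumptions, not the authors' model.

```python
import math


def stick_breaking_weights(v):
    """Truncated stick-breaking: pi_k = v_k * prod_{j<k} (1 - v_j).

    v holds K-1 break fractions in (0, 1); the last of the K components
    takes whatever stick is left, so the weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for vk in v:
        weights.append(remaining * vk)
        remaining *= 1.0 - vk
    weights.append(remaining)
    return weights


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def individual_log_lik(x, y, w, pi, phi, theta, beta):
    """Log-likelihood of one individual, marginalised over clusters.

    x: binary condition indicators; y: 1 if died, else 0; w: covariates.
    pi: cluster weights; phi[c][j]: P(condition j | cluster c);
    theta[c]: cluster intercept; beta: shared covariate coefficients.
    (All hypothetical names for this sketch.)
    """
    eta_fixed = sum(b * wi for b, wi in zip(beta, w))
    terms = []
    for c, pi_c in enumerate(pi):
        lp = math.log(pi_c)
        # Bernoulli profile term for each condition
        for xj, p in zip(x, phi[c]):
            lp += math.log(p) if xj else math.log1p(-p)
        # Logistic outcome term: log sigma(eta) or log(1 - sigma(eta))
        eta = theta[c] + eta_fixed
        lp += log_sigmoid(eta) if y else log_sigmoid(-eta)
        terms.append(lp)
    # log-sum-exp over clusters
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical example: 3 truncated components, 2 conditions, 2 covariates.
pi = stick_breaking_weights([0.6, 0.5])
phi = [[0.9, 0.1], [0.2, 0.7], [0.05, 0.05]]
theta = [1.0, 0.5, -2.0]
beta = [0.03, 0.2]
print(individual_log_lik([1, 0], 1, [70.0, 1.0], pi, phi, theta, beta))
```

Under a truncated stick-breaking prior, components whose weights shrink towards zero are effectively unused, which is how the number of occupied clusters is learned from the data rather than fixed in advance.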

πŸ“ Abstract
Multiple long-term conditions (MLTC) are increasingly observed in clinical practice globally. Clustering methods that group diseases into commonly co-occurring clusters have been of interest for understanding how MLTC group together and how they affect patient outcomes. However, such approaches require large, often population-scale, datasets. Bayesian Profile Regression (BPR) is a statistical model that combines a Dirichlet Process Mixture model with a hierarchical regression model to form clusters of items conditional on covariates and an outcome of interest. We developed a BPR model using full-rank Stochastic Variational Inference (SVI) for application in large-scale data. We assessed its performance in simulation studies comparing fits obtained with the No-U-Turn Sampler (NUTS) and with full-rank SVI. We then fitted a BPR model to find clusters of MLTC in population-scale data held in the Secure Anonymised Information Linkage (SAIL) databank. In the simulation study, results from full-rank SVI compared well with those from NUTS, and the improved fitting performance allowed models to be fitted to population-scale datasets. There were 1,296,463 individuals in our electronic health record (EHR) cohort. The clustering model was conditioned on age at cohort entry, socioeconomic deprivation and sex, with mortality as the outcome. Using the Elixhauser comorbidity index disease definitions, we found 33 disease clusters. Clusters featuring metastatic cancer and cardiovascular diseases, such as congestive heart failure, were most strongly associated with the probability of mortality. Our findings show that SVI can be a useful and accurate method for fitting Bayesian models, especially when the dataset size would make Monte Carlo methods prohibitively time-consuming or impossible.
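The difference between full-rank and mean-field SVI can be sketched in a few lines. Below is a minimal standard-library Python sketch, assuming a full-rank Gaussian variational family q = N(μ, LLᵀ) with lower-triangular L, reparameterised draws z = μ + Lε, and a Monte Carlo ELBO estimate (expected log target density plus the analytic Gaussian entropy). The target is a hypothetical correlated bivariate normal standing in for a posterior, and the function names are illustrative; this is not the authors' implementation. An optimiser (for example stochastic gradient ascent) would update μ and L to maximise the estimate; here the variational parameters are fixed at their known optima to compare the two families.

```python
import math
import random


def sample_full_rank(mu, L, rng):
    """Reparameterised draw z = mu + L @ eps, eps ~ N(0, I); L is lower triangular."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(L[i][j] * eps[j] for j in range(i + 1))
            for i in range(len(mu))]


def gaussian_entropy(L):
    """Entropy of N(mu, L L^T): d/2 * (1 + log 2*pi) + sum(log L_ii)."""
    d = len(L)
    return (0.5 * d * (1.0 + math.log(2.0 * math.pi))
            + sum(math.log(L[i][i]) for i in range(d)))


def elbo_estimate(log_p, mu, L, n_samples=1000, seed=0):
    """Monte Carlo ELBO: mean log p(z) over draws from q, plus q's entropy."""
    rng = random.Random(seed)
    total = sum(log_p(sample_full_rank(mu, L, rng)) for _ in range(n_samples))
    return total / n_samples + gaussian_entropy(L)


# Hypothetical target: bivariate normal, unit variances, correlation rho.
rho = 0.8


def log_p(z):
    quad = (z[0] ** 2 - 2 * rho * z[0] * z[1] + z[1] ** 2) / (1 - rho ** 2)
    return -math.log(2 * math.pi) - 0.5 * math.log(1 - rho ** 2) - 0.5 * quad


# Full-rank: L is the Cholesky factor of the target covariance.
full_rank_L = [[1.0, 0.0], [rho, math.sqrt(1 - rho ** 2)]]
# Mean-field: diagonal L with the optimal standard deviations sqrt(1 - rho^2).
mean_field_L = [[0.6, 0.0], [0.0, 0.6]]

print("full-rank ELBO: ", elbo_estimate(log_p, [0.0, 0.0], full_rank_L, 20000))
print("mean-field ELBO:", elbo_estimate(log_p, [0.0, 0.0], mean_field_L, 20000))
```

Because the full-rank family contains this target, its ELBO estimate should be close to 0, whereas the best diagonal (mean-field) approximation has an analytic ELBO of 0.5·log(0.36) ≈ -0.51. Retaining correlations between parameters is the motivation for the full-rank choice, at the cost of O(d²) variational parameters instead of O(d).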
Problem

Research questions and friction points this paper is trying to address.

Multiple Long-Term Conditions
Disease Clustering
Mortality
Population-Scale Data
Comorbidity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian Profile Regression
Stochastic Variational Inference
Multiple Long-Term Conditions
Population-scale Data
Disease Clustering
James Rafferty
Health Data Research UK, Swansea University Medical School, Swansea University, Singleton Park, Swansea, SA1 8PP, Wales, UK
Keith R Abrams
Prof, Univ of Warwick, Hon Prof, Univ of York & NIHR Senior Investigator Emeritus
Biostatistics, Bayesian Methods, HTA, Health Data Science
Munir Pirmohamed
Department of Pharmacology and Therapeutics, University of Liverpool, Liverpool, L3 5TR, England, UK
Mark Davies
BenevolentAI
Bioinformatics, Cheminformatics, Databases, Web Development
Rhiannon K Owen
Health Data Research UK, Swansea University Medical School, Swansea University, Singleton Park, Swansea, SA1 8PP, Wales, UK