mmid: Multi-Modal Integration and Downstream analyses for healthcare analytics in Python

📅 2026-04-09
🤖 AI Summary
This work addresses the challenges of multimodal data fusion, pervasive missing values, and the lack of unified analytical tools in healthcare research by introducing a modular Python package that, for the first time, integrates multimodal decomposition, imputation, classification, survival prediction, and clustering within a single, end-to-end reproducible framework. The proposed method enables flexible configuration to handle partially observed data and demonstrates robust performance even under modality dropout. Evaluated on cardiovascular risk prediction using UK Biobank data, it effectively fuses cardiac MRI, electrocardiogram signals, and polygenic risk scores, significantly outperforming unimodal baselines while maintaining stability when certain modalities are missing.
📝 Abstract
mmid (Multi-Modal Integration and Downstream analyses for healthcare analytics) is a Python package that offers multi-modal fusion and imputation, classification, time-to-event prediction, and clustering under a single interface, filling the gap in sequential data integration and downstream analysis for healthcare applications within a structured and flexible environment. mmid bundles several algorithms for multi-modal decomposition, prediction, and clustering into a single package; they can be combined with a single command and appropriate configuration files, which facilitates the reproducibility and transferability of studies involving heterogeneous health data sources. A showcase on personalised cardiovascular risk prediction highlights the relevance of a composite pipeline for the proper treatment and analysis of complex multi-modal data. We applied mmid to a real-world scenario involving cardiac magnetic resonance imaging, electrocardiogram, and polygenic risk score data from the UK Biobank. We showed that the three modalities captured joint and individual information that was used to (1) identify cardiovascular disease early, before clinical manifestations with cardiological relevance, and (2) do so better than single data sources alone. Moreover, mmid made it possible to impute partially observed data modalities without considerable performance loss in downstream disease prediction, demonstrating its relevance for real-world health analytics applications, which are often characterised by missing data.
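As a rough illustration of the impute-then-fuse idea described in the abstract, the following minimal sketch uses only NumPy. It does not use or reproduce mmid's API, which the abstract does not specify: the function name `impute_and_fuse`, the per-feature mean imputation, and the truncated SVD are all stand-in assumptions chosen for brevity, and the three arrays are synthetic placeholders for the MRI, ECG, and polygenic risk score modalities.

```python
import numpy as np


def impute_and_fuse(modalities, n_components=2):
    """Hypothetical helper (not part of mmid): mean-impute missing values,
    standardise each modality, concatenate, and return joint low-rank scores."""
    blocks = []
    for X in modalities:
        X = np.asarray(X, dtype=float)
        # Replace NaNs with the per-feature mean of the observed values.
        col_mean = np.nanmean(X, axis=0)
        X = np.where(np.isnan(X), col_mean, X)
        # Standardise so that no modality dominates because of its scale.
        std = X.std(axis=0)
        std[std == 0] = 1.0
        blocks.append((X - X.mean(axis=0)) / std)
    fused = np.hstack(blocks)
    # Truncated SVD of the concatenated matrix gives joint component scores.
    U, S, _ = np.linalg.svd(fused, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Synthetic data: three modalities driven by two shared latent factors.
rng = np.random.default_rng(0)
n = 50
latent = rng.normal(size=(n, 2))
mri = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(n, 6))
ecg = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(n, 4))
prs = latent @ rng.normal(size=(2, 1)) + 0.1 * rng.normal(size=(n, 1))
ecg[:10] = np.nan  # simulate a modality missing for the first 10 subjects

scores = impute_and_fuse([mri, ecg, prs])
```

The resulting `scores` matrix has one row per subject and could be passed to any downstream classifier, survival model, or clustering routine; subjects with a missing modality still receive scores because the gaps are filled before fusion.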
Problem

Research questions and friction points this paper is trying to address.

multi-modal integration
healthcare analytics
missing data
heterogeneous data
cardiovascular risk prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal integration
healthcare analytics
missing data imputation
time-to-event prediction
reproducible pipeline
Andrea Mario Vergani
Human Technopole, Viale Rita Levi-Montalcini 1, 20157, Milan, Italy; Department of Electronics, Information and Bioengineering (DEIB), Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy; MOX lab, Department of Mathematics, Politecnico di Milano, Via Bonardi 9, 20133, Milan, Italy
Valeria Iapaolo
MOX lab, Department of Mathematics, Politecnico di Milano, Via Bonardi 9, 20133, Milan, Italy
Emanuele Di Angelantonio
Human Technopole, Viale Rita Levi-Montalcini 1, 20157, Milan, Italy; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Papworth Road, Cambridge Biomedical Campus, Cambridge CB2 0BB, UK; Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Papworth Road, Cambridge Biomedical Campus, Cambridge CB2 0BB, UK; NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, University of Cambridge,
Marco Masseroli
Politecnico di Milano
Bioinformatics, Machine Learning, Data Bases
Francesca Ieva
Associate Professor, MOX - Department of Mathematics, Politecnico di Milano
Health Data Science, Health Analytics, Biostatistics, Statistical Learning