mmid: Multi-Modal Integration and Downstream analyses for healthcare analytics in Python

2026-04-09

AI Summary
This work addresses the challenges of multimodal data fusion, pervasive missing values, and the lack of unified analytical tools in healthcare research by introducing a modular Python package that integrates multimodal decomposition, imputation, classification, survival prediction, and clustering within a single, end-to-end reproducible framework. The package can be configured flexibly to handle partially observed data and retains its performance when some modalities are missing. Evaluated on cardiovascular risk prediction using UK Biobank data, it fuses cardiac MRI, electrocardiogram signals, and polygenic risk scores, and outperforms unimodal baselines without considerable performance loss when partially observed modalities are imputed.

Abstract
mmid (Multi-Modal Integration and Downstream analyses for healthcare analytics) is a Python package that offers multi-modal fusion and imputation, classification, time-to-event prediction, and clustering under a single interface. It addresses the lack of tools that chain data integration and downstream analyses for healthcare applications in a structured and flexible environment. mmid bundles several algorithms for multi-modal decomposition, prediction, and clustering into a single package; they can be combined with a single command and appropriate configuration files, which facilitates the reproducibility and transferability of studies involving heterogeneous health data sources. A showcase on personalised cardiovascular risk prediction highlights the relevance of a composite pipeline that enables proper treatment and analysis of complex multi-modal data. We applied mmid to a real application scenario involving cardiac magnetic resonance imaging, electrocardiogram, and polygenic risk score data from the UK Biobank. We showed that the three modalities captured joint and individual information that was used to (1) identify cardiovascular disease early, before clinical manifestations with cardiological relevance, and (2) do so better than any single data source alone. Moreover, mmid made it possible to impute partially observed data modalities without considerable performance loss in downstream disease prediction, which shows its relevance for real-world health analytics applications (often characterised by missing data).
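The flow described in the abstract (impute partially observed modalities, fuse them, then run a downstream predictor, all driven by a configuration) can be illustrated with a minimal sketch. This is not mmid's API: the `CONFIG` schema, the function names, and the toy values are hypothetical, and per-modality mean imputation, feature concatenation, and a nearest-centroid classifier are simple stand-ins for the decomposition and prediction algorithms the package actually wraps.

```python
from statistics import mean

# Hypothetical configuration; mmid's real configuration schema is not shown here.
CONFIG = {"modalities": ["cmr", "ecg", "prs"]}


def modality_means(rows):
    """Per-feature means over the subjects for whom the modality was observed."""
    observed = [r for r in rows if r is not None]
    return [mean(col) for col in zip(*observed)]


def fuse(sample, means, modalities):
    """Concatenate modality blocks, filling a missing modality with training means."""
    fused = []
    for m in modalities:
        block = sample[m] if sample[m] is not None else means[m]
        fused.extend(block)
    return fused


def fit(train, labels, config):
    """Learn imputation means and one centroid per class on the fused features."""
    mods = config["modalities"]
    means = {m: modality_means([s[m] for s in train]) for m in mods}
    fused = [fuse(s, means, mods) for s in train]
    centroids = {}
    for label in set(labels):
        members = [f for f, y in zip(fused, labels) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return {"means": means, "centroids": centroids, "modalities": mods}


def predict(model, sample):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    x = fuse(sample, model["means"], model["modalities"])

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(model["centroids"], key=lambda lab: sq_dist(model["centroids"][lab]))


# Toy subjects (made-up numbers); None marks a modality that was not acquired.
train = [
    {"cmr": [1.0, 1.2], "ecg": [0.9], "prs": [0.1]},
    {"cmr": [1.1, 0.8], "ecg": None, "prs": [0.2]},
    {"cmr": [3.0, 3.2], "ecg": [2.9], "prs": [0.9]},
    {"cmr": None, "ecg": [3.1], "prs": [1.0]},
]
labels = [0, 0, 1, 1]

model = fit(train, labels, CONFIG)
pred_missing_ecg = predict(model, {"cmr": [2.9, 3.1], "ecg": None, "prs": [0.8]})
pred_missing_prs = predict(model, {"cmr": [1.0, 1.0], "ecg": [1.0], "prs": None})
print(pred_missing_ecg, pred_missing_prs)
```

Keeping the imputation statistics inside the fitted model means a new subject with a missing modality is filled in using training data only, which is one way to keep such a pipeline reproducible.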
Problem

Research questions and friction points this paper is trying to address.

multi-modal integration
healthcare analytics
missing data
heterogeneous data
cardiovascular risk prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal integration
healthcare analytics
missing data imputation
time-to-event prediction
reproducible pipeline
Andrea Mario Vergani
Human Technopole, Viale Rita Levi-Montalcini 1, 20157, Milan, Italy; Department of Electronics, Information and Bioengineering (DEIB), Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy; MOX lab, Department of Mathematics, Politecnico di Milano, Via Bonardi 9, 20133, Milan, Italy
Valeria Iapaolo
MOX lab, Department of Mathematics, Politecnico di Milano, Via Bonardi 9, 20133, Milan, Italy
Emanuele Di Angelantonio
Human Technopole, Viale Rita Levi-Montalcini 1, 20157, Milan, Italy; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Papworth Road, Cambridge Biomedical Campus, Cambridge CB2 0BB, UK; Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Papworth Road, Cambridge Biomedical Campus, Cambridge CB2 0BB, UK; NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, University of Cambridge,
Marco Masseroli
Politecnico di Milano
Bioinformatics, Machine Learning, Data Bases
Francesca Ieva
Associate Professor, MOX - Department of Mathematics, Politecnico di Milano
Health Data Science, Health Analytics, Biostatistics, Statistical Learning