🤖 AI Summary
This work addresses three challenges in healthcare research: multimodal data fusion, pervasive missing values, and the lack of unified analytical tools. It introduces a modular Python package that integrates multimodal decomposition, imputation, classification, survival prediction, and clustering within a single, end-to-end reproducible framework. The package can be configured to handle partially observed data and retains its performance when some modalities are missing. Evaluated on cardiovascular risk prediction using UK Biobank data, it fuses cardiac MRI, electrocardiogram signals, and polygenic risk scores, and outperforms unimodal baselines.
📝 Abstract
mmid (Multi-Modal Integration and Downstream analyses for healthcare analytics) is a Python package that offers multi-modal fusion and imputation, classification, time-to-event prediction and clustering under a single interface. It fills a gap in healthcare applications, where data integration and downstream analyses must be run in sequence within a structured yet flexible environment. mmid bundles several algorithms for multi-modal decomposition, prediction and clustering in a single package; these can be combined with a single command and the appropriate configuration files, which facilitates the reproducibility and transferability of studies involving heterogeneous health data sources. A showcase on personalised cardiovascular risk prediction highlights the relevance of a composite pipeline for the proper treatment and analysis of complex multi-modal data. We applied mmid to a real application scenario involving cardiac magnetic resonance imaging, electrocardiogram, and polygenic risk score data from the UK Biobank. We showed that the three modalities captured joint and individual information that was used to (1) identify cardiovascular disease early, before clinical manifestations with cardiological relevance, and (2) do so better than any single data source alone. Moreover, mmid allowed partially observed data modalities to be imputed without considerable performance losses in downstream disease prediction, which supports its relevance for real-world health analytics applications, where missing data are common.
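As a rough illustration of what a configuration-driven pipeline of this kind might look like, the sketch below chains imputation, fusion and a downstream classifier from a small configuration dictionary. It is not mmid's actual interface: every name (`SUBJECTS`, `CONFIG`, `run_pipeline`, the step functions) and the toy values are hypothetical, and feature-wise mean imputation, concatenation and a nearest-centroid classifier are simple stand-ins for the decomposition, imputation and prediction algorithms the package provides.

```python
from statistics import mean

# Hypothetical toy data: each subject has up to three modalities
# (feature lists); None marks a modality that was not observed.
SUBJECTS = [
    {"cmr": [1.0, 2.0], "ecg": [0.5], "prs": [0.1], "label": 0},
    {"cmr": [1.2, 1.8], "ecg": None, "prs": [0.2], "label": 0},
    {"cmr": [3.0, 4.0], "ecg": [1.5], "prs": [0.9], "label": 1},
    {"cmr": [3.2, 4.2], "ecg": [1.7], "prs": None, "label": 1},
]

# Hypothetical configuration; in practice this could be read from a file.
CONFIG = {
    "modalities": ["cmr", "ecg", "prs"],
    "steps": ["impute", "fuse", "classify"],
}


def impute(state, cfg):
    """Fill each missing modality with the feature-wise mean of observed subjects."""
    for m in cfg["modalities"]:
        observed = [s[m] for s in state["subjects"] if s[m] is not None]
        fill = [mean(col) for col in zip(*observed)]
        for s in state["subjects"]:
            if s[m] is None:
                s[m] = list(fill)
    return state


def fuse(state, cfg):
    """Early fusion stand-in: concatenate modality features per subject."""
    state["X"] = [
        [v for m in cfg["modalities"] for v in s[m]] for s in state["subjects"]
    ]
    state["y"] = [s["label"] for s in state["subjects"]]
    return state


def classify(state, cfg):
    """Nearest-centroid classifier, fitted and applied on the same toy data."""
    centroids = {}
    for label in set(state["y"]):
        rows = [x for x, y in zip(state["X"], state["y"]) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    state["pred"] = [
        min(centroids, key=lambda c: dist2(x, centroids[c])) for x in state["X"]
    ]
    return state


# Registry mapping step names in the configuration to functions.
STEPS = {"impute": impute, "fuse": fuse, "classify": classify}


def run_pipeline(subjects, cfg):
    # Copy each subject dict so the caller's data keeps its missing entries.
    state = {"subjects": [dict(s) for s in subjects]}
    for name in cfg["steps"]:
        state = STEPS[name](state, cfg)
    return state


result = run_pipeline(SUBJECTS, CONFIG)
print(result["pred"])  # [0, 0, 1, 1]
```

Keeping the step order and the modality list in the configuration, rather than in code, is what makes a run easy to repeat or to transfer to another dataset: changing the analysis means editing the configuration, not the pipeline.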