What are You Looking at? Modality Contribution in Multimodal Medical Deep Learning Methods

📅 2025-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multimodal medical deep learning, the actual contribution of each modality—imaging, text, and physiological signals—to model decisions remains poorly quantified, hindering interpretability and clinical trust. To address this, we propose a model-agnostic, modality-level occlusion-based attribution method that employs sliding-window sensitivity analysis to quantify modality contributions under diverse fusion strategies (early, late, and hybrid). Our work is the first to empirically reveal modality preference and unimodal collapse in multimodal models, and to establish a statistically significant correlation (p < 0.01) between modality-wise attribution scores and the performance of corresponding unimodal baselines. We validate these findings across three clinical domains—radiology, pathology, and time-series physiological monitoring—demonstrating inherent data-level modality imbalance and model-induced bias. The implementation is publicly available.

📝 Abstract
Purpose High-dimensional, multimodal data can nowadays be analyzed with little effort by large deep neural networks. Several fusion methods for combining different modalities have been developed. Particularly in medicine, with its abundance of high-dimensional multimodal patient data, multimodal models represent the next step. However, how these models process the source information in detail remains largely underexplored. Methods To this end, we implemented an occlusion-based, model- and performance-agnostic modality contribution method that quantitatively measures the importance of each modality in the dataset for the model to fulfill its task. We applied our method to three different multimodal medical problems for experimental purposes. Results We found that some networks have modality preferences that tend toward unimodal collapse, while some datasets are imbalanced from the outset. Moreover, we could establish a link between our metric and the performance of networks trained on single modalities. Conclusion The information gained through our metric holds remarkable potential to improve the development of multimodal models and the creation of datasets in the future. With our method we make a crucial contribution to interpretability in deep-learning-based multimodal research and thereby notably advance the integration of multimodal AI into clinical practice. Our code is publicly available at https://github.com/ChristianGappGit/MC_MMD.
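The occlusion idea described above can be sketched as follows: replace one modality at a time with a neutral baseline, measure how much the model's output shifts, and normalize the shifts into per-modality contribution scores. This is a minimal illustrative sketch, not the authors' actual implementation (see their repository for that); the function and parameter names here are hypothetical, and the toy model stands in for a real multimodal network.

```python
import numpy as np

def modality_contribution(model, inputs, baseline=0.0):
    """Occlusion-based modality contribution (illustrative sketch).

    `model` maps a dict of modality arrays to a prediction vector;
    `inputs` is {modality_name: array}. Each modality is replaced in
    turn by a constant baseline and the resulting output shift is
    measured. Names and signature are hypothetical, not the paper's API.
    """
    reference = model(inputs)
    scores = {}
    for name in inputs:
        occluded = dict(inputs)
        occluded[name] = np.full_like(inputs[name], baseline)
        # A larger output change means this modality contributed more.
        scores[name] = float(np.abs(model(occluded) - reference).mean())
    total = sum(scores.values()) or 1.0
    return {k: v / total for k, v in scores.items()}

# Toy bimodal "model": weighs the image mean twice as much as the text mean.
toy_model = lambda x: np.array([2.0 * x["image"].mean() + x["text"].mean()])
scores = modality_contribution(
    toy_model, {"image": np.ones((4, 4)), "text": np.ones(8)}
)
# scores ≈ {"image": 0.667, "text": 0.333} — the toy model "prefers" images.
```

A strongly skewed score distribution (e.g. one modality near 1.0) would correspond to the unimodal collapse the paper reports.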
Problem

Research questions and friction points this paper is trying to address.

Analyzes modality importance in multimodal medical deep learning models.
Develops a method to measure modality contribution quantitatively.
Explores modality preferences and dataset imbalances in medical AI.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Occlusion-based modality contribution method
Quantitative measurement of modality importance
Improves multimodal model interpretability
Christian Gapp
Institute of Biomedical Image Analysis, UMIT TIROL – Private University for Health Sciences and Health Technology, Eduard-Wallnöfer-Zentrum 1, 6060 Hall in Tirol, Austria; VASCage – Centre on Clinical Stroke Research, Innsbruck, Austria
Elias Tappeiner
Researcher, UMIT – Private University for Health Sciences, Medical Informatics and Technology
machine learning, medical image segmentation
Martin Welk
Associate Professor for Image Analysis
mathematical image analysis
K. Fritscher
VASCage – Centre on Clinical Stroke Research, Innsbruck, Austria
E. R. Gizewski
Department of Radiology, Medical University of Innsbruck, 6020 Innsbruck, Austria
Rainer Schubert
Professor of Biomedical Informatics, UMIT
biomedical image analysis