A Step towards Interpretable Multimodal AI Models with MultiFIX

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of trustworthiness and interpretability of multimodal AI models in high-stakes domains such as healthcare, this paper proposes MultiFIX, an interpretability-driven multimodal fusion framework. A model is first trained with deep learning components only; each modality-specific branch is then made interpretable, either by fully replacing it with a symbolic expression (tabular data) or by explaining it post hoc with Grad-CAM (imaging data). This yields interpretability both at the level of the extracted features and at the level of their fusion into the final prediction. Evaluated on synthetic multimodal datasets with varying degrees of interaction between modalities, MultiFIX matches the predictive accuracy of black-box baselines while making both the engineered features and their integration explainable, supporting trustworthy deployment in clinical settings.

📝 Abstract
Real-world problems are often dependent on multiple data modalities, making multimodal fusion essential for leveraging diverse information sources. In high-stakes domains, such as in healthcare, understanding how each modality contributes to the prediction is critical to ensure trustworthy and interpretable AI models. We present MultiFIX, an interpretability-driven multimodal data fusion pipeline that explicitly engineers distinct features from different modalities and combines them to make the final prediction. Initially, only deep learning components are used to train a model from data. The black-box (deep learning) components are subsequently either explained using post-hoc methods such as Grad-CAM for images or fully replaced by interpretable blocks, namely symbolic expressions for tabular data, resulting in an explainable model. We study the use of MultiFIX using several training strategies for feature extraction and predictive modeling. Besides highlighting strengths and weaknesses of MultiFIX, experiments on a variety of synthetic datasets with varying degrees of interaction between modalities demonstrate that MultiFIX can generate multimodal models that can be used to accurately explain both the extracted features and their integration without compromising predictive performance.
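The abstract describes a two-stage design: deep branches are trained per modality, after which the tabular branch is replaced by a symbolic expression and the image branch is explained post hoc, so that the fusion step itself stays transparent. A minimal sketch of that idea, with purely illustrative stand-in functions (none of these names come from the paper's code, and the real system uses trained networks and genetic-programming-derived expressions rather than these toy rules):

```python
import numpy as np

def image_feature(img: np.ndarray) -> float:
    """Stand-in for a trained CNN image branch: here, mean intensity.
    In MultiFIX this branch stays a deep model, explained via Grad-CAM."""
    return float(img.mean())

def tabular_feature(row: np.ndarray) -> float:
    """Stand-in for a symbolic expression distilled from the tabular
    branch, e.g. a simple threshold rule on the first feature."""
    return float(row[0] > 0.5)

def fused_prediction(img: np.ndarray, row: np.ndarray) -> float:
    """Interpretable fusion: a readable expression over the per-modality
    features (here a logical AND, written arithmetically) instead of an
    opaque dense fusion layer."""
    f_img = float(image_feature(img) > 0.5)
    f_tab = tabular_feature(row)
    return f_img * f_tab

bright = np.full((4, 4), 0.9)
dark = np.zeros((4, 4))
print(fused_prediction(bright, np.array([0.8])))  # both features fire
print(fused_prediction(dark, np.array([0.8])))    # image feature absent
```

The point of the sketch is only the structure: because the fusion is an explicit expression over named per-modality features, one can read off how each modality contributed to a given prediction.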
Problem

Research questions and friction points this paper is trying to address.

Enhancing interpretability in multimodal AI models for diverse data sources
Ensuring trustworthy predictions in high-stakes domains like healthcare
Balancing model accuracy with explainable feature integration across modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal fusion with interpretability-driven pipeline
Replaces black-box components with explainable blocks
Maintains predictive performance while ensuring explainability
Mafalda Malafaia
Centrum Wiskunde & Informatica
eXplainable AI · Multimodality · AI for Health
Thalea Schlender
Leiden University Medical Center, Leiden, The Netherlands
T. Alderliesten
Leiden University Medical Center, Leiden, The Netherlands
Peter A. N. Bosman
Centrum Wiskunde & Informatica, Amsterdam, The Netherlands; Delft University of Technology, Delft, The Netherlands