🤖 AI Summary
Multimodal large language models (MLLMs) suffer from poorly understood uncertainty propagation between image and text inputs, resulting in weak generalization across tasks and data distributions. Method: We propose a transferable multimodal uncertainty propagation framework that integrates uncertainty propagation theory, low-dimensional parameter modeling, and few-shot fine-tuning, trained on cardiac magnetic resonance imaging and electronic health record data. Contribution/Results: Our approach uncovers an intrinsic low-dimensional uncertainty structure latent in pretrained MLLMs and enables efficient, robust decomposition and estimation of multimodal uncertainty. Experiments demonstrate accurate uncertainty quantification with only a few labeled samples, significantly improving clinical prediction reliability. Moreover, the model automatically identifies redundant data factors, enhancing interpretability. This work establishes a novel paradigm for trustworthy multimodal AI in healthcare, supporting cross-task and cross-distribution uncertainty calibration without extensive retraining.
📄 Abstract
Multimodal large language models (MLLMs) can process and integrate information from multiple modalities, such as text and images. However, the interrelationships among input modalities, the uncertainties arising from individual uni-modal data, and the potential clinical applications that follow from such an uncertainty decomposition are not yet fully understood in the context of large-scale MLLMs. In this work, we propose a multimodal uncertainty propagation model (MUPM), based on uncertainty propagation theory, to characterise the relationship among the uncertainties arising from image-only, text-only, and joint image-text variations in MLLM inputs. Using real clinical data consisting of cardiac MR scans and digital health records, we show that MUPMs can be optimised robustly with only a few samples. We then show that the fitted MUPMs are generalisable across different input data distributions and, perhaps surprisingly, across different downstream tasks. This transferability may be explained by the shared pretraining, the comparatively light MLLM fine-tuning, and the low-dimensional nature of the MUPMs. More importantly, this learned transferability, which quantifies the relationship between these uncertainties, leads to direct clinical applications in which uncertainties may be estimated, and thus analysed robustly, for varying data or even a novel set of cardiac disease prediction tasks. In addition, we show experimentally the efficiency in multimodal data required for estimating the overall uncertainty and the ability to identify redundant factors, both of which are practical and clinically useful applications of the proposed MUPMs. Codes are available at https://github.com/yucheng722/MUPM.
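To make the idea concrete, the abstract's claim that a low-dimensional propagation model can be fitted robustly from a few samples can be sketched as a small least-squares problem. The sketch below is illustrative only and not the authors' implementation: it assumes the joint image-text uncertainty is modelled as a linear combination of the image-only variance, the text-only variance, and their covariance, with per-sample uncertainty estimates (e.g. from repeated MLLM queries) already available; all variable names and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample uncertainty estimates for a handful of labelled
# samples: image-only variance, text-only variance, and their covariance.
n = 8  # few-shot regime
var_img = rng.uniform(0.01, 0.2, n)
var_txt = rng.uniform(0.01, 0.2, n)
cov_it = rng.uniform(-0.05, 0.05, n)

# Simulated joint (image+text) uncertainty generated from assumed weights,
# standing in for uncertainties measured on real MLLM outputs.
true_w = np.array([0.6, 0.3, 1.5])
X = np.column_stack([var_img, var_txt, cov_it])
var_joint = X @ true_w + rng.normal(0.0, 1e-3, n)

# Fit the low-dimensional propagation model by ordinary least squares;
# only three coefficients are estimated, hence very few samples suffice.
w_hat, *_ = np.linalg.lstsq(X, var_joint, rcond=None)
print(w_hat)  # close to [0.6, 0.3, 1.5] since the noise is small
```

Because the model has only three parameters, the fit is well conditioned even with eight samples, which is the kind of low-dimensional structure the abstract credits for the model's few-shot robustness and transferability.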