Beyond Symmetric Alignment: Spectral Diagnostics of Modality Imbalance in Vision-Language Models in the Medical Domain

📅 2026-06-03

🤖 AI Summary
Existing vision-language models perform poorly on medical image–text tasks, and tools for quantifying inter-modal information imbalance are limited. This work proposes the Spectral Alignment Score (SAS), an asymmetric, directional alignment metric that projects both modalities' representations onto the principal eigenbasis of an anchor modality and computes eigenvalue-weighted per-eigenmode correlations; the difference between the two directional scores quantifies modality imbalance. SAS reveals that medical images retain richer structural information than their paired clinical reports. Embedded in a benchmarking framework covering 15 vision-language models, six alignment metrics, and bidirectional retrieval, SAS shows the strongest zero-label correlation with retrieval performance in the medical domain, offering a practical and interpretable diagnostic for medical multimodal models.
📝 Abstract
Vision-Language Models (VLMs) struggle when applied to medical image-text data, yet the tools available to diagnose this failure remain limited. Existing representation alignment metrics are symmetric, collapsing both modalities into a single score and hiding which modality drives cross-modal degradation. We introduce the Spectral Alignment Score (SAS), an asymmetric metric that projects both modalities onto the principal eigenbasis of an anchor modality and computes eigenvalue-weighted per-eigenmode correlations, resulting in directional scores whose difference quantifies modality information imbalance. We embed SAS within a benchmarking framework evaluating 15 VLMs across natural and medical image-text datasets alongside 6 alignment metrics and bidirectional retrieval. Our experiments show that medical images retain richer structural information than their paired clinical reports, a directional asymmetry invisible to all competing metrics, and that SAS achieves the strongest zero-label correlation with retrieval performance in the medical domain, positioning it as a practical diagnostic tool for clinical deployment. Code is available at this URL: https://github.com/iamalegambetti/medical-vlms-assessment.
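The abstract describes SAS only in words, so the following is a minimal sketch of one way such a directional score could be computed, not the authors' implementation (see the linked repository for that). It assumes paired image and text embeddings in a shared d-dimensional space, takes the eigenbasis from the anchor modality's covariance matrix, uses the Pearson correlation for each eigenmode, and normalizes the eigenvalue weights to sum to one. The names `directional_sas`, `sas_imbalance`, and `top_k` are hypothetical.

```python
import numpy as np


def directional_sas(anchor, other, top_k=None, eps=1e-12):
    """Illustrative directional alignment score (an assumption, not the paper's exact formula).

    anchor, other: arrays of shape (n_samples, d) holding paired embeddings
    in a shared embedding space. The eigenbasis comes from `anchor` only,
    which is what makes the score directional.
    """
    a = np.asarray(anchor, dtype=float)
    b = np.asarray(other, dtype=float)
    a = a - a.mean(axis=0)  # center each modality across samples
    b = b - b.mean(axis=0)

    # Principal eigenbasis of the anchor covariance (eigh returns ascending order).
    cov = a.T @ a / (a.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    if top_k is not None:
        order = order[:top_k]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]

    # Project both modalities onto the anchor's eigenmodes.
    proj_a = a @ eigvecs
    proj_b = b @ eigvecs

    # Pearson correlation per eigenmode (the projections are already centered).
    num = (proj_a * proj_b).sum(axis=0)
    den = np.linalg.norm(proj_a, axis=0) * np.linalg.norm(proj_b, axis=0) + eps
    corr = num / den

    # Eigenvalue weights, normalized so the score stays within [-1, 1].
    weights = eigvals / (eigvals.sum() + eps)
    return float((weights * corr).sum())


def sas_imbalance(img_emb, txt_emb, top_k=None):
    """Return both directional scores and their difference (image-anchored minus text-anchored)."""
    s_img = directional_sas(img_emb, txt_emb, top_k)
    s_txt = directional_sas(txt_emb, img_emb, top_k)
    return s_img, s_txt, s_img - s_txt
```

Under these assumptions, calling `sas_imbalance(img_emb, txt_emb)` on two `(n, d)` arrays yields an image-anchored score, a text-anchored score, and their gap. A nonzero gap is the kind of directional signal a single symmetric alignment score cannot express, since swapping the anchor changes which modality's spectrum defines the comparison.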
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
Modality Imbalance
Medical Domain
Representation Alignment
Asymmetric Metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral Alignment Score
Asymmetric Metric
Modality Imbalance
Vision-Language Models
Medical Domain