A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering

📅 2024-10-28
🏛️ Proceedings of the First International Workshop on Vision-Language Models for Biomedical Applications
📈 Citations: 1
Influential: 0
🤖 AI Summary
Medical visual question answering (MedVQA) models suffer from modality preference bias—over-relying on textual questions while neglecting diagnostic image content—exacerbated by spurious question-answer priors prevalent in existing datasets. Method: We propose MedCFVQA, a causality-driven counterfactual VQA framework that explicitly disentangles non-causal associations between image and text modalities via a structured causal graph. To mitigate question-prior confounding, we introduce a Causal Prior (CP) rebalancing strategy, yielding de-biased variants SLAKE-CP and RadVQA-CP. MedCFVQA integrates causal graph modeling, counterfactual inference, and modality-decoupled training. Contribution/Results: MedCFVQA achieves significant improvements over non-causal baselines on SLAKE, RadVQA, and their CP counterparts. It is the first work to empirically validate that causal intervention—not merely architectural refinement—is essential for robust, synergistic multimodal understanding in medical VQA.

📝 Abstract
Medical Visual Question Answering (MedVQA) is crucial for enhancing the efficiency of clinical diagnosis by providing accurate and timely responses to clinicians' inquiries regarding medical images. Existing MedVQA models suffer from modality preference bias, where predictions are heavily dominated by one modality while the other is overlooked (in MedVQA, questions usually dominate the answer while images are neglected), so the models fail to learn genuinely multimodal knowledge. To overcome this bias, we propose a Medical CounterFactual VQA (MedCFVQA) model, which trains with the bias present and leverages causal graphs to eliminate it during inference. Existing MedVQA datasets also exhibit substantial prior dependencies between questions and answers, which yields acceptable performance even when a model suffers severely from modality preference bias. To address this issue, we reconstruct new datasets from existing MedVQA datasets by Changing the Prior dependencies (CP) between questions and answers across the training and test sets. Extensive experiments demonstrate that MedCFVQA significantly outperforms its non-causal counterpart on SLAKE and RadVQA as well as on the SLAKE-CP and RadVQA-CP datasets.
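The abstract describes the CP reconstruction only at a high level. A minimal sketch of one way to break question-answer priors, in the spirit of VQA-CP-style resplitting rather than the paper's exact recipe (the field names `qtype` and `answer` are illustrative assumptions):

```python
from collections import defaultdict

def make_cp_split(qa_pairs):
    """VQA-CP-style re-split (a sketch, not the paper's exact recipe):
    within each question type, whole answer groups alternate between
    train and test, so the train-time answer prior for a question type
    no longer holds at test time."""
    groups = defaultdict(list)
    for qa in qa_pairs:
        groups[(qa["qtype"], qa["answer"])].append(qa)

    train, test = [], []
    for i, (_, items) in enumerate(sorted(groups.items())):
        (test if i % 2 else train).extend(items)
    return train, test
```

Under such a split, a model that memorizes "question type X usually has answer Y" during training is penalized at test time, which exposes modality preference bias instead of rewarding it.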
Problem

Research questions and friction points this paper is trying to address.

Mitigate modality preference bias in MedVQA models
Address prior dependencies between questions and answers
Improve multimodal learning in medical visual question answering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages a structured causal graph to remove modality preference bias at inference
Trains with the bias present, then subtracts it via counterfactual inference at test time
Reconstructs CP datasets to break prior dependencies between questions and answers
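The "train with bias, remove it at inference" idea can be sketched with the general counterfactual-VQA recipe: subtract the question-only prediction from the full multimodal one. This is a minimal illustration of that family of methods, not MedCFVQA's exact formulation; `logits_vq`, `logits_q`, and `alpha` are assumed names:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def counterfactual_debias(logits_vq, logits_q, alpha=1.0):
    """Subtract the question-only branch (the question's direct effect)
    from the full image+question prediction (the total effect); what
    remains is the image-grounded part of the answer."""
    te = log_softmax(logits_vq)   # total effect: image + question
    nde = log_softmax(logits_q)   # direct effect: question alone
    return [t - alpha * d for t, d in zip(te, nde)]

# The question prior strongly favours answer 0, but after removing it
# the image-grounded evidence tips the decision toward answer 1.
scores = counterfactual_debias([2.0, 1.9, 0.0], [3.0, 0.0, 0.0])
```

Both branches are trained normally (with the bias), and the subtraction happens only at inference, which matches the summary's claim that the debiasing comes from causal intervention rather than architectural changes.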