Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval

πŸ“… 2025-01-15
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This study addresses the insufficient robustness of contrastive learning models for medical image–report cross-modal retrieval, particularly under low-quality images (e.g., occluded inputs). We propose the first image-patch-level occlusion robustness evaluation paradigm tailored to clinical imaging, systematically benchmarking CLIP, CXR-RePaiR, MedCLIP, and CXR-CLIP. Results show significant performance degradation across all models under occlusion; MedCLIP achieves the highest robustness but lags behind CXR-CLIP and CXR-RePaiR in retrieval accuracy; CLIP exhibits the worst generalization to medical domains. Our core contribution is establishing a novel out-of-distribution (OOD) interference evaluation standard and empirically demonstrating that domain-specific training data is critical for enhancing occlusion robustness. These findings provide both empirical evidence and methodological guidance for designing robust medical cross-modal models.

πŸ“ Abstract
Medical images and reports offer invaluable insights into patient health. However, the heterogeneity and complexity of these data hinder effective analysis. To bridge this gap, we investigate contrastive learning models for cross-domain retrieval, which associates medical images with their corresponding clinical reports. This study benchmarks the robustness of four state-of-the-art contrastive learning models: CLIP, CXR-RePaiR, MedCLIP, and CXR-CLIP. We introduce an occlusion retrieval task to evaluate model performance under varying levels of image corruption. Our findings reveal that all evaluated models are highly sensitive to out-of-distribution data, as evidenced by the proportional decrease in performance with increasing occlusion levels. While MedCLIP exhibits slightly more robustness, its overall performance remains significantly behind that of CXR-CLIP and CXR-RePaiR. CLIP, trained on a general-purpose dataset, struggles with medical image-report retrieval, highlighting the importance of domain-specific training data. Our evaluation suggests that more effort is needed to improve the robustness of these models. By addressing these limitations, we can develop more reliable cross-domain retrieval models for medical applications.
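The occlusion retrieval task masks image patches at increasing occlusion levels before running retrieval. The paper does not publish its exact parameters, so the following is a minimal sketch of patch-level occlusion only; the patch size, occlusion ratios, and the `occlude_patches` helper name are illustrative assumptions, and the occluded images would then be fed to each model's image encoder for retrieval.

```python
import numpy as np

def occlude_patches(image, patch_size=16, occlusion_ratio=0.25, rng=None):
    """Zero out a random fraction of non-overlapping square patches.

    image: (H, W, C) array with H and W divisible by patch_size.
    occlusion_ratio: fraction of patches to mask (the "occlusion level").
    Patch size and masking-by-zeros are illustrative choices, not the
    paper's published protocol.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    rows, cols = h // patch_size, w // patch_size
    n_patches = rows * cols
    n_mask = int(round(occlusion_ratio * n_patches))
    masked = rng.choice(n_patches, size=n_mask, replace=False)
    out = image.copy()
    for idx in masked:
        r, c = divmod(idx, cols)
        out[r * patch_size:(r + 1) * patch_size,
            c * patch_size:(c + 1) * patch_size] = 0
    return out

# Sweep occlusion levels for a benchmark run (levels are illustrative).
levels = [0.0, 0.25, 0.5, 0.75]
image = np.ones((64, 64, 1))
corrupted = [occlude_patches(image, occlusion_ratio=p, rng=0) for p in levels]
```

Each corrupted image keeps the original shape, so it can be passed unchanged through any of the four models' preprocessing pipelines when measuring retrieval accuracy per occlusion level.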
Problem

Research questions and friction points this paper is trying to address.

Contrastive Learning
Medical Image Analysis
Report Association
Innovation

Methods, ideas, or system contributions that make the work stand out.

Medical Image-Report Matching
Robustness under Image Degradation
Specialized Medical Domain Training
Demetrio Deanda
Department of Computational, Engineering, and Mathematical Sciences, Texas A&M University-San Antonio
Yuktha Priya Masupalli
Department of Computational, Engineering, and Mathematical Sciences, Texas A&M University-San Antonio
Jeong Yang
Texas A&M University-San Antonio
Cloud Computing, Software Security, Source Code Analysis and Visualization
Young Lee
Department of Computational, Engineering, and Mathematical Sciences, Texas A&M University-San Antonio
Zechun Cao
Texas A&M University-San Antonio
Cyber Security, Privacy, Machine Learning
Gongbo Liang
Department of Computational, Engineering, and Mathematical Sciences, Texas A&M University-San Antonio