🤖 AI Summary
To address the challenges of label scarcity, small lesion regions, and severe class imbalance in mammography, which hinder effective adaptation of CLIP models, this paper proposes MaMA, one of the first end-to-end CLIP pre-training frameworks tailored to mammographic imaging. Methodologically, MaMA introduces a multi-view supervised contrastive learning strategy coupled with a symmetric local alignment module; integrates parameter-efficient fine-tuning of a large language model pre-trained with medical knowledge; and adopts high-resolution local-attention image encoding. Evaluated on EMBED and RSNA-Mammo across classification, cross-modal retrieval, and zero-shot diagnosis tasks, MaMA consistently outperforms state-of-the-art baselines. Notably, its model size is only 52% of the largest baseline's, combining computational efficiency with clinical practicality.
📝 Abstract
Contrastive Language-Image Pre-training (CLIP) demonstrates strong potential in medical image analysis but requires substantial data and computational resources. Due to these constraints, existing CLIP applications in medical imaging focus mainly on modalities such as chest X-rays, for which abundant image-report data are available, leaving many other important modalities underexplored. Here, we propose one of the first adaptations of the full CLIP model to mammography, which presents significant challenges due to labeled data scarcity, high-resolution images with small regions of interest, and class imbalance. We first develop a specialized supervision framework for mammography that leverages its multi-view nature. Furthermore, we design a symmetric local alignment module to better focus on detailed features in high-resolution images. Lastly, we incorporate a parameter-efficient fine-tuning approach for large language models pre-trained with medical knowledge to address data limitations. Our multi-view and multi-scale alignment (MaMA) method outperforms state-of-the-art baselines on three different tasks across two large real-world mammography datasets, EMBED and RSNA-Mammo, with only 52% of the largest baseline's model size. The code is available at https://github.com/XYPB/MaMA.
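To make the multi-view supervision idea concrete, the following is a minimal sketch (not MaMA's actual implementation) of a supervised contrastive loss in which all views of the same study, e.g. CC and MLO of one patient, are treated as positives for each other, while views from other patients serve as negatives. The function name, the NumPy formulation, and the patient-ID grouping are illustrative assumptions; the paper's loss additionally involves text embeddings and local alignment.

```python
import numpy as np

def multiview_supcon_loss(embeddings, patient_ids, temperature=0.1):
    """Illustrative multi-view supervised contrastive loss (sketch only).

    embeddings  : (N, D) array of view embeddings.
    patient_ids : length-N sequence; views sharing an ID are positives.
    """
    # L2-normalize so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature  # pairwise similarity logits

    n = len(patient_ids)
    losses = []
    for i in range(n):
        # Positives: other views of the same patient
        pos = [j for j in range(n) if j != i and patient_ids[j] == patient_ids[i]]
        if not pos:
            continue  # no positive pair for this anchor
        # Denominator: log-sum-exp over all samples except the anchor itself
        logits = np.delete(sim[i], i)
        log_denom = np.log(np.exp(logits).sum())
        # Average the InfoNCE term over all positives of this anchor
        losses.append(-np.mean([sim[i, j] - log_denom for j in pos]))
    return float(np.mean(losses))
```

Intuitively, the loss is low when views of the same patient are embedded close together and far from other patients' views, which is the behavior the multi-view supervision framework encourages.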