CLIP in Medical Imaging: A Comprehensive Survey

📅 2023-12-12
🏛️ arXiv.org
📈 Citations: 41
Influential: 1
🤖 AI Summary
To address CLIP's limited adaptability to medical imaging, weak cross-modal alignment, and limited generalization across clinical tasks, this survey reviews how CLIP has been adapted to the medical domain. It covers pre-training strategies tailored to medical image–report pairs, including enhanced contrastive learning, fine-grained text–image alignment, and cross-modal feature disentanglement. It also organizes CLIP-driven applications into a task-oriented taxonomy spanning classification, dense prediction (e.g., segmentation), and cross-modal or generative tasks (e.g., radiology report generation). From a systematic review of over 100 studies, the authors identify core challenges, including data bias, modality heterogeneity, and annotation scarcity, and they maintain an open-source literature repository on GitHub. The survey aims to provide theoretical foundations and practical guidance for improving the interpretability, robustness, and generalization of clinical AI models.
📝 Abstract
Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training paradigm, successfully introduces text supervision to vision models. It has shown promising results across various tasks, attributable to its generalizability and interpretability. The use of CLIP has recently gained increasing interest in the medical imaging domain, serving both as a pre-training paradigm for aligning medical vision and language, and as a critical component in diverse clinical tasks. With the aim of facilitating a deeper understanding of this promising direction, this survey offers an in-depth exploration of the CLIP paradigm within the domain of medical imaging, regarding both refined CLIP pre-training and CLIP-driven applications. In this study, we (1) start with a brief introduction to the fundamentals of CLIP methodology. (2) Then, we investigate the adaptation of CLIP pre-training in the medical domain, focusing on how to optimize CLIP given the characteristics of medical images and reports. (3) Furthermore, we explore the practical utilization of CLIP pre-trained models in various tasks, including classification, dense prediction, and cross-modal tasks. (4) Finally, we discuss existing limitations of CLIP in the context of medical imaging and propose forward-looking directions to address the demands of the medical imaging domain. We expect that this comprehensive survey will provide researchers in the field of medical image analysis with a holistic understanding of the CLIP paradigm and its potential implications. The project page can be found at https://github.com/zhaozh10/Awesome-CLIP-in-Medical-Imaging.
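For readers unfamiliar with the contrastive objective that the abstract refers to, the following is a minimal sketch of a CLIP-style symmetric contrastive loss over a batch of paired image and text embeddings. It is not taken from the survey. The function names, the toy embeddings, and the fixed temperature of 0.07 are illustrative assumptions (in CLIP the temperature is a learned parameter), and the encoders that would produce the embeddings are omitted.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits_row, target):
    """Numerically stable cross-entropy of one row of logits against a target index."""
    m = max(logits_row)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits_row))
    return log_sum_exp - logits_row[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; row k of each input is assumed to be a matched pair.

    temperature is fixed here for illustration only (assumption).
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: each image should pick out its own report.
    loss_i2t = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    # Text-to-image direction: same computation on the transposed matrix.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = sum(cross_entropy(columns[k], k) for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Toy 2-D embeddings (hypothetical): matched pairs vs. deliberately swapped pairs.
aligned_loss = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is small when each image embedding lies closest to its own text embedding and large when the pairs are swapped, which is the behavior the pre-training objective rewards.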
Problem

Research questions and friction points this paper is trying to address.

Adapt CLIP pre-training for medical image-text alignment
Explore CLIP applications in clinical tasks like classification
Address limitations of CLIP in medical imaging domain
Innovation

Methods, ideas, or system contributions that make the work stand out.

CLIP introduces text supervision to vision models
Adapts CLIP for medical image-text alignment
Explores CLIP in classification, dense prediction, and cross-modal tasks
Zihao Zhao
School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
Yuxiao Liu
ShanghaiTech University
fMRI, neuroscience, NLP, Large Language Model
Han Wu
School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
Yonghao Li
School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
Sheng Wang
School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China; School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
L. Teng
School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
Disheng Liu
Case Western Reserve University
Computer Vision
Xiang Li
Zhiming Cui
School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
Qian Wang
School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
Dinggang Shen
Prof. and Founding Dean, School of BME, ShanghaiTech University; Co-CEO, United Imaging Intelligence
Medical Image Analysis, Medical Image Computing, Biomedical Image Analysis, Image Registration