Bridged Semantic Alignment for Zero-shot 3D Medical Image Diagnosis

📅 2025-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the misalignment between visual and textual embeddings and the scarcity of annotations for rare abnormalities in zero-shot 3D medical image diagnosis, this paper proposes the Bridged Semantic Alignment (BrgSA) framework. The method uses a large language model (LLM) to summarize clinical reports into high-level semantic information, and introduces a Cross-Modal Knowledge Interaction (CMKI) module whose cross-modal knowledge bank acts as a semantic bridge between the two modalities, enabling zero-shot diagnosis without additional annotations. Evaluated on two existing benchmark datasets and a newly constructed benchmark covering 15 underrepresented abnormalities, the approach achieves state-of-the-art performance, with significant gains in zero-shot diagnosis of underrepresented abnormalities. The core contribution is narrowing the gap between LLM-summarized report semantics and 3D medical image embeddings, which existing vision-language alignment methods leave as two well-separated clusters.
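The summary does not spell out how a zero-shot prediction is produced from aligned embeddings. A minimal sketch of one common approach (CLIP-style prompt comparison, assumed here rather than taken from the paper) scores an image embedding against a "present" and an "absent" text embedding by cosine similarity. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)


def zero_shot_score(image_emb, pos_text_emb, neg_text_emb, temperature=0.07):
    """Probability that the abnormality is present.

    Assumption: a softmax over the similarities to a positive and a
    negative prompt embedding. This is a generic CLIP-style recipe,
    not the inference procedure confirmed by the paper.
    """
    logits = np.array([
        cosine_similarity(image_emb, pos_text_emb),
        cosine_similarity(image_emb, neg_text_emb),
    ]) / temperature
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[0])


# Toy embeddings (hypothetical): the image lies close to the positive prompt.
image_emb = np.array([1.0, 0.2, 0.0])
pos_text_emb = np.array([0.9, 0.1, 0.0])
neg_text_emb = np.array([0.0, 1.0, 0.3])

score = zero_shot_score(image_emb, pos_text_emb, neg_text_emb)
swapped = zero_shot_score(image_emb, neg_text_emb, pos_text_emb)
print(f"P(present) = {score:.3f}, with prompts swapped = {swapped:.3f}")
```

Because only text prompts are needed for a new abnormality, no additional labeled images are required at inference time, which is what makes the setting zero-shot.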

📝 Abstract
3D medical images such as computed tomography (CT) are widely used in clinical practice and offer great potential for automatic diagnosis. Supervised learning-based approaches have achieved significant progress but rely heavily on extensive manual annotations, and are therefore limited by the availability of training data and the diversity of abnormality types. Vision-language alignment (VLA) offers a promising alternative by enabling zero-shot learning without additional annotations. However, we empirically discover that the visual and textual embeddings produced by existing VLA methods still form two well-separated clusters after alignment, leaving a wide gap to be bridged. To bridge this gap, we propose a Bridged Semantic Alignment (BrgSA) framework. First, we utilize a large language model to perform semantic summarization of reports, extracting high-level semantic information. Second, we design a Cross-Modal Knowledge Interaction (CMKI) module that leverages a cross-modal knowledge bank as a semantic bridge, facilitating interaction between the two modalities, narrowing the gap, and improving their alignment. To comprehensively evaluate our method, we construct a benchmark dataset that includes 15 underrepresented abnormalities and also use two existing benchmark datasets. Experimental results demonstrate that BrgSA achieves state-of-the-art performance on both the public benchmark datasets and our custom-labeled dataset, with significant improvements in zero-shot diagnosis of underrepresented abnormalities.
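The "two well-separated clusters" observation can be made concrete with a simple measurement. The sketch below assumes one common definition of the modality gap, the Euclidean distance between the centroids of L2-normalized image and text embeddings. The abstract does not state which metric the authors use, and the synthetic embeddings here are purely illustrative.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def modality_gap(image_emb, text_emb):
    """Distance between the centroids of L2-normalized embeddings.

    Assumption: centroid distance is used as the gap measure; the
    paper may quantify the gap differently.
    """
    img_centroid = l2_normalize(image_emb).mean(axis=0)
    txt_centroid = l2_normalize(text_emb).mean(axis=0)
    return float(np.linalg.norm(img_centroid - txt_centroid))


rng = np.random.default_rng(0)
dim, n = 64, 200
shared = rng.normal(size=(n, dim))  # semantics common to both modalities

# Case 1: each modality is shifted in an opposite direction (separated clusters).
offset = np.zeros(dim)
offset[0] = 5.0
image_sep = shared + offset + 0.1 * rng.normal(size=(n, dim))
text_sep = shared - offset + 0.1 * rng.normal(size=(n, dim))

# Case 2: no modality-specific shift (overlapping clusters).
image_mix = shared + 0.1 * rng.normal(size=(n, dim))
text_mix = shared + 0.1 * rng.normal(size=(n, dim))

gap_separated = modality_gap(image_sep, text_sep)
gap_overlapping = modality_gap(image_mix, text_mix)
print(f"separated: {gap_separated:.3f}, overlapping: {gap_overlapping:.3f}")
```

A large centroid distance indicates that the two modalities occupy different regions of the embedding space even when paired samples share semantics, which is the situation a bridging module is meant to reduce.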
Problem

Research questions and friction points this paper is trying to address.

Unsupervised Learning
3D Medical Image Analysis
Rare Disease Recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bridged Semantic Alignment (BrgSA)
Cross-modal Knowledge Interaction (CMKI)
Rare Disease Diagnosis in 3D Medical Images
Haoran Lai
University of Science and Technology of China
Medical Image Processing, Deep Learning
Zihang Jiang
School of Biomedical Engineering, USTC, Suzhou Institute for Advanced Research
Computer Vision, Medical Imaging, 3D
Qingsong Yao
Stanford University | ICT, CAS
Medical Image Computing, Medical Image Analysis
Rongsheng Wang
The Chinese University of Hong Kong, Shenzhen
Deep Learning
Zhiyang He
Massachusetts Institute of Technology
Quantum Information
Xiaodong Tao
Medical Business Department, iFlytek Co., Ltd., Hefei, Anhui, 230088, China
Wei Wei
The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, 230001, China
Weifu Lv
Department of Radiology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, Anhui, China
S. Kevin Zhou
School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, 230026, P.R.China; Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, 215123, P.R.China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, USTC, Suzhou Jiangsu, 215123, China; State Key Laboratory of Precision and Intelligent Chemistry, University of Scie