Progressive Local Alignment for Medical Multimodal Pre-training

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Fine-grained local alignment between medical images and text faces two key challenges: the absence of natural local correspondences and the poor generalizability of rigid region identification methods. To address these, the authors propose PLAN, a progressive contrastive learning framework built around a word-pixel soft region mapping mechanism. PLAN uses multi-stage contrastive learning to associate words with pixels adaptively, combining soft region attention with noise suppression to localize irregular anatomical structures robustly. Evaluated on four tasks (phrase grounding, image-text retrieval, object detection, and zero-shot classification), PLAN is reported to outperform state-of-the-art methods across multiple medical benchmarks (e.g., MIMIC-CXR, RadGraph). The result is more accurate fine-grained cross-modal alignment and better clinical interpretability.
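To make the idea of a word-pixel soft region map concrete, here is a minimal NumPy sketch. It is not PLAN's implementation: the function names `soft_region_map` and `suppress_noise`, the temperature value, and the below-mean suppression rule are all assumptions chosen for illustration. Each word receives a soft weight over every pixel rather than a hard bounding box, and low-weight pixels are then zeroed out as a simple stand-in for noise suppression.

```python
import numpy as np

def soft_region_map(word_emb, pixel_feat, temperature=0.1):
    """Hypothetical soft region map: one weight map over pixels per word.

    word_emb:   (n_words, dim) word embeddings
    pixel_feat: (height, width, dim) pixel-level image features
    Returns an array of shape (n_words, height, width); each map sums to 1.
    """
    height, width, dim = pixel_feat.shape
    pixels = pixel_feat.reshape(height * width, dim)
    # L2-normalize so the dot product below is a cosine similarity.
    words_n = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    pixels_n = pixels / np.linalg.norm(pixels, axis=1, keepdims=True)
    logits = (words_n @ pixels_n.T) / temperature
    # Numerically stable softmax over the pixel axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights.reshape(-1, height, width)

def suppress_noise(region_map):
    """Assumed rule: drop pixels below each word's mean weight, then renormalize."""
    n_words = region_map.shape[0]
    flat = region_map.reshape(n_words, -1)
    mean = flat.mean(axis=1, keepdims=True)
    kept = np.where(flat >= mean, flat, 0.0)
    # The maximum weight is never below the mean, so each row keeps at least one pixel.
    kept = kept / kept.sum(axis=1, keepdims=True)
    return kept.reshape(region_map.shape)

rng = np.random.default_rng(0)
words = rng.normal(size=(3, 8))      # 3 words, 8-dim embeddings (toy data)
pixels = rng.normal(size=(4, 5, 8))  # 4x5 feature map (toy data)
region = suppress_noise(soft_region_map(words, pixels))
```

Because the map stays continuous after suppression, an irregular structure can be covered by whatever set of pixels scores highly, with no rectangular or fixed-shape region imposed.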

📝 Abstract
Local alignment between medical images and text is essential for accurate diagnosis, but it remains challenging because natural local pairings are absent and rigid region recognition methods are limited. Traditional approaches rely on hard boundaries, which introduce uncertainty, whereas medical imaging demands flexible soft region recognition to handle irregular structures. To overcome these challenges, we propose the Progressive Local Alignment Network (PLAN), which uses a contrastive learning-based approach to local alignment to establish meaningful word-pixel relationships, together with a progressive learning strategy that iteratively refines these relationships to improve alignment precision and robustness. By combining these techniques, PLAN improves soft region recognition while suppressing noise interference. Extensive experiments on multiple medical datasets demonstrate that PLAN surpasses state-of-the-art methods in phrase grounding, image-text retrieval, object detection, and zero-shot classification, setting a new benchmark for medical image-text alignment.
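The abstract does not give the loss or the refinement schedule, so the following is only a sketch of how a contrastive local alignment objective with progressive refinement might look. Everything specific here is an assumption: the symmetric InfoNCE-style loss, the function name `local_contrastive_loss`, the temperatures, and the `stage_temps` schedule that sharpens attention from stage to stage. The sketch only evaluates the loss on toy data; it does not train anything.

```python
import numpy as np

def _l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def _log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def local_contrastive_loss(word_emb, pixel_feat, attn_temp, loss_temp=0.07):
    """Hypothetical word-to-region contrastive loss.

    word_emb:   (n_words, dim) word embeddings
    pixel_feat: (n_pixels, dim) flattened pixel features
    attn_temp:  attention temperature; smaller values give sharper soft regions
    """
    words = _l2norm(word_emb)
    pixels = _l2norm(pixel_feat)
    # Soft attention of each word over all pixels.
    attn = np.exp(_log_softmax(words @ pixels.T / attn_temp, axis=1))
    # Attention-pooled region feature for each word.
    regions = _l2norm(attn @ pixels)
    # Word i should match region i (diagonal) and mismatch all other regions.
    logits = words @ regions.T / loss_temp
    idx = np.arange(len(words))
    word_to_region = -_log_softmax(logits, axis=1)[idx, idx].mean()
    region_to_word = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (word_to_region + region_to_word)

# Assumed progressive schedule: each stage lowers the attention temperature,
# so the soft regions start broad and become more focused.
stage_temps = [1.0, 0.3, 0.1]

rng = np.random.default_rng(0)
words = rng.normal(size=(4, 16))    # toy word embeddings
pixels = rng.normal(size=(30, 16))  # toy pixel features
losses = [local_contrastive_loss(words, pixels, t) for t in stage_temps]
```

In a real training loop the encoders producing `words` and `pixels` would be updated between stages; the schedule here only illustrates the coarse-to-fine idea behind iterative refinement.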
Problem

Research questions and friction points this paper is trying to address.

Enhances medical image-text local alignment
Improves soft region recognition accuracy
Introduces progressive learning for word-pixel relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning-based local alignment
Progressive learning strategy refinement
Soft region recognition enhancement
🔎 Similar Papers
2024-01-02 · IEEE International Conference on Bioinformatics and Biomedicine · Citations: 0
Huimin Yan
Institute of Intelligent Information Processing, Shanxi University, Taiyuan, 030006, China
Xian Yang
University of Manchester
Artificial Intelligence, Machine Learning, Healthcare AI, Natural Language Processing
Liang Bai
Institute of Intelligent Information Processing, Shanxi University, Taiyuan, 030006, China
Jiye Liang
Shanxi University