DuPLUS: Dual-Prompt Vision-Language Framework for Universal Medical Image Segmentation and Prognosis

📅 2025-10-03
🤖 AI Summary
Deep learning in medical imaging suffers from poor task generalization, limited prognostic capability, shallow semantic understanding in existing "general-purpose" models, and coarse-grained conditional control. To address these challenges, the authors propose DuPLUS, a unified vision-language framework supporting cross-modality (CT/MRI/ultrasound), multi-organ analysis with fine-grained semantic control. Its core innovations are a hierarchical semantic prompting scheme and a dual-prompt fusion mechanism, which jointly model multimodal imaging data and electronic health records (EHR) for end-to-end segmentation and cancer prognosis prediction. Leveraging vision-language pretraining with parameter-efficient fine-tuning, DuPLUS improves clinical interpretability and scalability. Across 10 benchmark datasets, it achieves state-of-the-art segmentation performance on 8; for head-and-neck cancer prognosis prediction, it reaches a C-index of 0.69, supporting its clinical utility.
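The prognosis result above is reported as a Concordance Index (C-index), which measures how often the model ranks patient risk consistently with observed survival times. A minimal, self-contained sketch of Harrell's C-index (not the paper's evaluation code) looks like this:

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable patient pairs whose
    predicted risks are ordered consistently with survival times.

    A pair (i, j) is comparable when the patient with the shorter time
    actually experienced the event (events[i] == 1). Higher predicted
    risk should correspond to shorter survival; ties in risk count 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Risks perfectly anti-ordered with survival times -> C-index of 1.0
print(concordance_index([2, 4, 6], [1, 1, 1], [0.9, 0.5, 0.1]))  # → 1.0
```

A C-index of 0.5 corresponds to random ranking, so 0.69 indicates a clinically meaningful ordering of patient risk.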

📝 Abstract
Deep learning for medical imaging is hampered by task-specific models that lack generalizability and prognostic capabilities, while existing 'universal' approaches suffer from simplistic conditioning and poor medical semantic understanding. To address these limitations, we introduce DuPLUS, a deep learning framework for efficient multi-modal medical image analysis. DuPLUS introduces a novel vision-language framework that leverages hierarchical semantic prompts for fine-grained control over the analysis task, a capability absent in prior universal models. To enable extensibility to other medical tasks, it includes a hierarchical, text-controlled architecture driven by a unique dual-prompt mechanism. For segmentation, DuPLUS generalizes across three imaging modalities and ten anatomically varied medical datasets encompassing more than 30 organs and tumor types. It outperforms state-of-the-art task-specific and universal models on 8 out of 10 datasets. We demonstrate the extensibility of its text-controlled architecture through seamless integration of electronic health record (EHR) data for prognosis prediction, and on a head and neck cancer dataset, DuPLUS achieved a Concordance Index (CI) of 0.69. Parameter-efficient fine-tuning enables rapid adaptation to new tasks and modalities from varying centers, establishing DuPLUS as a versatile and clinically relevant solution for medical image analysis. The code for this work is made available at: https://anonymous.4open.science/r/DuPLUS-6C52
Problem

Research questions and friction points this paper is trying to address.

Addresses limited generalizability and prognostic capabilities in medical imaging models
Enables universal segmentation across multiple modalities and anatomical structures
Integrates medical imaging with EHR data for improved prognosis prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical semantic prompts enable fine-grained control
Dual-prompt mechanism drives text-controlled architecture
Parameter-efficient fine-tuning allows rapid task adaptation
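The text-controlled conditioning in the bullets above can be sketched as FiLM-style feature modulation, where a prompt embedding produces per-channel scale and shift parameters for the visual features. This is an illustrative assumption about the mechanism, not the paper's implementation; the shapes, function name, and projection matrices are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def prompt_modulate(visual_feats, prompt_emb, W_gamma, W_beta):
    """FiLM-style fusion (hypothetical sketch): a text prompt embedding
    is projected to a per-channel scale (gamma) and shift (beta) that
    modulate 2D visual feature maps of shape (C, H, W)."""
    gamma = prompt_emb @ W_gamma          # (C,) channel-wise scale
    beta = prompt_emb @ W_beta            # (C,) channel-wise shift
    return visual_feats * gamma[:, None, None] + beta[:, None, None]

C, H, W, D = 8, 4, 4, 16                  # channels, spatial dims, prompt dim
feats = rng.standard_normal((C, H, W))    # decoder feature map
task_prompt = rng.standard_normal(D)      # embedding of e.g. a segmentation-task prompt
W_gamma = rng.standard_normal((D, C))
W_beta = rng.standard_normal((D, C))

fused = prompt_modulate(feats, task_prompt, W_gamma, W_beta)
print(fused.shape)  # → (8, 4, 4)
```

Because only the small projection matrices depend on the task, swapping the prompt (or fine-tuning just these projections) retargets the same backbone to a new organ or task, which is consistent with the parameter-efficient adaptation claim.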