QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training

📅 2025-05-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current multimodal large language models (MLLMs) exhibit excessive reliance on visual inputs and poor cross-specialty generalization, limiting their ability to jointly reason over medical images, temporal physiological signals, and clinical text reports in comprehensive healthcare settings. To address this, we propose QoQ-Med, the first open-source generalist clinical MLLM, and introduce Domain-aware Relative Policy Optimization (DRPO), a novel reinforcement learning framework that hierarchically normalizes rewards based on clinical domain rarity and modality-specific difficulty, thereby systematically mitigating performance imbalance across heterogeneous clinical modalities. QoQ-Med is instruction-tuned on 2.61 million multimodal samples spanning nine medical specialties and incorporates an interpretable segmentation-alignment mechanism. Experiments demonstrate a 43% average improvement in macro-F1 score and a lesion-localization IoU ten times higher than prior open-source models, on par with OpenAI o4-mini. All model weights, training pipelines, and inference trajectories are publicly released.

📝 Abstract
Clinical decision-making routinely demands reasoning over heterogeneous data, yet existing multimodal large language models (MLLMs) remain largely vision-centric and fail to generalize across clinical specialties. To bridge this gap, we introduce QoQ-Med-7B/32B, the first open generalist clinical foundation model that jointly reasons across medical images, time-series signals, and text reports. QoQ-Med is trained with Domain-aware Relative Policy Optimization (DRPO), a novel reinforcement-learning objective that hierarchically scales normalized rewards according to domain rarity and modality difficulty, mitigating performance imbalance caused by skewed clinical data distributions. Trained on 2.61 million instruction-tuning pairs spanning 9 clinical domains, DRPO boosts diagnostic performance by 43% in macro-F1 on average across all visual domains compared to other critic-free training methods like GRPO. Furthermore, trained on intensive segmentation data, QoQ-Med is able to highlight salient regions related to the diagnosis, with an IoU 10x higher than open models while reaching the performance of OpenAI o4-mini. To foster reproducibility and downstream research, we release (i) the full model weights, (ii) the modular training pipeline, and (iii) all intermediate reasoning traces at https://github.com/DDVD233/QoQ_Med.
Problem

Research questions and friction points this paper is trying to address.

Generalizing multimodal models across diverse clinical specialties
Mitigating performance imbalance from skewed clinical data
Enhancing diagnostic accuracy and interpretability in medical imaging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-aware GRPO training for clinical data
Multimodal reasoning across images, signals, text
Hierarchical reward scaling by domain rarity
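The paper's exact DRPO formulation is not given on this page; as a rough illustration of the idea behind the bullets above (all names, the inverse-frequency weighting, and the difficulty table are assumptions, not the authors' method), a GRPO-style group-normalized advantage could be rescaled by domain rarity and modality difficulty:

```python
import numpy as np

def drpo_style_advantages(rewards, domains, difficulties, eps=1e-8):
    """Illustrative sketch only, not the paper's DRPO: normalize rewards
    within the sampled group (as in critic-free GRPO), then scale each
    sample's advantage by inverse domain frequency and a hand-set
    modality-difficulty weight so rare/hard domains contribute more."""
    rewards = np.asarray(rewards, dtype=float)
    # GRPO-style group normalization: zero mean, unit variance
    adv = (rewards - rewards.mean()) / (rewards.std() + eps)
    # Domain rarity weight: rarer domains in the batch get a larger scale
    counts = {d: domains.count(d) for d in set(domains)}
    rarity = np.array([1.0 / counts[d] for d in domains])
    rarity /= rarity.mean()  # renormalize so the overall scale is unchanged
    # Modality difficulty weight: harder modalities are upweighted
    diff = np.asarray([difficulties[d] for d in domains], dtype=float)
    return adv * rarity * diff

# Hypothetical batch: two common "xray" samples, one rare "ecg" sample
advs = drpo_style_advantages(
    rewards=[1.0, 0.0, 0.5],
    domains=["xray", "xray", "ecg"],
    difficulties={"xray": 1.0, "ecg": 1.5},
)
```

Under this sketch the group normalization keeps the advantages centered, while the rarity and difficulty factors reweight which domains dominate the policy gradient; the actual hierarchical scheme is described in the paper and released pipeline.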
Wei Dai
Massachusetts Institute of Technology
Peilin Chen
University of Virginia
C. Ekbote
Massachusetts Institute of Technology
Paul Pu Liang
Massachusetts Institute of Technology