GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning

📅 2025-04-02
🤖 AI Summary
Current general-purpose medical AI models exhibit insufficient reasoning capabilities for complex clinical decision-making, suffer from poor generalization, and lack interpretable reasoning chains. To address these limitations, we propose GMAI-VL-R1, a multimodal medical reasoning model specifically designed for complex clinical decision-making, integrating Vision Transformer (ViT) and Large Language Model (LLM) architectures. We introduce a novel logic-constrained rejection sampling method to synthesize high-quality, reasoning-rich multimodal training data and establish the first dedicated multimodal medical reasoning benchmark. Furthermore, we employ Proximal Policy Optimization (PPO)-based reinforcement learning to optimize diagnostic pathways and causal inference. Our approach achieves substantial improvements in accuracy and cross-scenario generalization on medical image diagnosis and visual question answering tasks, significantly outperforming supervised fine-tuning baselines. All code, data, and models are publicly released, establishing a new paradigm for trustworthy, reasoning-enabled medical AI.

📝 Abstract
General medical AI has made significant strides in recent years, but existing models often lack the reasoning capabilities needed for complex medical decision-making. This paper presents GMAI-VL-R1, a multimodal medical reasoning model enhanced by reinforcement learning (RL) to improve its reasoning abilities. Through iterative training, GMAI-VL-R1 optimizes decision-making, significantly boosting diagnostic accuracy and clinical support. We also develop a reasoning data synthesis method that generates step-by-step reasoning data via rejection sampling, which further enhances the model's generalization. Experimental results show that after RL training, GMAI-VL-R1 excels in tasks such as medical image diagnosis and visual question answering. While the model demonstrates basic memorization with supervised fine-tuning, RL is crucial for true generalization. Our work establishes new evaluation benchmarks and paves the way for future advancements in medical reasoning models. Code, data, and model will be released at https://github.com/uni-medical/GMAI-VL-R1.
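The abstract mentions generating step-by-step reasoning data via rejection sampling but does not spell out the procedure. A minimal sketch of the general idea, under stated assumptions: several candidate reasoning chains are sampled per question, and only those whose final answer matches the reference label are kept. The names `extract_answer`, `rejection_sample`, and `make_toy_generator`, the `Answer:` marker convention, and the exact-match acceptance rule are all hypothetical and are not taken from the paper; the image input is omitted, and the real method may apply additional constraints.

```python
import itertools
from typing import Callable, List, Optional


def extract_answer(chain: str) -> Optional[str]:
    """Return the normalized text after the last 'Answer:' marker, or None.

    The 'Answer:' convention is an assumption made for this sketch.
    """
    marker = "Answer:"
    idx = chain.rfind(marker)
    if idx == -1:
        return None
    return chain[idx + len(marker):].strip().lower()


def rejection_sample(
    question: str,
    reference: str,
    generate: Callable[[str], str],
    num_candidates: int = 8,
) -> List[str]:
    """Sample candidate reasoning chains and keep those that reach the reference answer."""
    target = reference.strip().lower()
    kept = []
    for _ in range(num_candidates):
        chain = generate(question)
        # Reject chains with no parsable answer or with a wrong final answer.
        if extract_answer(chain) == target:
            kept.append(chain)
    return kept


def make_toy_generator(outputs: List[str]) -> Callable[[str], str]:
    """Stand-in for a model: cycles through fixed outputs (hypothetical helper)."""
    cycle = itertools.cycle(outputs)

    def generate(question: str) -> str:
        return next(cycle)

    return generate


if __name__ == "__main__":
    toy = make_toy_generator([
        "Step 1: opacity in the left lower lobe. Answer: pneumonia",
        "Step 1: lungs appear clear. Answer: normal",
        "The image is hard to read.",
    ])
    for chain in rejection_sample("What does the X-ray show?", "Pneumonia", toy, 6):
        print(chain)
```

In a real pipeline, `generate` would call a vision-language model with both the image and the question, and the accepted chains would become training examples. Filtering on the final answer alone does not guarantee that the intermediate steps are sound, which may be why the summary above describes the paper's sampling as logic-constrained.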
Problem

Research questions and friction points this paper is trying to address.

Enhancing multimodal medical reasoning with reinforcement learning
Improving diagnostic accuracy through iterative RL optimization
Developing synthetic reasoning data for better model generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning enhances medical reasoning
Iterative training optimizes diagnostic accuracy
Reasoning data synthesis improves generalization
Yanzhou Su
FZU, UESTC
medical image analysis
Tianbin Li
Shanghai Artificial Intelligence Laboratory
Machine Learning, Computer Vision, General Intelligence
Jiyao Liu
Fuzhou University
Chenglong Ma
Fudan University; Shanghai Innovation Institute
multi-modal models, generative models, medical image analysis
Junzhi Ning
Fuzhou University
Cheng Tang
Fuzhou University
Sibo Ju
Fuzhou University
Jin Ye
Fuzhou University
Pengcheng Chen
Fuzhou University
Ming Hu
Shanghai Artificial Intelligence Laboratory
Shixiang Tang
Shanghai Artificial Intelligence Laboratory
Lihao Liu
Amazon
LLM-based Agent, Healthcare AI
Bin Fu
Shanghai Artificial Intelligence Laboratory
Wenqi Shao
Researcher at Shanghai AI Laboratory
Foundation Model Evaluation, LLM Compression, Efficient Adaptation, Multimodal Learning
Xiaowei Hu
Shanghai Innovation Institute
Xiangwen Liao
Fudan University
Yuanfeng Ji
Stanford; HKU
Computer vision, Medical Image Analysis
Junjun He
Shanghai Jiao Tong University
S
Stanford University