🤖 AI Summary
Current general-purpose medical AI models exhibit insufficient reasoning capabilities for complex clinical decision-making, suffer from poor generalization, and lack interpretable reasoning chains. To address these limitations, we propose a multimodal medical reasoning model specifically designed for complex clinical decision-making, integrating Vision Transformer (ViT) and Large Language Model (LLM) architectures. We introduce a novel logic-constrained rejection sampling method to synthesize high-quality, reasoning-rich multimodal training data and establish the first dedicated multimodal medical reasoning benchmark. Furthermore, we employ Proximal Policy Optimization (PPO)-based reinforcement learning to optimize diagnostic pathways and causal inference. Our approach achieves substantial improvements in accuracy and cross-scenario generalization on medical image diagnosis and visual question answering tasks—significantly outperforming supervised fine-tuning baselines. All code, data, and models are publicly released, establishing a new paradigm for trustworthy, reasoning-enabled medical AI.
📝 Abstract
General medical AI has made significant strides, but existing models often lack the reasoning capabilities needed for complex medical decision-making. This paper presents GMAI-VL-R1, a multimodal medical reasoning model enhanced by reinforcement learning (RL) to improve its reasoning abilities. Through iterative training, GMAI-VL-R1 optimizes decision-making, significantly boosting diagnostic accuracy and clinical support. We also develop a reasoning data synthesis method, generating step-by-step reasoning data via rejection sampling, which further enhances the model's generalization. Experimental results show that after RL training, GMAI-VL-R1 excels in tasks such as medical image diagnosis and visual question answering. While supervised fine-tuning yields basic memorization, RL is crucial for true generalization. Our work establishes new evaluation benchmarks and paves the way for future advancements in medical reasoning models. Code, data, and models will be released at https://github.com/uni-medical/GMAI-VL-R1.
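The rejection-sampling data synthesis described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `generate_candidates` is a hypothetical stand-in for sampling chain-of-thought completions from a model, and the two filtering criteria (correct final answer, visible multi-step reasoning) are assumed, simplified proxies for the paper's logic constraints.

```python
import random

def generate_candidates(question, n, seed=0):
    # Hypothetical stand-in for sampling n chain-of-thought completions
    # from a multimodal model; here we fabricate candidates whose final
    # answer may or may not match the reference.
    rng = random.Random(seed)
    answers = ["pneumonia", "atelectasis", "pleural effusion"]
    return [
        {"reasoning": f"Step 1: inspect the opacity. Step 2: conclude.",
         "answer": rng.choice(answers)}
        for _ in range(n)
    ]

def rejection_sample(question, ground_truth, n=8, min_steps=2):
    """Keep only candidates whose final answer matches the reference
    and whose reasoning contains at least `min_steps` explicit steps."""
    kept = []
    for cand in generate_candidates(question, n):
        if cand["answer"] != ground_truth:
            continue  # reject: reasoning led to the wrong conclusion
        if cand["reasoning"].count("Step") < min_steps:
            continue  # reject: no visible step-by-step reasoning chain
        kept.append(cand)
    return kept

data = rejection_sample("CXR shows lobar consolidation; diagnosis?", "pneumonia")
```

Only the surviving question-reasoning-answer triples would be added to the training set, so the synthesized data is both reasoning-rich and answer-verified.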