🤖 AI Summary
Current general-purpose medical AI models exhibit insufficient reasoning capabilities for complex clinical decision-making, suffer from poor generalization, and lack interpretable reasoning chains. To address these limitations, we propose a multimodal medical reasoning model specifically designed for complex clinical decision-making, integrating Vision Transformer (ViT) and Large Language Model (LLM) architectures. We introduce a novel logic-constrained rejection sampling method to synthesize high-quality, reasoning-rich multimodal training data and establish the first dedicated multimodal medical reasoning benchmark. Furthermore, we employ Proximal Policy Optimization (PPO)-based reinforcement learning to optimize diagnostic pathways and causal inference. Our approach achieves substantial improvements in accuracy and cross-scenario generalization on medical image diagnosis and visual question answering tasks—significantly outperforming supervised fine-tuning baselines. All code, data, and models are publicly released, establishing a new paradigm for trustworthy, reasoning-enabled medical AI.
📝 Abstract
General medical AI has made significant strides, but existing models often lack the reasoning capabilities needed for complex medical decision-making. This paper presents GMAI-VL-R1, a multimodal medical reasoning model enhanced by reinforcement learning (RL) to improve its reasoning abilities. Through iterative training, GMAI-VL-R1 optimizes decision-making, significantly boosting diagnostic accuracy and clinical support. We also develop a reasoning data synthesis method, generating step-by-step reasoning data via rejection sampling, which further enhances the model's generalization. Experimental results show that after RL training, GMAI-VL-R1 excels in tasks such as medical image diagnosis and visual question answering. While supervised fine-tuning yields basic memorization, RL is crucial for true generalization. Our work establishes new evaluation benchmarks and paves the way for future advancements in medical reasoning models. Code, data, and models will be released at https://github.com/uni-medical/GMAI-VL-R1.
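The rejection-sampling data synthesis described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `generate_candidates` is a hypothetical stand-in for sampling chain-of-thought completions from a model, and the two filtering criteria (correct final answer, visible multi-step reasoning) are assumed, simplified proxies for the paper's logic constraints.

```python
import random

def generate_candidates(question, n, seed=0):
    # Hypothetical stand-in for sampling n chain-of-thought completions
    # from a multimodal model; here we fabricate candidates whose final
    # answer may or may not match the reference.
    rng = random.Random(seed)
    answers = ["pneumonia", "atelectasis", "pleural effusion"]
    return [
        {"reasoning": f"Step 1: inspect the opacity. Step 2: conclude.",
         "answer": rng.choice(answers)}
        for _ in range(n)
    ]

def rejection_sample(question, ground_truth, n=8, min_steps=2):
    """Keep only candidates whose final answer matches the reference
    and whose reasoning contains at least `min_steps` explicit steps."""
    kept = []
    for cand in generate_candidates(question, n):
        if cand["answer"] != ground_truth:
            continue  # reject: reasoning led to the wrong conclusion
        if cand["reasoning"].count("Step") < min_steps:
            continue  # reject: no visible step-by-step reasoning chain
        kept.append(cand)
    return kept

data = rejection_sample("CXR shows lobar consolidation; diagnosis?", "pneumonia")
```

Only the surviving question-reasoning-answer triples would be added to the training set, so the synthesized data is both reasoning-rich and answer-verified.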