MedGemma Technical Report

📅 2025-07-07
🤖 AI Summary
Medical AI faces challenges including scarce foundation models, strong data heterogeneity, and stringent privacy constraints, making it difficult for existing models to balance general-purpose capability with clinical domain expertise. To address this, the authors propose MedGemma, a family of medical vision-language foundation models built on the Gemma 3 architecture (4B and 27B), paired with MedSigLIP, a medically tuned vision encoder derived from SigLIP, to enable end-to-end multimodal modeling. MedGemma adapts efficiently with minimal labeled data and generalizes well to out-of-distribution clinical tasks. Experiments show it outperforms similarly sized generative models on medical multimodal question answering and chest X-ray finding classification, approaching the performance of task-specific models. After fine-tuning, it reduces electronic health record information-retrieval errors by 50% and matches specialized state-of-the-art methods on pneumothorax classification and histopathology patch classification, combining general-purpose capability with strong clinical understanding.

📝 Abstract
Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment faces challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves 2.6-10% improvement on medical multimodal question answering, 15.5-18.1% improvement on chest X-ray finding classification, and 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching comparable performance to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and as an encoder achieves comparable or better performance than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with potential to significantly accelerate medical research and development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.
Problem

Research questions and friction points this paper is trying to address.

Addressing healthcare AI challenges with diverse data and privacy needs
Developing foundation models for medical tasks with less tuning data
Enhancing medical vision-language understanding and reasoning capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

MedGemma: medical vision-language foundation models
MedSigLIP: medically-tuned vision encoder
Fine-tuning reduces electronic health record information-retrieval errors by 50%
Authors
Andrew Sellergren, Software Engineer, Google
Sahar Kazemzadeh, Google Research and Google DeepMind
Tiam Jaroensri, Google Research and Google DeepMind
Atilla Kiraly, Google Research and Google DeepMind
Madeleine Traverse, Google Research and Google DeepMind
Timo Kohlberger, Google Research and Google DeepMind
Shawn Xu, Google LLC
Fayaz Jamil, Google Research and Google DeepMind
Cían Hughes, Google Research and Google DeepMind
Charles Lau, Google Research and Google DeepMind
Justin Chen, Google Research and Google DeepMind
Fereshteh Mahvar, Google Research and Google DeepMind
Liron Yatziv, Google Research and Google DeepMind
Tiffany Chen, Google Research and Google DeepMind
Bram Sterling, Google Research and Google DeepMind
Stefanie Anna Baby, Google Research and Google DeepMind
Susanna Maria Baby, Google Research and Google DeepMind
Jeremy Lai, Google Research and Google DeepMind
Samuel Schmidgall, Google DeepMind
Lu Yang, Google Research and Google DeepMind
Kejia Chen, Technical University of Munich
Per Bjornsson, Google Research and Google DeepMind
Shashir Reddy, Google, Inc.
Ryan Brush, Google Research and Google DeepMind
Kenneth Philbrick, Google Research and Google DeepMind