AI-Driven MRI-based Brain Tumour Segmentation Benchmarking

📅 2025-06-25
🤖 AI Summary
Problem: Current promptable medical image segmentation models lack systematic evaluation across varying prompt quality levels. Method: We conduct a zero-shot benchmark on the BraTS 2023 multimodal MRI dataset (adult glioma and pediatric cases), evaluating SAM, SAM 2, MedSAM, SAM-Med-3D, and nnU-Net across multiple prompt qualities for both point and bounding box prompts, and then fine-tune the promptable models on the pediatric data. Contribution/Results: Given extremely accurate bounding box prompts, SAM and SAM 2 achieve Dice scores of up to 0.894 and 0.893, respectively, surpassing nnU-Net. Fine-tuning on the pediatric data substantially improves point-prompt accuracy, although point prompts still trail bounding boxes and nnU-Net. The analysis shows that prompt quality strongly governs model performance; because such accurate prompts are impractical to supply, nnU-Net remains the dominant approach. This work provides empirical evidence to guide prompt design in clinical imaging analysis.

📝 Abstract
Medical image segmentation has greatly aided medical diagnosis, with U-Net based architectures and nnU-Net providing state-of-the-art performance. Numerous general promptable models and medical variants have been introduced in recent years, but these models have not yet been evaluated and compared across a variety of prompt qualities on a common medical dataset. This research uses Segment Anything Model (SAM), Segment Anything Model 2 (SAM 2), MedSAM, SAM-Med-3D, and nnU-Net to obtain zero-shot inference on the BraTS 2023 adult glioma and pediatrics datasets across multiple prompt qualities for both points and bounding boxes. Several of these models exhibit promising Dice scores. In particular, SAM and SAM 2 achieve scores of up to 0.894 and 0.893, respectively, when given extremely accurate bounding box prompts, which exceeds nnU-Net's segmentation performance. However, nnU-Net remains the dominant medical image segmentation network because providing highly accurate prompts to the models is impractical. The model and prompt evaluation, as well as the comparison, are extended by fine-tuning SAM, SAM 2, MedSAM, and SAM-Med-3D on the pediatrics dataset. The improvements in point prompt performance after fine-tuning are substantial and show promise for future investigation, but fine-tuned point prompts still do not achieve better segmentation than bounding boxes or nnU-Net.
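The Dice scores quoted above measure the overlap between a predicted mask and the ground-truth mask. The following is a minimal sketch of that metric for binary masks, assuming NumPy; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Assumption: when both masks are empty the score is defined as 1.0
    (one common convention; the paper's choice is not stated here).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


# Toy example: two 4x4 squares offset by one row (12 of 16 pixels overlap).
target = np.zeros((8, 8), dtype=bool)
target[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 2:6] = True
print(dice_score(pred, target))  # 2 * 12 / (16 + 16) = 0.75
```

A score of 1.0 means perfect overlap and 0.0 means none, so values near 0.89 indicate that most tumour pixels are shared between prediction and ground truth.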
Problem

Research questions and friction points this paper is trying to address.

Evaluates AI models for MRI brain tumor segmentation
Compares prompt-based models across different prompt qualities
Assesses fine-tuning impact on segmentation performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarks SAM models on BraTS 2023 dataset
Evaluates zero-shot inference with varied prompts
Fine-tunes models for pediatric data improvement
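One way to picture "varied prompts" is to derive prompts from a ground-truth mask and degrade them in a controlled way. The sketch below is an assumption-laden illustration, not the authors' procedure: `box_from_mask`, `point_from_mask`, and the `max_shift` jitter parameter are hypothetical names, and the uniform per-edge jitter is just one plausible way to simulate a less accurate bounding box. Coordinates are returned in (x, y) order.

```python
import numpy as np


def box_from_mask(mask, max_shift=0, rng=None):
    """Return (x_min, y_min, x_max, y_max) around the foreground of a 2D mask.

    max_shift=0 gives the tight ("extremely accurate") box; larger values
    shift each edge independently by up to max_shift pixels (hypothetical
    degradation scheme used only for illustration).
    """
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask has no foreground pixels")
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
    if max_shift > 0:
        rng = np.random.default_rng() if rng is None else rng
        box = box + rng.integers(-max_shift, max_shift + 1, size=4)
    h, w = mask.shape
    # Keep the box inside the image and make sure min <= max on each axis.
    box = np.clip(box, 0, [w - 1, h - 1, w - 1, h - 1])
    x0, y0, x1, y1 = (int(v) for v in box)
    return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)


def point_from_mask(mask, rng=None):
    """Return one (x, y) point sampled uniformly from the foreground."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask has no foreground pixels")
    rng = np.random.default_rng() if rng is None else rng
    i = rng.integers(ys.size)
    return int(xs[i]), int(ys[i])


# Toy mask: foreground rows 2..5, columns 3..6.
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 3:7] = True
tight_box = box_from_mask(mask)
noisy_box = box_from_mask(mask, max_shift=2, rng=np.random.default_rng(0))
point = point_from_mask(mask, rng=np.random.default_rng(0))
print(tight_box)  # (3, 2, 6, 5)
```

Sweeping `max_shift` from 0 upward and recording the Dice score at each level would give a simple prompt-quality curve for a promptable model.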
Connor Ludwig
Engineering Science, University of Toronto, Toronto, Canada
Khashayar Namdar
Institute of Medical Science, University of Toronto, Toronto, Canada
Farzad Khalvati
AI Chair in Medical Imaging, The Hospital for Sick Children, University of Toronto
Precision Medicine, Machine Learning, Biomedical Engineering, Computer Science