Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches

📅 2025-06-12

🤖 AI Summary

This study systematically investigates how applicable foundation models, particularly the Segment Anything Model (SAM) and its variants, are to medical image segmentation, and what bottlenecks block their clinical deployment. It targets critical challenges including annotation scarcity, poor cross-modal generalization, regulatory and compliance constraints, and trustworthy AI requirements. To address them, we propose a taxonomy of medical imaging foundation models and establish a multi-dimensional evaluation framework tailored to clinical translation. Methodologically, we cover medical-domain prompt engineering, multimodal alignment, adapter-based fine-tuning, and trustworthy AI assessment, and compare the models against task-specific approaches across multiple public datasets. Results show that foundation models achieve near-state-of-the-art performance in zero-shot and few-shot settings, yet exhibit significant limitations in segmenting small lesions and low-contrast abnormalities. Furthermore, we identify fundamental bottlenecks in cross-modal generalization and highlight synthetic data generation, early multimodal feature fusion, and agentic and physical AI as key directions for future advancement.

📝 Abstract

Following the successful paradigm shift of large language models, which leverage pre-training on a massive corpus of data and fine-tuning on different downstream tasks, generalist models have made their foray into computer vision. The introduction of the Segment Anything Model (SAM) set a milestone in the segmentation of natural images, inspiring the design of a multitude of architectures for medical image segmentation. In this survey, we offer a comprehensive and in-depth investigation of generalist models for medical image segmentation. We start with an introduction to the fundamental concepts underpinning their development. Then, we provide a taxonomy covering the different adaptations of SAM (zero-shot, few-shot, fine-tuning, and adapters), the recent SAM 2, other innovative models trained on images alone, and models trained on both text and images. We thoroughly analyze their performance at the level of both primary research and best-in-literature, followed by a rigorous comparison with state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI and physical AI, and clinical translation.

Problem

Research questions and friction points this paper is trying to address.

Compare generalist and task-specific models for medical image segmentation

Analyze SAM variants and other innovative models in medical imaging

Address challenges in regulatory compliance, privacy, and trustworthy AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

Surveying how large-scale pre-training and fine-tuning transfer to medical image segmentation

Taxonomy of Segment Anything Model (SAM) adaptations, SAM 2, and text-image models

Addressing regulatory and privacy challenges in AI
