🤖 AI Summary
To address SAM’s inability to produce semantic labels, its reliance on manual prompts, and its limited performance on 3D medical images, this work proposes MaskSAM, a prompt-free SAM adaptation framework for fully automatic 3D medical image segmentation. Methodologically, it introduces: (1) a mask-classification-driven automatic prompt generation mechanism that eliminates manual prompt initialization; (2) a fusion strategy that sums global and auxiliary classifier tokens to enable semantic label prediction; and (3) lightweight 3D depth-convolution and depth-MLP adapters for efficient fine-tuning of the SAM backbone. The framework comprises a prompt generator, dual 3D adapters, and a mask-classification decoder. On AMOS2022 it achieves a 90.52% Dice score, surpassing nnUNet by 2.7%, with further gains of 1.7% on ACDC and 1.0% on Synapse. This advances SAM toward clinically deployable, fully automatic semantic segmentation.
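The lightweight adapters mentioned above follow the standard bottleneck pattern for parameter-efficient fine-tuning: project down, apply a nonlinearity and depth-wise mixing, project back up, and add a residual connection. A minimal numpy sketch of such a depth-MLP adapter, assuming illustrative shapes and names that are not taken from the paper:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class DepthMLPAdapter:
    """Hypothetical bottleneck adapter: down-project, mix information
    along the depth (slice) axis of a 3D volume, up-project, and add a
    residual. Shapes and names are illustrative, not from the paper."""
    def __init__(self, dim, depth, hidden=None, seed=0):
        rng = np.random.default_rng(seed)
        h = hidden or dim // 4
        self.w_down = rng.normal(0.0, 0.02, (dim, h))
        self.w_depth = rng.normal(0.0, 0.02, (depth, depth))  # mixes slices
        self.w_up = np.zeros((h, dim))  # zero-init: adapter starts as identity

    def __call__(self, x):
        # x: (depth, tokens, dim) embeddings of a 3D volume
        z = gelu(x @ self.w_down)                       # (depth, tokens, h)
        z = np.einsum("dth,de->eth", z, self.w_depth)   # depth-wise mixing
        return x + z @ self.w_up                        # residual connection
```

Zero-initializing the up-projection is a common choice for adapters: the adapted model starts out exactly equal to the frozen backbone, and the adapter's contribution is learned from there.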
📝 Abstract
Segment Anything Model (SAM), a prompt-driven foundation model for natural image segmentation, has demonstrated impressive zero-shot performance. However, SAM performs poorly when applied directly to medical image segmentation: it cannot predict semantic labels, it requires additional prompts, and its accuracy is suboptimal. To address these issues, we propose MaskSAM, a novel mask-classification, prompt-free SAM adaptation framework for medical image segmentation. We design a prompt generator, combined with SAM's image encoder, that produces a set of auxiliary classifier tokens, auxiliary binary masks, and auxiliary bounding boxes. Each pair of auxiliary mask and box prompts removes the need for manually provided prompts. Semantic label prediction is achieved by summing the auxiliary classifier tokens with the learnable global classifier tokens in SAM's mask decoder. Meanwhile, we design a 3D depth-convolution adapter for image embeddings and a 3D depth-MLP adapter for prompt embeddings to fine-tune SAM efficiently. Our method achieves state-of-the-art performance on AMOS2022 with a 90.52% Dice score, a 2.7% improvement over nnUNet, and surpasses nnUNet by 1.7% on ACDC and 1.0% on Synapse.
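The abstract's semantic-label mechanism, summing each auxiliary classifier token with a learnable global classifier token before classification, can be sketched in a few lines. The final linear head and all names here are assumptions for illustration; only the element-wise summation is stated in the abstract:

```python
import numpy as np

def semantic_labels(aux_tokens, global_tokens, w_cls):
    """Fuse per-mask auxiliary classifier tokens with learnable global
    classifier tokens by summation, then project to class logits.

    aux_tokens, global_tokens: (num_masks, dim)
    w_cls: (dim, num_classes) -- hypothetical linear classification head
    Returns the predicted semantic label index for each mask.
    """
    fused = aux_tokens + global_tokens   # element-wise sum, as in the abstract
    logits = fused @ w_cls               # (num_masks, num_classes)
    return logits.argmax(axis=-1)        # one semantic label per mask
```

This is what lets the framework output labeled masks rather than SAM's usual class-agnostic ones: each predicted binary mask carries a token that, after fusion, is classified into an organ/structure category.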