MaskSAM: Towards Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation

📅 2024-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
SAM cannot predict semantic labels, depends on manual prompts, and performs suboptimally when applied directly to medical image segmentation. To address these limitations, this work proposes MaskSAM, a prompt-free, mask classification adaptation of SAM for volumetric medical image segmentation. It introduces: (1) a prompt generator, combined with SAM's image encoder, that produces auxiliary classifier tokens, auxiliary binary masks, and auxiliary bounding boxes, so that no manual prompts are needed; (2) semantic label prediction from the sum of the auxiliary classifier tokens and learnable global classifier tokens in SAM's mask decoder; and (3) a 3D depth-convolution adapter for image embeddings and a 3D depth-MLP adapter for prompt embeddings, which together allow efficient fine-tuning of SAM. On AMOS2022 the method achieves 90.52% Dice, 2.7% above nnUNet, and it surpasses nnUNet by 1.7% on ACDC and 1.0% on Synapse.

📝 Abstract
Segment Anything Model (SAM), a prompt-driven foundation model for natural image segmentation, has demonstrated impressive zero-shot performance. However, SAM does not work when directly applied to medical image segmentation, since it lacks the ability to predict semantic labels, requires additional prompts, and delivers suboptimal performance. To address these issues, we propose MaskSAM, a novel mask classification prompt-free SAM adaptation framework for medical image segmentation. We design a prompt generator combined with the image encoder in SAM to generate a set of auxiliary classifier tokens, auxiliary binary masks, and auxiliary bounding boxes. Each pair of auxiliary mask and box prompts removes the need for extra prompts. Semantic labels are predicted from the sum of the auxiliary classifier tokens and the learnable global classifier tokens in the mask decoder of SAM. Meanwhile, we design a 3D depth-convolution adapter for image embeddings and a 3D depth-MLP adapter for prompt embeddings to efficiently fine-tune SAM. Our method achieves state-of-the-art performance on AMOS2022 with 90.52% Dice, an improvement of 2.7% over nnUNet. It also surpasses nnUNet by 1.7% on ACDC and 1.0% on Synapse.
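The results above are reported as Dice scores. As a reminder of what that metric measures, the following is a minimal sketch of the standard Dice coefficient, 2|A∩B| / (|A| + |B|), on flattened binary masks, plus a per-class average. The function names `dice_score` and `mean_dice` are illustrative, and the handling of empty masks is an assumed convention; the exact evaluation protocol behind the reported numbers (class averaging, background handling) is not described here.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(pred_labels, target_labels, class_ids):
    """Average the per-class Dice over a flat label volume.

    The caller decides which class ids to include (for example,
    leaving out the background class).
    """
    scores = []
    for c in class_ids:
        pred_c = [int(v == c) for v in pred_labels]
        target_c = [int(v == c) for v in target_labels]
        scores.append(dice_score(pred_c, target_c))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 4-voxel volume with classes 1 and 2 (0 is background).
    print(mean_dice([1, 1, 2, 0], [1, 2, 2, 0], class_ids=[1, 2]))
```

A Dice of 1.0 means the predicted and reference masks coincide exactly; 0.0 means they do not overlap at all.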
Problem

Research questions and friction points this paper is trying to address.

Adapting SAM for medical image segmentation without prompts
Enhancing SAM to predict semantic labels automatically
Improving 3D medical image segmentation performance over nnUNet
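To make the semantic-label problem concrete, here is a small sketch of how mask classification can turn a set of binary masks plus per-mask class predictions into a label map. It sums auxiliary and global classifier tokens as the abstract describes, then applies the generic mask classification inference rule known from MaskFormer (per-voxel argmax over the class-probability-weighted masks). The linear head `classify_tokens`, the toy shapes, and the use of that particular inference rule are assumptions for illustration, not MaskSAM's confirmed implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fuse_tokens(aux_tokens, global_tokens):
    """Element-wise sum of auxiliary and global classifier tokens (one pair per mask)."""
    return [[a + g for a, g in zip(aux, glob)]
            for aux, glob in zip(aux_tokens, global_tokens)]


def classify_tokens(tokens, weight):
    """Hypothetical linear head: `weight` holds one row of coefficients per class."""
    return [softmax([sum(w * t for w, t in zip(row, tok)) for row in weight])
            for tok in tokens]


def semantic_labels(class_probs, mask_probs):
    """Per-voxel argmax over sum_i p_i(c) * m_i(voxel).

    class_probs: [num_masks][num_classes], mask_probs: [num_masks][num_voxels].
    """
    num_classes = len(class_probs[0])
    num_voxels = len(mask_probs[0])
    labels = []
    for v in range(num_voxels):
        scores = [sum(p[c] * m[v] for p, m in zip(class_probs, mask_probs))
                  for c in range(num_classes)]
        labels.append(max(range(num_classes), key=scores.__getitem__))
    return labels


if __name__ == "__main__":
    fused = fuse_tokens([[1.0, 0.0], [0.0, 0.5]], [[1.0, 0.0], [0.0, 1.5]])
    probs = classify_tokens(fused, weight=[[1.0, 0.0], [0.0, 1.0]])
    masks = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.7]]
    print(semantic_labels(probs, masks))
```

In this toy setup the first mask is classified mostly as class 0 and the second mostly as class 1, so voxels covered by each mask inherit the corresponding label.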
Innovation

Methods, ideas, or system contributions that make the work stand out.

Auto-prompt SAM with mask classification
3D depth-convolution and depth-MLP adapters
Prompt generator for auxiliary tokens and masks
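The adapter idea in the list above can be illustrated with a rough sketch. SAM's image encoder processes 2D slices independently, so one plausible reading of a "depth-convolution adapter" is a small residual layer that mixes each channel's features across neighbouring slices. The code below implements that reading in plain Python for a single spatial location. The layout (per-channel kernel, zero padding, residual sum) and the names `depthwise_conv_along_depth` and `depth_adapter` are assumptions; the paper's actual adapter design is not described here.

```python
def depthwise_conv_along_depth(features, kernels):
    """Per-channel 1D cross-correlation along the depth (slice) axis.

    features: [depth][channels] values for one spatial location.
    kernels:  [channels][k] with odd k; zero padding keeps depth unchanged.
    """
    depth = len(features)
    channels = len(features[0])
    out = [[0.0] * channels for _ in range(depth)]
    for c in range(channels):
        kernel = kernels[c]
        half = len(kernel) // 2
        for d in range(depth):
            acc = 0.0
            for j, w in enumerate(kernel):
                src = d + j - half
                if 0 <= src < depth:  # positions outside the volume are treated as zero
                    acc += w * features[src][c]
            out[d][c] = acc
    return out


def depth_adapter(features, kernels):
    """Residual adapter: input features plus their depth-wise mixed version."""
    mixed = depthwise_conv_along_depth(features, kernels)
    return [[x + y for x, y in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mixed)]


if __name__ == "__main__":
    slices = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 slices, 2 channels
    kernels = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]   # channel 0 untouched, channel 1 mixed
    print(depth_adapter(slices, kernels))
```

With an all-zero kernel the adapter reduces to the identity, which is a common way to start adapter fine-tuning without disturbing the pretrained features; whether MaskSAM initializes its adapters this way is not stated.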
Bin Xie
InfoBeyond Technology LLC
Mobile Computing, Security, Big Data Streaming
Hao Tang
Robotics Institute, Carnegie Mellon University, USA
Bin Duan
Department of Computer Science, Illinois Institute of Technology, USA
Dawen Cai
University of Michigan
Neuroscience, single molecule biophysics, microscopy, multiplex profiling
Yan Yan
Department of Computer Science, Illinois Institute of Technology, USA
Gady Agam
Unknown affiliation