🤖 AI Summary
To address the performance degradation of the Segment Anything Model (SAM) on medical image segmentation—caused by modality discrepancies and reliance on manual prompts—this paper proposes a fully automatic adaptation framework. First, we introduce a multi-scale auxiliary-mask-driven self-prompting mechanism, incorporating Distance-Transform-based center-point sampling, to eliminate dependence on hand-crafted point or box prompts. Second, we propose a 3D Depth-Fused Adapter (DFusedAdapter), enabling the 2D pre-trained SAM to extract 3D information and transfer to 3D medical images; the adapter is integrated as a plug-in within SAM's image encoder and mask decoder. Evaluated on the AMOS2022, ACDC, and Synapse benchmarks, our method achieves Dice scores surpassing nnUNet by 2.3%, 1.6%, and 0.5%, respectively, establishing new state-of-the-art performance in medical image segmentation.
📝 Abstract
Segment Anything Model (SAM) has demonstrated impressive zero-shot performance and brought a range of unexplored capabilities to natural image segmentation tasks. However, in medical image segmentation, an important branch of image segmentation, SAM's performance remains uncertain due to the significant differences between natural and medical images. Meanwhile, it is difficult to meet SAM's requirement for extra prompts, such as points or boxes, to specify medical regions. In this paper, we propose a novel self-prompt SAM adaptation framework for medical image segmentation, named Self-Prompt-SAM. We design a multi-scale prompt generator, combined with the image encoder in SAM, to generate auxiliary masks. We then use the auxiliary masks to generate bounding boxes as box prompts and apply a Distance Transform to select the most central points as point prompts. Meanwhile, we design a 3D depth-fused adapter (DFusedAdapter) and inject it into each transformer block in the image encoder and mask decoder, enabling pre-trained 2D SAM models to extract 3D information and adapt to 3D medical images. Extensive experiments demonstrate that our method achieves state-of-the-art performance, outperforming nnUNet by 2.3% on AMOS2022, 1.6% on ACDC, and 0.5% on Synapse.
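The self-prompting step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: given a binary auxiliary mask, it derives a tight bounding box as the box prompt and uses the Euclidean Distance Transform to pick the most interior foreground pixel as the point prompt. The helper name `prompts_from_mask` is hypothetical.

```python
import numpy as np
from scipy import ndimage

def prompts_from_mask(mask: np.ndarray):
    """Derive SAM-style prompts from a binary auxiliary mask.

    mask: 2D array of {0, 1}.
    Returns a box prompt (x0, y0, x1, y1) and a point prompt (cx, cy).
    """
    ys, xs = np.nonzero(mask)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

    # Distance Transform: each foreground pixel gets its Euclidean distance
    # to the nearest background pixel; the maximum is the most central point.
    dist = ndimage.distance_transform_edt(mask)
    cy, cx = np.unravel_index(np.argmax(dist), dist.shape)
    return box, (int(cx), int(cy))

# Toy example: a 7x7 square region inside a 9x9 image.
mask = np.zeros((9, 9), dtype=np.uint8)
mask[1:8, 1:8] = 1
box, center = prompts_from_mask(mask)
# box is (1, 1, 7, 7); center is (4, 4), the pixel farthest from background.
```

Choosing the distance-transform maximum instead of the mask centroid matters for concave organs, where the centroid can fall outside the foreground entirely.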
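The abstract does not detail the DFusedAdapter's internals, but adapters of this kind are commonly bottleneck modules with a residual connection; a plausible sketch, under the assumption of a down-projection, a depth-wise mixing step across slices, and a zero-initialized up-projection (so the adapter starts as an identity and leaves the pre-trained 2D SAM weights undisturbed), is below. All shapes and the class name are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class DFusedAdapter:
    """Hypothetical sketch of a depth-fused bottleneck adapter.

    Input x has shape (D, N, C): D depth slices, N tokens per slice,
    C channels. Information is fused across the depth axis so the 2D
    backbone sees 3D context.
    """

    def __init__(self, dim: int, bottleneck: int = 16):
        self.w_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        # Zero-init up-projection: the residual branch is zero at the start,
        # so the adapter initially behaves as an identity mapping.
        self.w_up = np.zeros((bottleneck, dim))

    @staticmethod
    def _gelu(z):
        return 0.5 * z * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = x @ self.w_down                          # (D, N, b) bottleneck
        # Depth-wise 3-tap convolution along the slice axis (edge padding):
        # each slice mixes with its neighbors above and below.
        pad = np.pad(h, ((1, 1), (0, 0), (0, 0)), mode="edge")
        h = (pad[:-2] + pad[1:-1] + pad[2:]) / 3.0
        h = self._gelu(h)
        return x + h @ self.w_up                     # residual connection

x = rng.normal(size=(4, 16, 32))   # 4 slices, 16 tokens, 32 channels
out = DFusedAdapter(dim=32)(x)
# Same shape as the input, and equal to it at initialization (zero-init w_up).
```

Inserting such a module between the attention and MLP sub-layers of each transformer block is a standard plug-in pattern; only the adapter parameters would need training during medical-domain adaptation.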