🤖 AI Summary
To address the limitations of per-slice prompting and the lack of interactive editing in 3D medical image segmentation, this paper proposes the first single-prompt-driven framework for 3D segmentation using a 2D foundation model. Methodologically, it introduces (1) noisy masks as a novel weakly supervised prompt type; (2) a slice-wise iterative inference mechanism jointly optimized with 3D consistency constraints; and (3) the first benchmark for single-prompt 3D medical segmentation that evaluates cross-domain generalization and real-time interactive editing. Built on the SAM architecture, the framework combines several prompt types (noisy masks, points, and bounding boxes) to segment 3D organs from a single prompt. On the AMOS dataset it attains an mDice of 82.7%, substantially outperforming existing methods. It also demonstrates strong cross-domain transferability and clinically viable interactive editing.
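For readers unfamiliar with the metric, the following is a minimal sketch of how a mean Dice score (mDice) over several organs is commonly computed. It is not taken from the paper: the function names, the representation of masks as sets of voxel coordinates, the convention that two empty masks score 1.0, and the simple per-organ averaging are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a mask is the set of (z, y, x) foreground voxels.
Voxel = Tuple[int, int, int]


def dice(pred: Set[Voxel], target: Set[Voxel]) -> float:
    """Dice coefficient: 2 * |pred ∩ target| / (|pred| + |target|)."""
    if not pred and not target:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * len(pred & target) / (len(pred) + len(target))


def mean_dice(
    preds: Dict[str, Set[Voxel]],
    targets: Dict[str, Set[Voxel]],
) -> float:
    """Average the per-organ Dice over every organ present in `targets`."""
    # An organ missing from `preds` is treated as an empty prediction.
    scores = [dice(preds.get(name, set()), targets[name]) for name in targets]
    return sum(scores) / len(scores)
```

The paper may average over organs, cases, or both; this sketch only shows the per-organ form.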
📝 Abstract
Medical image segmentation is a crucial and time-consuming task in clinical care, where mask precision is extremely important. The Segment Anything Model (SAM) offers a promising approach, as it provides an interactive interface based on visual prompting and editing to refine an initial segmentation. The model generalizes well, does not rely on predefined classes, and adapts to diverse objects; however, it is pre-trained on natural images and does not process medical data effectively. In addition, it is built for 2D images, whereas much of medical imaging relies on 3D volumes such as CT and MRI. Recent adaptations of SAM for medical imaging are based on 2D models and therefore require one prompt per slice to segment a 3D object, which makes segmentation tedious. They also lack important features such as editing. To bridge this gap, we propose RadSAM, a novel method for segmenting 3D objects with a 2D model from a single prompt. In practice, we train a 2D model using noisy masks as initial prompts, in addition to bounding boxes and points. We then use this novel prompt type in an iterative inference pipeline to reconstruct the 3D mask slice by slice. We introduce a benchmark that evaluates a model's ability to segment 3D objects in CT images from a single prompt, as well as its out-of-domain transfer and editing capabilities. We demonstrate the effectiveness of our approach against state-of-the-art models on this benchmark using the AMOS abdominal organ segmentation dataset.
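The slice-by-slice reconstruction described in the abstract can be pictured with a short sketch. This is an illustration under stated assumptions, not RadSAM's actual implementation: `propagate_3d`, `predict_slice`, and `is_empty` are hypothetical names, the starting mask is assumed to be the 2D model's output for the user's single prompt, and stopping when a slice yields an empty mask is an assumed termination rule.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Slice = TypeVar("Slice")  # one 2D image slice of the volume
Mask = TypeVar("Mask")    # one 2D segmentation mask


def propagate_3d(
    volume: Sequence[Slice],
    start_index: int,
    start_mask: Mask,
    predict_slice: Callable[[Slice, Mask], Mask],
    is_empty: Callable[[Mask], bool],
) -> List[Optional[Mask]]:
    """Build a 3D mask by propagating a 2D mask to neighbouring slices.

    `predict_slice(image, mask_prompt)` stands in for a 2D model that accepts
    an imperfect mask from an adjacent slice as its prompt (hypothetical API).
    Slices the propagation never reaches are left as None.
    """
    masks: List[Optional[Mask]] = [None] * len(volume)
    masks[start_index] = start_mask

    # Walk away from the prompted slice in both directions.
    for step in (1, -1):
        previous = start_mask
        index = start_index + step
        while 0 <= index < len(volume):
            # The neighbour's mask serves as a noisy prompt for this slice.
            current = predict_slice(volume[index], previous)
            if is_empty(current):
                # Assumed stopping rule: the object ends at this slice.
                break
            masks[index] = current
            previous = current
            index += step

    return masks
```

In this sketch only one slice needs a user prompt; every other slice is prompted by the prediction of its neighbour, which is why the 2D model must tolerate imperfect mask prompts.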