🤖 AI Summary
Interactive segmentation of novel medical imaging datasets is bottlenecked by high annotation cost and by reliance on pre-trained labels or historical annotations. This paper proposes a context-aware, incremental interactive segmentation paradigm: given an initial user query (e.g., a click, bounding box, or scribble), it dynamically integrates previously segmented images as contextual memory and employs a Transformer-based multimodal encoder, dynamic prompt fusion, and online cache retrieval to enable continual adaptive learning, without domain-specific priors or pre-trained labels. Its core innovation is a per-interaction annotation cost that diminishes as the annotated dataset grows. Experiments demonstrate that, to reach a 90% Dice score on unseen tasks, the method reduces scribble steps by 53% and clicks by 36%, while supporting zero-shot transfer across MRI, CT, and microscopy modalities.
📄 Abstract
Medical researchers and clinicians often need to perform novel segmentation tasks on a set of related images. Existing methods for segmenting a new dataset are either interactive, requiring substantial human effort for each image, or require an existing set of manually labeled images. We introduce a system, MultiverSeg, that enables practitioners to rapidly segment an entire new dataset without requiring access to any existing labeled data from that task or domain. Along with the image to segment, the model takes user interactions such as clicks, bounding boxes, or scribbles as input, and predicts a segmentation. As the user segments more images, those images and segmentations become additional inputs to the model, providing context. As the context set of labeled images grows, the number of interactions required to segment each new image decreases. We demonstrate that MultiverSeg enables users to interactively segment new datasets efficiently by amortizing the number of interactions needed per image to reach an accurate segmentation. Compared to a state-of-the-art interactive segmentation method, MultiverSeg reduced the total number of scribble steps by 53% and clicks by 36% to achieve 90% Dice on sets of images from unseen tasks. We release code and model weights at https://multiverseg.csail.mit.edu.
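The annotation loop described above can be sketched in code. This is a minimal, hypothetical illustration, not the released MultiverSeg API: `ToyModel`, `ToyOracle`, and `segment_dataset` are made-up names, and the toy model simply scores higher with more interactions and more context, standing in for a real network conditioned on a context set. The point it shows is the amortization effect: each finished (image, mask) pair joins the context, so later images need fewer interactions.

```python
class ToyModel:
    """Illustrative stand-in for a context-conditioned segmentation model.

    Prediction quality (returned as a Dice-like score in place of a real
    mask) improves with both the number of user interactions and the size
    of the context set. Purely a toy; not the actual MultiverSeg model.
    """

    def predict(self, image, interactions, context):
        return min(1.0, 0.5 + 0.1 * len(interactions) + 0.1 * len(context))


class ToyOracle:
    """Simulated user: supplies interactions and judges prediction quality."""

    def next_interaction(self, image, interactions):
        return "click"  # a real user would click, draw a box, or scribble

    def dice(self, image, mask):
        return mask  # toy shortcut: the "mask" is already a quality score


def segment_dataset(images, model, oracle, dice_target=0.9, max_steps=20):
    """Segment a set of related images, accumulating context as we go."""
    context = []  # (image, mask) pairs from previously segmented images
    steps_per_image = []
    for image in images:
        interactions = []
        mask = None
        for _ in range(max_steps):
            # User adds one interaction, model re-predicts with full context.
            interactions.append(oracle.next_interaction(image, interactions))
            mask = model.predict(image, interactions, context)
            if oracle.dice(image, mask) >= dice_target:
                break
        context.append((image, mask))  # grow the context set
        steps_per_image.append(len(interactions))
    return steps_per_image


steps = segment_dataset(range(5), ToyModel(), ToyOracle())
print(steps)  # interactions needed per image shrink as context grows
```

With this toy setup the per-image interaction count decreases monotonically, mirroring the paper's observation that a growing context set amortizes annotation effort across the dataset.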