🤖 AI Summary
Medical image segmentation models often lack sufficient accuracy for clinical deployment, while existing refinement methods rely on dense pixel-level annotations or extensive user interaction. This paper proposes SCORE, a weakly supervised segmentation refinement framework. SCORE introduces region-level quality scores and binary over-/under-segmentation labels as new weak supervision signals, together with a loss function designed to exploit them. The model is trained using only light, region-level feedback, eliminating the need for pixel-level annotations or continuous human intervention. Evaluated on humerus CT data, SCORE considerably improves the initial segmentations produced by TotalSegmentator and achieves performance on par with existing refinement methods, while greatly reducing supervision requirements and annotation time. Its core contribution is a region-level weak supervision paradigm that reduces the reliance of segmentation refinement on high-quality, labor-intensive ground-truth annotations.
📝 Abstract
Delineating anatomical regions is a key task in medical image analysis. Manual segmentation achieves high accuracy but is labor-intensive and prone to variability, which has prompted the development of automated approaches. Recently, a range of foundation models has enabled automated segmentation across diverse anatomies and imaging modalities, but their outputs do not always meet clinical accuracy standards. While segmentation refinement strategies can improve performance, current methods depend on heavy user interaction or require fully supervised segmentations for training. Here, we present SCORE (Segmentation COrrection from Regional Evaluations), a weakly supervised framework that learns to refine mask predictions using only light feedback during training. Specifically, instead of relying on dense training image annotations, SCORE introduces a novel loss that leverages region-wise quality scores and over/under-segmentation error labels. We demonstrate SCORE on humerus CT scans, where it considerably improves initial predictions from TotalSegmentator and achieves performance on par with existing refinement methods, while greatly reducing their supervision requirements and annotation time. Our code is available at: https://gitlab.inria.fr/adelangl/SCORE.
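The abstract names two region-level feedback signals (a quality score and an over/under-segmentation label) but does not specify how regions are defined or how the signals are computed. The following is a purely illustrative sketch of what such feedback could look like when simulated from a reference mask. Every detail is an assumption and is not taken from the SCORE code: regions are equal slabs along the first axis, the quality score is the Dice coefficient, and the error label is decided by comparing voxel counts. The function name `region_feedback` is hypothetical. In an actual weakly supervised setting these values would come from a human rater rather than from a dense reference mask.

```python
import numpy as np


def region_feedback(pred, ref, n_regions):
    """Hypothetical simulation of region-level feedback (not the SCORE implementation).

    Splits the volume into n_regions slabs along axis 0 and returns, per slab:
      - a quality score (Dice between prediction and reference),
      - a binary error label: 1 = over-segmentation (more predicted voxels
        than reference voxels), 0 = under-segmentation (ties count as 0).
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("pred and ref must have the same shape")

    scores, labels = [], []
    # Assumed region definition: contiguous slabs of slices along axis 0
    for idx in np.array_split(np.arange(pred.shape[0]), n_regions):
        p, r = pred[idx], ref[idx]
        inter = np.logical_and(p, r).sum()
        denom = p.sum() + r.sum()
        # An empty prediction on an empty reference counts as a perfect match
        dice = 1.0 if denom == 0 else 2.0 * inter / denom
        scores.append(float(dice))
        labels.append(int(p.sum() > r.sum()))
    return np.array(scores), np.array(labels)


# Toy 4x4x4 example: the first two slices are over-segmented,
# the last two are under-segmented.
ref = np.zeros((4, 4, 4), dtype=bool)
ref[:, 1:3, 1:3] = True
pred = np.zeros_like(ref)
pred[0:2, 0:3, 0:3] = True
pred[2:4, 1, 1] = True
scores, labels = region_feedback(pred, ref, n_regions=2)
```

Here the first slab receives a Dice of 16/26 with label 1 (over-segmented) and the second a Dice of 0.4 with label 0 (under-segmented). Slabs are chosen only for simplicity; any partition of the volume into regions would fit the same interface.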