🤖 AI Summary
To address the clinical bottlenecks of label scarcity and poor cross-domain generalization in ultrasound B-mode image segmentation, this paper proposes a self-supervised learning framework tailored for ultrasound imaging. The method leverages contrastive learning with learnable metrics and fine-tunes a U-Net backbone. Key contributions include: (1) the novel Relation Contrastive Loss (RCL), which models pixel-level semantic relationships to enhance fine-grained discriminability; and (2) an ultrasound-specific spatial-frequency joint augmentation strategy to improve domain-aware representation learning. Evaluated on three benchmark datasets—BUSI, BrEaST, and UDIAT—the framework achieves Dice score improvements of 3.7–9.0% using only 20–50% of the labeled data, and attains up to 20.6% higher cross-domain generalization performance compared to supervised baselines. These results demonstrate a substantial reduction in reliance on large-scale annotated datasets while maintaining robust segmentation accuracy across diverse ultrasound domains.
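To make the learnable-metric idea concrete, here is a minimal sketch of a contrastive loss in which pair similarity is scored by a small trainable module (a bilinear scorer) rather than fixed cosine similarity, framed InfoNCE-style. This is an illustrative assumption for intuition only; the paper's actual RCL operates on pixel-level semantic relationships and may be formulated differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableMetricContrastiveLoss(nn.Module):
    """Contrastive loss with a learnable pairwise metric (illustrative).

    A bilinear layer scores the relation between two embeddings; the loss
    then pushes each anchor's positive score above its negative scores,
    as in InfoNCE. This stands in for the paper's RCL, whose exact form
    is not given here.
    """

    def __init__(self, dim: int, temperature: float = 0.1):
        super().__init__()
        self.metric = nn.Bilinear(dim, dim, 1)  # learnable relation score
        self.temperature = temperature

    def forward(self, anchors, positives, negatives):
        # anchors, positives: (B, D); negatives: (B, K, D)
        pos = self.metric(anchors, positives).squeeze(-1)          # (B,)
        B, K, D = negatives.shape
        a = anchors.unsqueeze(1).expand(B, K, D).reshape(B * K, D)
        neg = self.metric(a, negatives.reshape(B * K, D)).view(B, K)
        logits = torch.cat([pos.unsqueeze(1), neg], dim=1) / self.temperature
        # The positive pair sits at index 0 of each row of logits.
        target = torch.zeros(B, dtype=torch.long, device=logits.device)
        return F.cross_entropy(logits, target)
```

Because the metric itself has parameters, it is optimized jointly with the encoder, letting the model learn *what counts as similar* for speckle-heavy ultrasound textures instead of assuming cosine geometry.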
📝 Abstract
Ultrasound (US) imaging is clinically invaluable due to its noninvasive and safe nature. However, interpreting US images is challenging, requires significant expertise and time, and is often prone to errors. Deep learning offers assistive solutions such as segmentation. Supervised methods rely on large, high-quality, and consistently labeled datasets, which are challenging to curate. Moreover, these methods tend to underperform on out-of-distribution data, limiting their clinical utility. Self-supervised learning (SSL) has emerged as a promising alternative, leveraging unlabeled data to enhance model performance and generalisability. We introduce a contrastive SSL approach tailored for B-mode US images, incorporating a novel Relation Contrastive Loss (RCL). RCL encourages learning of distinct features by differentiating positive and negative sample pairs through a learnable metric. Additionally, we propose spatial and frequency-based augmentation strategies for representation learning on US images. Our approach significantly outperforms traditional supervised segmentation methods across three public breast US datasets, particularly in data-limited scenarios. Notable improvements on the Dice similarity metric include a 4% increase on 20% and 50% of the BUSI dataset, nearly 6% and 9% improvements on 20% and 50% of the BrEaST dataset, and 6.4% and 3.7% improvements on 20% and 50% of the UDIAT dataset, respectively. Furthermore, we demonstrate superior generalisability on the out-of-distribution UDIAT dataset with performance boosts of 20.6% and 13.6% compared to the supervised baseline using 20% and 50% of the BUSI and BrEaST training data, respectively. Our research highlights that domain-inspired SSL can improve US segmentation, especially under data-limited conditions.
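For intuition on what a frequency-based augmentation for B-mode images might look like, the sketch below perturbs the low-frequency amplitude spectrum of an image via the FFT while preserving phase, a common style-perturbation scheme. The radius and scaling parameters here are assumptions for illustration; the paper's actual spatial and frequency augmentations may differ.

```python
import numpy as np

def frequency_augment(img, alpha=0.3, radius=0.1, rng=None):
    """Randomly rescale the low-frequency amplitude of a 2D image.

    Keeps the phase spectrum intact, so anatomical structure is largely
    preserved while global intensity/contrast style is perturbed. An
    illustrative stand-in for the paper's frequency augmentation.
    """
    rng = np.random.default_rng() if rng is None else rng
    f = np.fft.fftshift(np.fft.fft2(img))
    amp, phase = np.abs(f), np.angle(f)
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    # Circular low-frequency mask centered on the (shifted) DC component.
    mask = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2) <= (radius * min(h, w)) ** 2
    scale = 1.0 + alpha * (2 * rng.random() - 1)  # uniform in [1-alpha, 1+alpha]
    amp = np.where(mask, amp * scale, amp)
    out = np.fft.ifft2(np.fft.ifftshift(amp * np.exp(1j * phase))).real
    return out.astype(img.dtype)
```

Such phase-preserving perturbations are attractive for ultrasound because they mimic scanner- and setting-dependent appearance shifts without distorting lesion boundaries, which is what the segmentation target depends on.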