🤖 AI Summary
This work addresses the domain gap between synthetic and real medical images in semantic feature space, which hinders the effective use of synthetic data in semi-supervised segmentation. To mitigate this, the authors propose a feature-semantic alignment framework that explicitly aligns features of synthetic images with those of real images via a similarity-based alignment loss computed on frozen DINOv2 embeddings. The approach integrates a soft boundary blending strategy, consistency-enforced pseudo-labels generated by an EMA teacher model, and a soft segmentation loss to enable collaborative training on both synthetic and real data. Using only 10% labeled real data and 90% unlabeled synthetic data, the method achieves Dice scores of 89.34% on ACDC and 84.42% on FIVES, matching the performance of state-of-the-art methods that rely on unlabeled real data.
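The EMA teacher mentioned above follows the standard mean-teacher recipe: the teacher's weights are an exponential moving average of the student's, so its pseudo-labels change smoothly during training. A minimal sketch (the function name and decay value are illustrative assumptions, not the paper's exact setting):

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Standard mean-teacher EMA update: each teacher parameter drifts
    slowly toward the corresponding student parameter.
    `decay` here is an assumed value, not taken from the paper."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}

# One update step: the teacher moves 1% of the way toward the student.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student)
```

Because the teacher averages over many past student states, its predictions on unlabeled synthetic images are more stable than the student's, which is what makes them usable as pseudo-labels.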
📝 Abstract
Synthetic data is an appealing alternative to extensive expert-annotated data for medical image segmentation, yet despite its visual realism it consistently fails to improve segmentation performance. The reason is that synthetic and real medical images occupy different semantic feature spaces, creating a domain gap that current semi-supervised learning methods cannot bridge. We propose SRA-Seg, a framework explicitly designed to align synthetic and real feature distributions for medical image segmentation. SRA-Seg introduces a similarity-alignment (SA) loss using frozen DINOv2 embeddings to pull synthetic representations toward their nearest real counterparts in semantic space. We employ soft edge blending to create smooth anatomical transitions and continuous labels, eliminating the hard boundaries of traditional copy-paste augmentation. The framework generates pseudo-labels for synthetic images via an EMA teacher model and applies soft-segmentation losses that respect uncertainty in mixed regions. Our experiments demonstrate strong results: using only 10% labeled real data and 90% synthetic unlabeled data, SRA-Seg achieves 89.34% Dice on ACDC and 84.42% on FIVES, significantly outperforming existing semi-supervised methods and matching the performance of methods that use real unlabeled data.
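The SA loss described above can be sketched as a nearest-neighbor alignment in embedding space: for each synthetic feature, find the most similar real feature (by cosine similarity) and penalize the remaining gap. This is a minimal numpy illustration under assumed shapes; the paper's exact formulation, weighting, and the DINOv2 feature extraction step are not reproduced here:

```python
import numpy as np

def sa_loss(syn_feats, real_feats):
    """Sketch of a similarity-alignment loss: pull each synthetic embedding
    toward its nearest real embedding in cosine-similarity space.
    Inputs are assumed to be (n, d) arrays of frozen-backbone features."""
    syn = syn_feats / np.linalg.norm(syn_feats, axis=1, keepdims=True)
    real = real_feats / np.linalg.norm(real_feats, axis=1, keepdims=True)
    sims = syn @ real.T            # (n_syn, n_real) cosine similarities
    nearest = sims.max(axis=1)     # similarity to the closest real feature
    return float(np.mean(1.0 - nearest))  # 0 when every synthetic feature matches a real one

# Identical feature sets incur zero loss; orthogonal ones are maximally penalized.
f = np.array([[1.0, 0.0], [0.0, 1.0]])
print(sa_loss(f, f))  # → 0.0
```

Because the embeddings come from a frozen backbone, minimizing this term moves the segmentation network's view of synthetic images toward the real-image distribution without letting the alignment target itself drift.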