🤖 AI Summary
This work addresses the high annotation cost and scarcity of labeled data in PET/CT image segmentation by proposing MuDuo, a novel framework that introduces dual-modality foundation models into semi-supervised segmentation for the first time. Leveraging SAM-Med3D (for CT) and SegAnyPET (for PET) as teacher models, MuDuo employs a prompt-free mutual distillation mechanism combined with semi-supervised learning and multimodal alignment strategies to effectively fuse structural and metabolic information into a lightweight student network. Evaluated on the AutoPET dataset using only five annotated cases, MuDuo achieves state-of-the-art performance, significantly improving segmentation accuracy while drastically reducing reliance on labeled data.
📝 Abstract
Organ segmentation from PET/CT is critical for quantitative analysis and radiotherapy planning in oncology. Because annotating PET/CT volumes is costly, semi-supervised learning (SSL) offers a practical way to train deep models with limited labeled data. Recent visual foundation models have demonstrated remarkable adaptability with improved efficiency. In this work, we propose MuDuo, a mutual distillation framework that exploits both structural and functional foundation models as modality-specific generalists: SAM-Med3D for structural CT and SegAnyPET for metabolic PET. MuDuo distills their knowledge into a lightweight student network, bridging the gap between the task-specific precision of student models and the segmentation priors of generalist foundation models. Our approach eliminates the need for manual prompts while maximizing the utility of unlabeled data for automatic segmentation, achieving state-of-the-art performance on the AutoPET dataset with only 5 labeled cases. Our source code is available at https://github.com/Wu-beining/MuDuo.
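The abstract does not state the training objective, so the following is only a minimal sketch of one common way to distill two teachers into a student, not the MuDuo loss itself. It assumes per-voxel foreground probabilities from a student, a CT teacher, and a PET teacher, and combines a supervised term on labeled cases with a soft-label distillation term on all cases. The function name `dual_teacher_loss`, the weight `lam`, and the equal averaging of the two teachers are hypothetical choices made for illustration. The sketch covers only the teacher-to-student direction and does not attempt to model the mutual aspect of the framework.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy between a predicted probability p and a
    target y, where y may be a hard label (0 or 1) or a soft label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def mean(values):
    return sum(values) / len(values)

def dual_teacher_loss(student, ct_teacher, pet_teacher, labels=None, lam=0.5):
    """Hypothetical objective (an assumption, not taken from the paper):
    supervised BCE when labels exist, plus a distillation term that pulls
    the student toward both teachers' soft predictions.

    All inputs are flat lists of per-voxel foreground probabilities."""
    distill = mean([
        0.5 * (bce(s, c) + bce(s, p))  # equal weight per teacher (assumed)
        for s, c, p in zip(student, ct_teacher, pet_teacher)
    ])
    if labels is None:
        # Unlabeled case: only the teachers provide a training signal.
        return lam * distill
    supervised = mean([bce(s, y) for s, y in zip(student, labels)])
    return supervised + lam * distill
```

In this sketch an unlabeled case contributes only through the teachers, which is how unlabeled data can still drive the student when just a handful of cases carry annotations. Calling `dual_teacher_loss(student, ct, pet)` without `labels` gives the unlabeled-case loss, and passing `labels=` adds the supervised term.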