🤖 AI Summary
This work addresses missing modalities in multimodal medical image segmentation, which cause inconsistent predictions among modality expert models and particularly destabilize the segmentation of small foreground structures. To this end, the authors propose CLoE, a consistency-driven framework that formulates robustness as decision-level consistency control among experts. CLoE uses a dual-branch consistency objective to jointly enforce global agreement and agreement on foreground regions, and adds a lightweight gating network that maps consistency scores to modality reliability weights for feature recalibration before fusion. The method maintains high segmentation performance under missing modalities while remaining competitive when all modalities are present. Experiments on the BraTS 2020 and MSD Prostate datasets show that CLoE outperforms existing approaches, generalizes well across datasets, and segments clinically critical structures more robustly.
📝 Abstract
Multimodal medical image segmentation often faces missing modalities at inference, which induces disagreement among modality experts and makes fusion unstable, particularly on small foreground structures. We propose Consistency Learning of Experts (CLoE), a consistency-driven framework for missing-modality segmentation that preserves strong performance when all modalities are available. CLoE formulates robustness as decision-level expert consistency control and introduces a dual-branch Expert Consistency Learning objective. Modality Expert Consistency enforces global agreement among expert predictions to reduce case-wise drift under partial inputs, while Region Expert Consistency emphasizes agreement on clinically critical foreground regions to avoid background-dominated regularization. We further map consistency scores to modality reliability weights using a lightweight gating network, enabling reliability-aware feature recalibration before fusion. Extensive experiments on BraTS 2020 and MSD Prostate demonstrate that CLoE outperforms state-of-the-art methods in incomplete multimodal segmentation, while exhibiting strong cross-dataset generalization and improving robustness on clinically critical structures.
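The abstract does not specify the divergence measure, how the foreground region is defined, or the gating architecture, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each available modality expert outputs per-pixel class probabilities, measures disagreement as the KL divergence of each expert from the mean (consensus) prediction, takes the foreground from the ground-truth label, and uses a single linear layer with softmax as a stand-in for the "lightweight gating network". All function names (`per_expert_divergence`, `reliability_gate`, `recalibrate_and_fuse`, `consistency_step`) and the gate parameters `w_global` / `w_region` are hypothetical, not the authors' code. Missing modalities are handled simply by passing only the available experts, so the expert count `M` varies per case.

```python
import numpy as np

EPS = 1e-8  # numerical floor for log(); an assumption, not from the paper


def softmax(x, axis=0):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def per_expert_divergence(probs, weight=None):
    """Divergence of each expert from the expert consensus (hypothetical helper).

    probs:  (M, C, H, W) class probabilities from the M *available* experts.
    weight: optional (H, W) pixel weights; None averages over all pixels.
    Returns an (M,) array: pixel-averaged KL(p_m || mean_k p_k).
    """
    consensus = probs.mean(axis=0, keepdims=True)  # (1, C, H, W)
    log_ratio = np.log(probs + EPS) - np.log(consensus + EPS)
    kl = (probs * log_ratio).sum(axis=1)  # (M, H, W): per-pixel KL per expert
    if weight is None:
        return kl.mean(axis=(1, 2))  # global (all-pixel) agreement
    w = weight / (weight.sum() + EPS)  # normalise weights over the region
    return (kl * w).sum(axis=(1, 2))  # agreement restricted to the region


def reliability_gate(d_global, d_region, w_global=1.0, w_region=1.0):
    """Map per-expert divergences to reliability weights that sum to 1.

    One-layer stand-in for a gating network (hypothetical); in training,
    w_global and w_region would be learned. Lower divergence -> higher weight.
    """
    logits = -(w_global * d_global + w_region * d_region)
    return softmax(logits, axis=0)


def recalibrate_and_fuse(features, weights):
    """Scale each expert's features by M * w_m, then average (a weighted sum).

    features: (M, D, H, W); weights: (M,) summing to 1.
    With uniform weights this reduces to plain mean fusion.
    """
    m = features.shape[0]
    scaled = features * (m * weights)[:, None, None, None]
    return scaled.mean(axis=0)  # (D, H, W)


def consistency_step(probs, features, label):
    """Sketch of one pass: two consistency losses, gating, then fusion."""
    fg = (label > 0).astype(np.float64)  # foreground from ground truth (assumption)
    d_global = per_expert_divergence(probs)  # modality-level (global) branch
    d_region = per_expert_divergence(probs, fg)  # foreground-region branch
    loss_mec = d_global.mean()
    loss_rec = d_region.mean()
    weights = reliability_gate(d_global, d_region)
    fused = recalibrate_and_fuse(features, weights)
    return loss_mec, loss_rec, weights, fused
```

Averaging the KL only over foreground pixels keeps a large, easy background from dominating the region term, which mirrors the stated motivation for Region Expert Consistency; whether CLoE itself uses KL, a Dice-style agreement, or another measure is not stated in the abstract. Scaling by `M * w_m` before averaging is one way to make the gate a drop-in replacement for mean fusion: an uninformative gate (uniform weights) recovers the unweighted baseline.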