🤖 AI Summary
Medical image segmentation faces three key challenges: high inter-observer variability in expert annotations, scarcity of labeled data, and severe class imbalance. Together these lead to poorly calibrated uncertainty and anatomically implausible predictions. To address them, we propose a statistical distance-enhanced Probabilistic UNet. Our method is the first to explicitly incorporate Wasserstein and Jensen–Shannon divergences into the training of the conditional decoder, enabling direct modeling of distributional discrepancies across multi-expert annotations. It jointly leverages probabilistic multi-expert label modeling and few-shot regularization to improve both uncertainty quantification and calibration. Evaluated on intracranial vessel and multiple sclerosis lesion segmentation, our approach significantly outperforms four baselines (p < 0.05), yields superior 3D structural integrity with markedly better anatomical plausibility, extends to multi-label segmentation, and supports downstream clinical applications such as hemodynamic modeling.
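The summary names the Jensen–Shannon divergence but does not say how it is applied to segmentation maps. The following is a minimal sketch, assuming each pixel's predicted foreground probability is treated as a Bernoulli distribution and compared against the fraction of experts who marked that pixel as foreground. The function name `bernoulli_js_divergence` and the toy annotations are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def bernoulli_js_divergence(p, q, eps=1e-7):
    """Mean per-pixel Jensen-Shannon divergence (in nats) between two
    foreground-probability maps, treating each pixel as a Bernoulli variable.
    Hypothetical helper, not the authors' code."""
    # Clip away exact 0/1 so the logarithms stay finite
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    q = np.clip(np.asarray(q, dtype=float), eps, 1 - eps)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Elementwise KL divergence between Bernoulli(a) and Bernoulli(b)
        return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

    return float(np.mean(0.5 * kl(p, m) + 0.5 * kl(q, m)))


# Three hypothetical binary expert annotations of the same 2x3 patch
annotations = np.array([
    [[0, 1, 1], [0, 0, 1]],
    [[0, 1, 0], [0, 0, 1]],
    [[0, 1, 1], [0, 1, 1]],
])
# Fraction of experts marking each pixel as foreground (a soft label)
expert_soft_label = annotations.mean(axis=0)
prediction = np.array([[0.1, 0.9, 0.6], [0.05, 0.3, 0.95]])

print(bernoulli_js_divergence(prediction, expert_soft_label))
```

Unlike cross-entropy against a single hard mask, this distance is symmetric and bounded by ln 2, and it compares the prediction against the spread of expert opinion rather than against one annotator.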
📝 Abstract
In medical imaging, many supervised-learning-based segmentation methods face several challenges, such as high variability in annotations from multiple experts, a paucity of labelled data and class-imbalanced datasets. These issues can yield segmentations that lack the precision required for clinical analysis and that may be misleadingly overconfident when no uncertainty quantification is provided. We propose PULASki, a biomedical image segmentation method that accurately captures the variability in expert annotations, even in small datasets. Our approach uses an improved loss function based on statistical distances within a conditional variational autoencoder (the Probabilistic UNet), which improves learning of the conditional decoder compared with standard cross-entropy, particularly in class-imbalanced problems. We analyse our method on two structurally different segmentation tasks (intracranial vessel and multiple sclerosis (MS) lesion segmentation) and compare our results with four well-established baselines in terms of quantitative metrics and qualitative output. Empirical results demonstrate that PULASki outperforms all baselines at the 5% significance level. The generated 3D segmentations are much more anatomically plausible than those obtained in the 2D case, particularly for the vessel task. Our method can also be applied to a wide range of multi-label segmentation tasks and is useful for downstream tasks such as hemodynamic modelling (computational fluid dynamics and data assimilation), clinical decision making, and treatment planning.
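The abstract describes the method as a conditional variational autoencoder (Probabilistic UNet) in which a statistical-distance loss replaces the usual cross-entropy reconstruction term. The following is a minimal NumPy sketch of the general shape of such an objective. It assumes axis-aligned Gaussian prior and posterior latents, as in the original Probabilistic UNet formulation, and a pluggable reconstruction term. The function names, the `recon` hook, `beta` and the toy arrays are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(Q || P) between axis-aligned Gaussians, summed over latent dimensions."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return float(0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    ))


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean pixel-wise cross-entropy: the standard Probabilistic UNet reconstruction term."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))


def prob_unet_loss(pred, target, posterior, prior, beta=1.0, recon=binary_cross_entropy):
    """Reconstruction term plus beta-weighted KL(posterior || prior).

    `posterior` and `prior` are (mean, log-variance) pairs. Under our assumption,
    a PULASki-style loss differs mainly by passing a statistical distance as `recon`
    instead of cross-entropy (hypothetical hook, not the authors' API).
    """
    mu_q, logvar_q = posterior
    mu_p, logvar_p = prior
    return recon(pred, target) + beta * kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)


# Toy example: sparse foreground, as is typical of vessel or lesion masks
target = np.array([[0.0, 1.0], [0.0, 0.0]])
pred = np.array([[0.1, 0.7], [0.2, 0.05]])
posterior = (np.array([0.3, -0.1]), np.array([-0.5, -0.2]))  # from the (image, label) encoder
prior = (np.zeros(2), np.zeros(2))                             # from the image-only encoder

print(prob_unet_loss(pred, target, posterior, prior, beta=1.0))
```

Keeping the reconstruction term as a callable leaves the KL regularisation of the latent space unchanged while the segmentation distance varies. This mirrors the comparison the abstract draws between statistical distances and standard cross-entropy.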