🤖 AI Summary
In medical image segmentation, expert annotations exhibit substantial inter-observer variability (e.g., in lung nodule delineation); existing methods either enforce consensus or assign a separate branch per expert, and neither offers controllable personalization. This paper introduces the first natural-language prompt-driven, two-stage personalized segmentation framework. It employs a probabilistic U-Net to generate diverse segmentation hypotheses, incorporates multi-level contrastive learning to align textual prompts with visual representations, and designs a prompt-guided latent-space projection mechanism to achieve disentangled, interpretable modeling of expert-specific styles. Evaluated on the LIDC-IDRI and prostate MRI datasets, the method reduces Generalized Energy Distance by 17% and improves mean Dice by more than one point compared with DPersona, enabling fine-grained, language-driven, style-controllable segmentation.
📝 Abstract
Automated medical image segmentation suffers from high inter-observer variability, particularly in tasks such as lung nodule delineation, where experts often disagree. Existing approaches either collapse this variability into a consensus mask or rely on separate model branches for each annotator. We introduce ProSona, a two-stage framework that learns a continuous latent space of annotation styles, enabling controllable personalization via natural language prompts. A probabilistic U-Net backbone captures diverse expert hypotheses, while a prompt-guided projection mechanism navigates this latent space to generate personalized segmentations. A multi-level contrastive objective aligns textual and visual representations, promoting disentangled and interpretable expert styles. Across the LIDC-IDRI lung nodule and multi-institutional prostate MRI datasets, ProSona reduces the Generalized Energy Distance by 17% and improves mean Dice by more than one point compared with DPersona. These results demonstrate that natural-language prompts can provide flexible, accurate, and interpretable control over personalized medical image segmentation. Our implementation is available online.
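To make the two core ideas concrete, here is a minimal numpy sketch of (a) a prompt-guided projection that shifts a latent sample from the probabilistic U-Net's prior toward an expert style described by a text prompt, and (b) a symmetric InfoNCE-style contrastive loss aligning text and visual embeddings. All names (`prompt_guided_projection`, `info_nce`, the projection matrix `W`, and the embedding dimensions) are illustrative assumptions, not the paper's actual API; the real model learns these components end to end inside a segmentation network.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(text_emb, vis_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (text, visual) pairs lie on the diagonal."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = vis_emb / np.linalg.norm(vis_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature
    # cross-entropy with the diagonal as the positive class, in both directions
    log_p_t2v = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_v2t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -0.5 * (np.mean(np.diag(log_p_t2v)) + np.mean(np.diag(log_p_v2t)))

def prompt_guided_projection(z_prior, prompt_emb, W):
    """Shift a prior latent sample along a direction given by the prompt."""
    return z_prior + prompt_emb @ W

d_text, d_latent, batch = 16, 8, 4
W = rng.normal(scale=0.1, size=(d_text, d_latent))  # learned during training
z = rng.normal(size=(batch, d_latent))              # sample from U-Net prior
prompt = rng.normal(size=(batch, d_text))           # encoded language prompt
z_style = prompt_guided_projection(z, prompt, W)    # personalized latent code
```

The contrastive term is what ties the two spaces together: when text and visual embeddings of the same expert style are aligned, the loss is low, so navigating the latent space with a prompt becomes meaningful rather than arbitrary.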