🤖 AI Summary
Medical image segmentation suffers from a severe scarcity of annotated data, particularly for polyp detection, where annotation demands expert domain knowledge. To address this, we propose a text-guided latent diffusion framework that synthesizes clinically realistic polyp images through text-conditioned inpainting and replaces iterative denoising with direct latent estimation, so that inference takes a single step. The synthetic samples augment limited training data without introducing distribution shift, while preserving both generation diversity and inference efficiency. Our method integrates latent diffusion models, text-guided inpainting, and an end-to-end segmentation network. Evaluated on CVC-ClinicDB, it achieves 96.0% Dice and 92.9% IoU, and inference is accelerated by a factor of T (the number of denoising steps an iterative diffusion sampler would otherwise run), which supports real-time deployment in resource-constrained clinical settings. The core innovations are text-driven single-step latent synthesis and an unbiased latent variable estimation mechanism.
📝 Abstract
Medical image segmentation suffers from data scarcity, particularly in polyp detection, where annotation requires specialized expertise. We present SynDiff, a framework combining text-guided synthetic data generation with efficient diffusion-based segmentation. Our approach employs latent diffusion models to generate clinically realistic synthetic polyps through text-conditioned inpainting, augmenting limited training data with semantically diverse samples. Unlike traditional diffusion methods, which require T iterative denoising steps, we introduce direct latent estimation that enables single-step inference with a T× computational speedup. On CVC-ClinicDB, SynDiff achieves 96.0% Dice and 92.9% IoU while maintaining real-time capability suitable for clinical deployment. The framework demonstrates that controlled synthetic augmentation improves segmentation robustness without distribution shift. SynDiff bridges the gap between data-hungry deep learning models and clinical constraints, offering an efficient solution for deployment in resource-limited medical settings.
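For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Dice and IoU are commonly computed for a pair of binary masks. It is not the evaluation code used for SynDiff: the function name `dice_and_iou`, the smoothing term `eps`, and the toy masks are assumptions made for illustration, and the abstract does not state whether scores are averaged per image or over the whole dataset.

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-7):
    """Return (Dice, IoU) for two binary masks of the same shape.

    Hypothetical helper for illustration; `eps` is an assumed smoothing
    term that keeps the ratios defined when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    intersection = np.logical_and(pred, target).sum()
    pred_area = pred.sum()
    target_area = target.sum()
    union = pred_area + target_area - intersection

    # Dice = 2|A ∩ B| / (|A| + |B|), IoU = |A ∩ B| / |A ∪ B|
    dice = (2.0 * intersection + eps) / (pred_area + target_area + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)


# Toy masks (made up for this example): 1 marks polyp pixels.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 1, 0],
                   [0, 0, 1]])

dice, iou = dice_and_iou(pred, target)
print(f"Dice={dice:.3f}, IoU={iou:.3f}")
```

In the toy case the masks share two pixels, each mask covers three pixels, and their union covers four, so Dice is 4/6 and IoU is 2/4. Dice counts the overlap twice in the numerator, which is why it is always at least as large as IoU for the same pair of masks.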