🤖 AI Summary
To address the scarcity of medical data and the high cost of annotation in colonoscopy polyp segmentation, this paper proposes an end-to-end framework that integrates synthetic data generation with multi-model collaboration. Stable Diffusion is used to synthesize high-fidelity polyp images, easing the small-sample bottleneck. A cascaded detection-segmentation pipeline combines Faster R-CNN, which localizes polyps with high recall (93.08%), with the Segment Anything Model (SAM), which generates the masks. Benchmarking of five segmentation architectures (U-Net, PSPNet, FPN, LinkNet, and MANet) shows that LinkNet gives the most balanced overlap metrics (IoU: 64.20%; Dice: 77.53%), while FPN scores highest on the reconstruction-quality metrics PSNR and SSIM. The framework is intended to improve model generalization and segmentation robustness when training data are limited.
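For readers unfamiliar with the overlap metrics quoted above, the following is a minimal sketch of how IoU and Dice are commonly computed for a single pair of binary masks. It is not the paper's evaluation code: the function names, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel labels (hypothetical representation)


def overlap_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Return (intersection, predicted area, target area) in pixels."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = pred_area = target_area = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            pred_area += 1 if p else 0
            target_area += 1 if t else 0
    return inter, pred_area, target_area


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union; 1.0 if both masks are empty (assumed convention)."""
    inter, pred_area, target_area = overlap_counts(pred, target)
    union = pred_area + target_area - inter
    return 1.0 if union == 0 else inter / union


def dice(pred: Mask, target: Mask) -> float:
    """Dice coefficient; 1.0 if both masks are empty (assumed convention)."""
    inter, pred_area, target_area = overlap_counts(pred, target)
    total = pred_area + target_area
    return 1.0 if total == 0 else 2 * inter / total


# Toy 3x3 masks: 4 predicted pixels, 4 ground-truth pixels, 3 shared.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
target = [[0, 0, 1],
          [0, 1, 1],
          [0, 0, 1]]

print(f"IoU  = {iou(pred, target):.2f}")   # 3 / 5
print(f"Dice = {dice(pred, target):.2f}")  # 6 / 8
```

For a single mask pair the two metrics are tied by Dice = 2·IoU / (1 + IoU). Scores averaged over a dataset need not satisfy that identity exactly, which may be why the reported averages (64.20% and 77.53%) do not match it to the last digit.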
📝 Abstract
Colonoscopy is a vital tool for the prevention and early detection of colorectal cancer, one of the leading causes of cancer-related mortality worldwide. This research introduces a multidirectional architectural framework that automates polyp detection in colonoscopy images while addressing the limited size of healthcare datasets and the complexity of annotation. The system combines synthetic data generation through Stable Diffusion enhancements with detection and segmentation algorithms. For detection, Faster R-CNN performs the initial object localization, and the Segment Anything Model (SAM) then generates and refines the segmentation masks. Faster R-CNN achieved a recall of 93.08%, a precision of 88.97%, and an F1 score of 90.98%. Five state-of-the-art segmentation models (U-Net, PSPNet, FPN, LinkNet, and MANet), each using ResNet34 as the base model, were evaluated. FPN achieved the highest PSNR (7.205893) and SSIM (0.492381), U-Net the highest recall (84.85%), and LinkNet showed balanced performance in IoU (64.20%) and Dice score (77.53%).
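The detection figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the helper name `f1_score` is hypothetical, and the inputs are the rounded percentages quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the Faster R-CNN detector.
reported_precision = 0.8897
reported_recall = 0.9308

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 90.98%"
```

The result matches the reported F1 score of 90.98%, so the three detection figures are mutually consistent.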