PolypDB: A Curated Multi-Center Dataset for Development of AI Algorithms in Colonoscopy

📅 2024-08-19
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
A high polyp miss rate during colonoscopy contributes significantly to colorectal cancer risk. To address the scarcity of high-quality, open-source data for AI-driven polyp detection and segmentation, this work introduces the first publicly available, multicenter, multimodal colonoscopy polyp image dataset, comprising 3,934 images with both pixel-level (segmentation) and bounding-box (detection) annotations. It spans three clinical centers in Norway, Sweden, and Vietnam, and covers five imaging modalities: BLI, FICE, LCI, NBI, and WLI. The work establishes the first unified, cross-site, cross-device, and cross-modality annotation protocol and proposes a three-dimensional evaluation framework covering modality, clinical center, and federated learning settings. The dataset is openly released on OSF and GitHub, accompanied by standardized train/val/test splits, benchmark implementations (nnUNet for segmentation, YOLOv8 for detection), and baseline performance metrics. Experiments demonstrate substantial improvements in model generalizability and robustness across unseen centers, helping to close a critical gap in open, clinically diverse resources for polyp AI research.
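As a rough illustration of how segmentation and detection annotations like these are typically scored, the sketch below computes Dice and IoU for a binary mask and IoU for a pair of bounding boxes. The summary does not say which metrics the benchmark reports, so the choice of Dice and IoU is an assumption, and the function names `dice_and_iou` and `box_iou` are hypothetical helpers rather than part of the PolypDB code.

```python
def dice_and_iou(pred, truth):
    """Dice and IoU for two binary masks given as equal-shape 2D lists of 0/1."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    pred_area = sum(sum(row) for row in pred)
    truth_area = sum(sum(row) for row in truth)
    union = pred_area + truth_area - inter
    if union == 0:
        # Both masks are empty: treat as a perfect match (a common convention).
        return 1.0, 1.0
    dice = 2 * inter / (pred_area + truth_area)
    iou = inter / union
    return dice, iou


def box_iou(a, b):
    """IoU for two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy 2x2 masks, not taken from the dataset.
    print(dice_and_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Averaging such scores separately per modality and per center is one simple way to read off cross-modality and cross-center generalization.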

📝 Abstract
Colonoscopy is the primary method for examination, detection, and removal of polyps. However, challenges such as variation in endoscopists' skill, bowel preparation quality, and the complex nature of the large intestine contribute to a high polyp miss rate. These missed polyps can later develop into cancer, underscoring the importance of improving detection methods. To address the lack of publicly available, large, diverse, multi-center datasets for developing automatic methods for polyp detection and segmentation, we introduce PolypDB, a large-scale publicly available dataset that contains 3934 still polyp images and their corresponding ground truth from real colonoscopy videos. PolypDB comprises images from five modalities: Blue Light Imaging (BLI), Flexible Imaging Color Enhancement (FICE), Linked Color Imaging (LCI), Narrow Band Imaging (NBI), and White Light Imaging (WLI), collected at three medical centers in Norway, Sweden, and Vietnam. We provide a benchmark on each modality and center, including federated learning settings, using popular segmentation and detection benchmarks. PolypDB is public and can be downloaded at https://osf.io/pr7ms/. More information about the dataset, the segmentation, detection, and federated learning benchmarks, and the train-test split can be found at https://github.com/DebeshJha/PolypDB.
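To make the federated learning setting more concrete, here is a minimal sketch of one aggregation round in which each of three centers trains locally and a server averages the resulting parameters, weighted by local dataset size. The abstract does not name the aggregation algorithm, so the use of federated averaging (FedAvg) is an assumption; `federated_average` is a hypothetical helper, and the weights and sample counts are placeholders, not the real per-center figures.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of per-client parameter vectors (FedAvg-style aggregation).

    client_weights: list of equal-length lists of floats, one list per client.
    client_sizes:   number of local training samples per client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    # Each parameter is averaged across clients, weighted by local data size.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    # Placeholder parameters and sample counts for three hypothetical centers.
    weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    sizes = [1, 1, 2]
    print(federated_average(weights, sizes))
```

In this scheme only model parameters leave each center, which is why federated setups are often used when patient images cannot be pooled across hospitals.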
Problem

Research questions and friction points this paper is trying to address.

Colonoscopy
Polyp Detection
Cancer Risk Reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

PolypDB
Colonoscopy
Imaging Techniques
Authors
Debesh Jha
University of South Dakota
Deep Learning, Biomedical Informatics, Medical Image computing, Computer vision, AI for Medicine
Nikhil Kumar Tomar
Indira Gandhi National Open University, India
Artificial Intelligence, Computer Vision, Image Classification, Image Segmentation
Vanshali Sharma
Northwestern University
Medical Image Analysis, Computer Vision, Deep Learning
Quoc-Huy Trinh
Aalto University
Deep Learning, Computer Vision, Deep Generative Model, Medical Image Analysis
Koushik Biswas
Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University, Chicago, USA
Hongyi Pan
Northwestern University
Signal Processing, Machine Learning, Image Processing, Federated Learning
Ritika K. Jha
Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University, Chicago, USA
Gorkem Durak
Northwestern University, Department of Radiology
radiology, artificial intelligence
Alexander Hann
Jonas Varkey
Hang Viet Dao
Long Van Dao
Binh Phuc Nguyen
Khanh Cong Pham
Quang Trung Tran
Nikolaos Papachrysos
Brandon Rieders
Peter Thelin Schmidt
Enrik Geissler
Tyler Berzin
Paal Halvorsen
Michael A. Riegler
Thomas de Lange
Dept of Med. Sahlgrenska Univ. Hosp., Sahlgrenska Academy GU, Augere Medical
endoscopy education, endoscopy, artificial intelligence, colorectal cancer screening
Ulas Bagci
Northwestern University
artificial intelligence, deep learning, biomedical image analysis, medical image computing