🤖 AI Summary
High annotation costs severely hinder semantic segmentation of anatomical structures in laparoscopic cholecystectomy videos. To address this, we propose an active learning–driven dataset construction framework specifically designed for surgical video data. This work is the first to deeply integrate active learning into both frame sampling and annotation workflows for surgical videos, and systematically validates deep feature distance as the optimal uncertainty metric. Our method achieves 99.4% of the full-dataset segmentation performance (mIoU = 0.4349 vs. 0.4374) using only 50% of annotated frames, substantially improving annotation efficiency and model generalizability. Key contributions include: (1) establishing a lightweight, efficient annotation paradigm tailored to surgical videos; (2) rigorously identifying deep feature distance as the most effective uncertainty estimator in active learning for this domain; and (3) providing a reproducible, generalizable technical pathway for low-resource medical image segmentation.
📝 Abstract
Labeling has always been expensive in the medical context, which has hindered related deep learning applications. Our work introduces active learning into surgical video frame selection to construct a high-quality, affordable laparoscopic cholecystectomy dataset for semantic segmentation. Active learning folds dataset construction into the Deep Neural Network (DNN) learning pipeline: DNNs trained on the existing dataset identify the most informative samples among newly collected data. At the same time, the DNNs' performance and generalization ability improve over time as the newly selected and annotated data are added to the training set. We assessed different measures of data informativeness and found that deep feature distance selects the most informative data in this task. Our experiments show that with half of the data selected by active learning, the DNNs achieve nearly the same performance (0.4349 mean Intersection over Union, mIoU) as the same DNNs trained on the full dataset (0.4374 mIoU) on the critical anatomies and surgical instruments.
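The abstract does not spell out how deep feature distance drives frame selection. A minimal sketch of one common realization, assuming a greedy farthest-point (core-set style) strategy: each unlabeled frame is scored by the distance of its deep feature vector to the nearest already-labeled frame, and the farthest frames are annotated first. The function name and the use of plain Euclidean distance are illustrative assumptions, not the paper's exact implementation.

```python
import math

def select_informative_frames(unlabeled_feats, labeled_feats, budget):
    """Greedy farthest-point selection over deep feature distances.

    unlabeled_feats: list of feature vectors (tuples/lists of floats)
        extracted by a DNN from unlabeled video frames.
    labeled_feats:   feature vectors of already-annotated frames.
    budget:          number of new frames to pick for annotation.
    Returns indices into unlabeled_feats, most informative first.
    """
    labeled = list(labeled_feats)
    remaining = list(range(len(unlabeled_feats)))
    picked = []
    for _ in range(min(budget, len(remaining))):
        # Score each candidate by its distance to the closest labeled feature;
        # a large distance means the frame is unlike anything annotated so far.
        scores = [
            min(math.dist(unlabeled_feats[i], f) for f in labeled)
            for i in remaining
        ]
        best = remaining[scores.index(max(scores))]
        picked.append(best)
        # Treat the chosen frame as labeled so later picks stay diverse.
        labeled.append(unlabeled_feats[best])
        remaining.remove(best)
    return picked

# Toy usage: with one labeled frame at the origin, the most distant
# unlabeled feature is picked first, then the next most novel one.
chosen = select_informative_frames(
    unlabeled_feats=[(0.1, 0.0), (5.0, 0.0), (2.0, 0.0)],
    labeled_feats=[(0.0, 0.0)],
    budget=2,
)
print(chosen)  # → [1, 2]
```

In the full loop, the DNN would be retrained after each annotation round, and the feature extractor (hence the distances) would be refreshed before the next selection.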