AI Summary
To address the weak generalization of surgical tool presence detection models caused by scarce annotated data, this paper proposes a two-stage adaptive fine-tuning method. First, a linear probing stage trains only the newly added classification layers on top of a pre-trained CNN. Second, a gradual freezing stage progressively reduces the set of fine-tunable layers, so that adaptation to the surgical domain completes in a single training loop. The authors report that this reduces network complexity and improves efficiency. Evaluated on the Cholec80 dataset with ImageNet-pretrained ResNet-50 and DenseNet-121, the method achieves 96.4% mAP, outperforming existing approaches and established fine-tuning techniques. Generalization to a different surgical domain is further confirmed on the CATARACTS dataset of ophthalmic surgery, suggesting applicability across diverse surgical procedures.
Abstract
Minimally invasive surgery can benefit significantly from automated surgical tool detection, enabling advanced analysis and assistance. However, the limited availability of annotated data in surgical settings poses a challenge for training robust deep learning models. This paper introduces a novel staged adaptive fine-tuning approach consisting of two steps: a linear probing stage to condition additional classification layers on a pre-trained CNN-based architecture and a gradual freezing stage to dynamically reduce the fine-tunable layers, aiming to regulate adaptation to the surgical domain. This strategy reduces network complexity and improves efficiency, requiring only a single training loop and eliminating the need for multiple iterations. We validated our method on the Cholec80 dataset, employing CNN architectures (ResNet-50 and DenseNet-121) pre-trained on ImageNet for detecting surgical tools in cholecystectomy endoscopic videos. Our results demonstrate that our method improves detection performance compared to existing approaches and established fine-tuning techniques, achieving a mean average precision (mAP) of 96.4%. To assess its broader applicability, the generalizability of the fine-tuning strategy was further confirmed on the CATARACTS dataset, a distinct domain of minimally invasive ophthalmic surgery. These findings suggest that gradual freezing fine-tuning is a promising technique for improving tool presence detection in diverse surgical procedures and may have broader applications in general image classification tasks.
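The two stages described above can be illustrated with a small scheduling sketch. The abstract does not state the freezing schedule, so everything below is an assumption made for illustration: the function name `trainable_blocks`, the block names, the linear pace of freezing, and the choice to freeze from the input side first are all hypothetical and should not be read as the authors' implementation.

```python
from typing import List


def trainable_blocks(epoch: int, probe_epochs: int, total_epochs: int,
                     blocks: List[str]) -> List[str]:
    """Return the names of the blocks that are trained at `epoch` (0-indexed).

    `blocks` is ordered from input to output; the last entry is the
    classification head. Hypothetical schedule, not taken from the paper:
      - Stage 1 (linear probing): only the head is trained.
      - Stage 2 (gradual freezing): every block starts trainable, then
        backbone blocks are frozen one by one from the input side at a
        linear pace. The head always stays trainable.
    """
    if not 0 <= probe_epochs < total_epochs:
        raise ValueError("need 0 <= probe_epochs < total_epochs")
    if len(blocks) < 2:
        raise ValueError("need at least one backbone block and a head")

    head = blocks[-1]
    backbone = blocks[:-1]

    # Stage 1: the pre-trained backbone is frozen, only the new head learns.
    if epoch < probe_epochs:
        return [head]

    # Stage 2: number of frozen backbone blocks grows linearly with the epoch.
    stage2_epochs = total_epochs - probe_epochs
    frozen = min(len(backbone),
                 (epoch - probe_epochs) * len(backbone) // stage2_epochs)
    return backbone[frozen:] + [head]


if __name__ == "__main__":
    # Block names loosely mirror a ResNet-style layout (illustrative only).
    resnet_like = ["conv1", "layer1", "layer2", "layer3", "layer4", "fc"]
    for epoch in range(7):
        print(epoch, trainable_blocks(epoch, probe_epochs=2, total_epochs=7,
                                      blocks=resnet_like))
```

Because the schedule is a pure function of the epoch, both stages fit into one training loop: at the start of each epoch the loop would query the schedule and enable gradients only for the returned blocks. In a framework such as PyTorch this would typically be done by toggling `requires_grad` on the corresponding parameters.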