Identifying Surgical Instruments in Pedagogical Cataract Surgery Videos through an Optimized Aggregation Network

📅 2025-01-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of detecting surgical instruments in cataract surgery teaching videos, particularly small and visually similar targets, this paper proposes an enhanced YOLOv9-based approach. Specifically, it integrates a Programmable Gradient Information (PGI) mechanism to alleviate the information bottleneck during training and introduces a Generally-Optimized Efficient Layer Aggregation Network (Go-ELAN) to strengthen multi-scale feature fusion. Evaluated on a custom dataset of 615 annotated images across 10 instrument classes, the model achieves a mAP of 73.74% at IoU=0.5, outperforming the vanilla YOLOv5, v7, v8, and v9 models as well as Laptool and DETR. Combining PGI and Go-ELAN within a YOLOv9-style framework is reported to improve detection accuracy at higher IoU thresholds. The proposed method targets fine-grained surgical instrument analysis in medical education videos.
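The headline metric, mAP at IoU=0.5, can be illustrated with a minimal sketch. It is not the paper's evaluation code: the box format `(x1, y1, x2, y2)`, the function names, and the simplification that all boxes of a class come from a single image are assumptions made for illustration only.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class (assumption: all boxes belong to one image).

    detections: list of (score, box); ground_truth: list of boxes.
    """
    if not ground_truth:
        return 0.0
    # Visit detections from most to least confident.
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    matched = set()
    tp_flags = []
    for _score, box in detections:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx is not None and best_iou >= iou_threshold
        if is_tp:
            matched.add(best_idx)
        tp_flags.append(is_tp)

    # Precision and recall after each detection in ranked order.
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Make precision non-increasing from left to right (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(per_class, iou_threshold=0.5):
    """Mean of per-class AP; per_class maps class name -> (detections, ground_truth)."""
    aps = [average_precision(d, g, iou_threshold) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

A detection counts as a true positive only if it overlaps a still-unmatched ground-truth box by at least the threshold, so raising `iou_threshold` above 0.5 makes the metric stricter about localization.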

📝 Abstract
Instructional cataract surgery videos are crucial for ophthalmologists and trainees to observe surgical details repeatedly. This paper presents a deep learning model for real-time identification of surgical instruments in these videos, using a custom dataset scraped from open-access sources. Inspired by the architecture of YOLOv9, the model employs a Programmable Gradient Information (PGI) mechanism and a novel Generally-Optimized Efficient Layer Aggregation Network (Go-ELAN) to address the information bottleneck problem, improving mean Average Precision (mAP) at higher Non-Maximum Suppression Intersection over Union (NMS IoU) thresholds. The Go-ELAN YOLOv9 model, evaluated against vanilla YOLOv5, v7, v8, and v9 as well as Laptool and DETR, achieves a superior mAP of 73.74 at IoU 0.5 on a dataset of 615 images with 10 instrument classes, demonstrating the effectiveness of the proposed model.
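The NMS IoU threshold mentioned in the abstract controls how aggressively overlapping detections are merged. The following is a minimal sketch of greedy Non-Maximum Suppression, assuming boxes in `(x1, y1, x2, y2)` format and hypothetical function names; it illustrates the general technique and is not taken from the paper's implementation.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (score, box) for a single class.

    Keeps the highest-scoring box, drops every remaining box whose IoU
    with it exceeds iou_threshold, and repeats on what is left.
    """
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[1], d[1]) <= iou_threshold]
    return kept
```

With a higher threshold, fewer overlapping boxes are suppressed, which matters when two instruments sit close together in the frame but can also leave duplicate detections of the same instrument.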
Problem

Research questions and friction points this paper is trying to address.

Cataract Surgery
Tool Detection
Medical Education
Innovation

Methods, ideas, or system contributions that make the work stand out.

Go-ELAN YOLOv9 model
Programmable Gradient Information (PGI) mechanism
real-time tool recognition in cataract surgery videos