Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge

📅 2025-07-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Robust recognition and localization of surgical instruments in minimally invasive endoscopic videos remain challenging under real-world conditions due to complex backgrounds, occlusions, and inter-center variability. Method: The PhaKIR sub-challenge of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024 introduces a novel multi-center dataset of thirteen full-length laparoscopic cholecystectomy videos from three institutions, with unified annotations for surgical phase recognition, instrument instance segmentation, and instrument keypoint estimation, enabling joint contextual and temporal modeling across entire procedures. Evaluation and reporting follow the BIAS guidelines for biomedical image analysis challenges. Contribution/Results: The challenge establishes a unique benchmark for developing temporally aware, context-driven methods for instrument recognition and localization in robot-assisted minimally invasive surgery and provides a high-quality resource for future research in surgical scene understanding.

📝 Abstract
Reliable recognition and localization of surgical instruments in endoscopic video recordings are foundational for a wide range of applications in computer- and robot-assisted minimally invasive surgery (RAMIS), including surgical training, skill assessment, and autonomous assistance. However, robust performance under real-world conditions remains a significant challenge. Incorporating surgical context - such as the current procedural phase - has emerged as a promising strategy to improve robustness and interpretability. To address these challenges, we organized the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) sub-challenge as part of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024. We introduced a novel, multi-center dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three distinct medical institutions, with unified annotations for three interrelated tasks: surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data while supporting the integration of temporal information across entire procedures. We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges. The PhaKIR sub-challenge advances the field by providing a unique benchmark for developing temporally aware, context-driven methods in RAMIS and offers a high-quality resource to support future research in surgical scene understanding.
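The abstract's central point is that every frame carries synchronized labels for all three tasks, so localization can be studied together with procedural context and temporal ordering. The following Python sketch illustrates what such a unified per-frame annotation record could look like; the schema, field names, class labels, and phase names are hypothetical illustrations, not the official PhaKIR data format.

```python
# Hypothetical sketch of a unified per-frame annotation combining the three PhaKIR tasks:
# surgical phase recognition, instrument instance segmentation, and instrument keypoint estimation.
# All names and values below are assumptions for illustration, not the released dataset schema.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InstrumentInstance:
    instrument_class: str                 # e.g. "grasper" (class vocabulary assumed)
    mask_rle: str                         # run-length-encoded binary segmentation mask
    keypoints: List[Tuple[float, float]]  # (x, y) pixel coordinates of instrument keypoints


@dataclass
class FrameAnnotation:
    video_id: str                         # one of the thirteen full-length cholecystectomy videos
    frame_index: int                      # position within the procedure, enabling temporal modeling
    phase_label: str                      # current surgical phase for this frame
    instruments: List[InstrumentInstance] = field(default_factory=list)


# Because the phase label sits next to the instance-level labels on every frame,
# a method can condition instrument localization on procedural context and exploit
# temporal information across the entire recording.
example = FrameAnnotation(
    video_id="center_A_video_01",
    frame_index=54210,
    phase_label="calot_triangle_dissection",  # assumed phase name
    instruments=[
        InstrumentInstance(
            instrument_class="grasper",
            mask_rle="<RLE string>",
            keypoints=[(412.0, 288.5), (388.2, 305.1)],
        )
    ],
)
```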
Problem

Research questions and friction points this paper is trying to address.

Enhancing surgical instrument recognition and localization in endoscopy
Improving robustness using surgical phase context in RAMIS
Providing a multi-task dataset for joint surgical analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-center dataset with unified annotations
Joint investigation of instrument localization and context
Benchmark for temporally aware, context-driven methods
Tobias Rueckert
PhD Student, OTH Regensburg & TU Munich
Medical Image Computing, Deep Learning, Surgical Instrument Recognition
David Rauber
PhD Student, Ostbayerische Technische Hochschule
machine learning, medical image computing
Raphaela Maerkl
Regensburg Medical Image Computing (ReMIC), OTH Regensburg, Regensburg, Germany
Leonard Klausmann
OTH Regensburg
Suemeyye R. Yildiran
Regensburg Medical Image Computing (ReMIC), OTH Regensburg, Regensburg, Germany; Regensburg Center of Biomedical Engineering (RCBE), OTH Regensburg and Regensburg University, Regensburg, Germany; Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg, Regensburg, Germany
Max Gutbrod
Regensburg Medical Image Computing (ReMIC), OTH Regensburg, Regensburg, Germany
Danilo Weber Nunes
OTH Regensburg
Machine Learning
Alvaro Fernandez Moreno
AI Centre of Excellence, Medtronic Ltd., Watford, UK; Engineering Sciences, University College London, London, UK
Imanol Luengo
Medtronic - R&D Computer Vision
Computer Vision, Surgical Vision, Surgical AI, Computer Assisted Interventions
Danail Stoyanov
Professor of Robot Vision, University College London
Surgical Vision, Surgical AI, Surgical Robotics, Computer Assisted Interventions, Surgical Data Science
Nicolas Toussaint
Medtronic Digital Technologies
Neural Networks, Manifold Learning, Differential Geometry, Diffusion Tensor Imaging, Cardiac MRI
Enki Cho
Augmented Intelligence Lab, Kyung Hee University, Seoul, South Korea
Hyeon Bae Kim
Augmented Intelligence Lab, Kyung Hee University, Seoul, South Korea
Oh Sung Choo
Augmented Intelligence Lab, Kyung Hee University, Seoul, South Korea
Ka Young Kim
Kyung Hee University Graduate Student
Seong Tae Kim
Assistant Professor of Computer Science, Kyung Hee University
Explainable AI, Trustworthy AI, Vision-language Models, Surgical AI, MLLM
Gonçalo Arantes
University of Minho, Braga, Portugal
Kehan Song
Hanglok Tech, Zhuhai City, China
Jianjun Zhu
Ph.D. Candidate, Louisiana Tech University
Distributed Algorithms, Byzantine Fault Tolerance, Deep Learning
Junchen Xiong
Jmees Inc., Kashiwa City, Japan
Tingyi Lin
Jmees Inc., Kashiwa City, Japan
Shunsuke Kikuchi
Jmees Inc., Kashiwa City, Japan
Hiroki Matsuzaki
Jmees Inc., Kashiwa City, Japan
Atsushi Kouno
Jmees Inc., Kashiwa City, Japan
João Renato Ribeiro Manesco
School of Sciences, São Paulo State University (UNESP), Bauru, Brazil