SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Accurate 6-DoF pose estimation of unknown surgical instruments in robot-assisted minimally invasive surgery (RMIS) remains challenging: marker-based approaches suffer from occlusion and specular reflection, while supervised learning methods generalise poorly and require extensive annotated data. Method: This work introduces zero-shot RGB-D pose estimation to RMIS for the first time. We improve depth estimation robustness in reflective and textureless scenes using RAFT-Stereo and replace SAM-6D's Segment Anything Model (SAM) segmentation module with a fine-tuned Mask R-CNN for more precise instrument segmentation under occlusion. Crucially, the method requires no training data for the target instruments and supports cross-instrument zero-shot generalisation. Contribution/Results: On unseen instruments, the enhanced SAM-6D pipeline outperforms FoundationPose. We establish the first zero-shot RGB-D pose estimation benchmark for RMIS, enabling a new paradigm for surgical navigation and autonomous control with high generalisability and minimal data dependency.
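The geometric step underlying the RAFT-Stereo stage is the conversion of per-pixel disparity into metric depth via the standard pinhole stereo relation depth = f·B/d. The sketch below is a minimal illustration of that conversion, assuming a hypothetical pretrained `raft_stereo` handle and assumed calibration values (focal length in pixels, baseline in metres); it is not the authors' released code.

```python
import numpy as np
import torch

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Standard pinhole stereo relation: depth = f * B / disparity."""
    return (focal_px * baseline_m) / np.maximum(disparity, eps)

@torch.no_grad()
def estimate_depth(raft_stereo, left, right, focal_px, baseline_m, iters=32):
    """left/right: (1, 3, H, W) float tensors of a rectified stereo pair.

    `raft_stereo` is a hypothetical handle to a pretrained RAFT-Stereo model;
    in the reference implementation the forward pass returns the horizontal
    flow, whose negation is interpreted as disparity in pixels.
    """
    _, flow_up = raft_stereo(left, right, iters=iters, test_mode=True)
    disparity = -flow_up.squeeze().cpu().numpy()            # (H, W), pixels
    depth = disparity_to_depth(disparity, focal_px, baseline_m)
    return depth                                             # metres
```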

📝 Abstract
Accurate pose estimation of surgical tools in Robot-assisted Minimally Invasive Surgery (RMIS) is essential for surgical navigation and robot control. While traditional marker-based methods offer accuracy, they face challenges with occlusions, reflections, and tool-specific designs. Similarly, supervised learning methods require extensive training on annotated datasets, limiting their adaptability to new tools. Despite their success in other domains, zero-shot pose estimation models remain unexplored for surgical instruments in RMIS, leaving a gap in generalising to unseen surgical tools. This paper presents a novel 6 Degrees of Freedom (DoF) pose estimation pipeline for surgical instruments, leveraging state-of-the-art zero-shot RGB-D models such as FoundationPose and SAM-6D. We advance these models by incorporating vision-based depth estimation using RAFT-Stereo, which provides robust depth in reflective and textureless environments. Additionally, we enhance SAM-6D by replacing its instance segmentation module, the Segment Anything Model (SAM), with a fine-tuned Mask R-CNN, significantly boosting segmentation accuracy in occluded and complex conditions. Extensive validation reveals that our enhanced SAM-6D surpasses FoundationPose in zero-shot pose estimation of unseen surgical instruments, setting a new benchmark for zero-shot RGB-D pose estimation in RMIS. This work enhances the generalisability of pose estimation for unseen objects and pioneers the application of RGB-D zero-shot methods in RMIS.
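The abstract describes a three-stage pipeline: instrument segmentation with a fine-tuned Mask R-CNN, stereo depth from RAFT-Stereo, and zero-shot 6-DoF estimation against the instrument's CAD model. Below is a minimal sketch of how such stages could be glued together; the `segmenter` and `pose_estimator` wrappers are assumed interfaces for illustration, not the authors' actual API.

```python
def estimate_instrument_poses(rgb, depth, camera_K, instrument_mesh,
                              segmenter, pose_estimator):
    """Hypothetical glue code for the pipeline outlined in the abstract.

    segmenter      -- fine-tuned Mask R-CNN wrapper returning one binary
                      mask per detected instrument (assumed interface)
    pose_estimator -- zero-shot RGB-D pose model (e.g. a SAM-6D-style
                      match-and-refine stage) scoring candidate poses
                      against the instrument's CAD mesh (assumed interface)
    """
    masks = segmenter(rgb)                        # one mask per detected tool
    poses = []
    for mask in masks:
        # 6-DoF pose (4x4 rigid transform) of the CAD model in the camera frame
        T_cam_obj = pose_estimator(rgb, depth, mask, camera_K, instrument_mesh)
        poses.append(T_cam_obj)
    return poses
```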
Problem

Research questions and friction points this paper is trying to address.

Accurate surgical tool pose estimation in RMIS
Overcoming occlusion and reflection challenges of marker-based methods
Zero-shot learning for generalizing to unseen surgical tools
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot learning for surgical tool pose estimation
Stereo vision with RAFT-Stereo for depth estimation
Enhanced SAM-6D with fine-tuned Mask R-CNN (see the sketch after this list)
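The last item replaces SAM-6D's SAM-based instance segmentation with a fine-tuned Mask R-CNN. A minimal sketch of the standard torchvision head swap used for this kind of fine-tuning is shown below, assuming two classes (background + instrument) and a recent torchvision release; the authors' actual class set and training recipe are not reproduced here.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_instrument_maskrcnn(num_classes=2, hidden_dim=256):
    """num_classes includes background, e.g. background + instrument (assumed)."""
    # Start from a COCO-pretrained Mask R-CNN (torchvision >= 0.13 weights API).
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

    # Replace the box classification head for the instrument classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # Replace the mask prediction head accordingly.
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(
        in_features_mask, hidden_dim, num_classes)
    return model
```

The swapped heads are then fine-tuned on annotated instrument frames in the usual torchvision detection training loop, while the pose-estimation stage itself stays zero-shot with respect to the target instrument.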
Utsav Rai
Imperial College London
Artificial Intelligence and Machine Learning
Haozheng Xu
Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, UK
S. Giannarou
Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, UK