🤖 AI Summary
Conventional intraoperative registration methods for cochlear implantation, which rely on external optical tracking systems or surface fiducial markers, suffer from complex setup and poor real-time performance. To address this, we propose a zero-shot, monocular, markerless, and hardware-free 6D camera pose estimation framework. Our approach employs a lightweight neural network that directly maps preoperative CT-derived 3D anatomical structures to intraoperative 2D microscope images, enabling real-time patient-to-image registration. Trained exclusively on synthetic microscopic surgical data, the model regresses the full 6D pose (rotation matrix and translation vector) and demonstrates strong cross-patient generalization. Evaluated on nine clinical cases, it achieves clinically viable accuracy, with more than 85% of angular errors within 10°. This work presents the first monocular intraoperative registration method for cochlear implantation that eliminates the need for optical trackers or fiducial markers, significantly improving the real-time capability, robustness, and clinical deployability of surgical navigation systems.
📝 Abstract
This paper presents a novel method for monocular patient-to-image intraoperative registration that operates without external hardware tracking equipment or fiducial markers. Leveraging a synthetic microscopic surgical scene dataset spanning a wide range of transformations, our approach uses a lightweight neural network to directly map preoperative CT scans to 2D intraoperative surgical frames, providing real-time guidance for cochlear implant surgery in a zero-shot setting. Unlike traditional methods, our framework integrates seamlessly with monocular surgical microscopes, making it highly practical for clinical use without additional hardware requirements. The network estimates the full camera pose, comprising a rotation matrix and a translation vector, by learning from the synthetic dataset, enabling accurate and efficient intraoperative registration. The proposed framework was evaluated on nine clinical cases using patient-specific and cross-patient validation strategies. Our results suggest that the approach achieves clinically relevant accuracy in predicting 6D camera poses for registering 3D preoperative CT scans to 2D surgical scenes, with angular errors within 10 degrees in most cases, while eliminating the reliance of traditional methods on external tracking systems and fiducial markers.
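The abstract describes regressing a 6D pose (rotation matrix and translation vector), using it to register 3D CT points to the 2D microscope frame, and reporting accuracy as an angular error in degrees. A minimal numeric sketch of these three pieces is given below; note that the 6D rotation parameterization, the pinhole projection, and all function names are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def rotation_from_6d(x):
    # Gram-Schmidt map from a raw 6-vector (two 3D columns) to a valid
    # rotation matrix. This continuous 6D parameterization is an assumption;
    # the paper only states that a rotation matrix is regressed.
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)

def angular_error_deg(R_pred, R_gt):
    # Geodesic distance between two rotations, in degrees -- a common
    # choice for the kind of angular-error metric the abstract reports.
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def project(points_3d, R, t, K):
    # Pinhole projection of preoperative CT points (Nx3, patient space)
    # into the 2D microscope frame using the estimated pose (R, t) and
    # an assumed camera intrinsics matrix K.
    cam = points_3d @ R.T + t      # patient space -> camera space
    uv = cam @ K.T                 # camera space -> homogeneous image coords
    return uv[:, :2] / uv[:, 2:3]  # perspective divide
```

For example, a predicted rotation 10° off the ground truth about the optical axis yields `angular_error_deg(...) == 10.0`, matching the threshold used in the evaluation.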