Harnessing Foundation Models for Robust and Generalizable 6-DOF Bronchoscopy Localization

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current bronchoscopic 6-DOF localization methods suffer from poor cross-patient generalizability and insufficient robustness under visual degradation (e.g., occlusion, motion blur). To address these challenges, we propose PANSv2—a novel framework that unifies generalizability and robustness by synergistically integrating: (i) end-to-end endoscopic foundation models—EndoOmni for depth estimation and EndoMamba for temporal anatomical landmark detection; (ii) centerline-based geometric constraints; (iii) joint probabilistic pose optimization; and (iv) a dynamic failure detection and reinitialization mechanism. Evaluated on 10 clinical cases, PANSv2 achieves state-of-the-art performance on the SR-5 metric, improving upon prior art by 18.1%. This advancement significantly enhances the clinical feasibility of real-time intraoperative bronchoscopic localization.
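The SR-5 metric cited above is the percentage of frames whose absolute trajectory error (ATE) falls under 5 mm. A minimal sketch of how such a success-rate metric can be computed (the function name and the per-frame aggregation are assumptions for illustration, not the paper's exact evaluation protocol):

```python
def sr_at_threshold(errors_mm, threshold_mm=5.0):
    """Fraction of frames whose absolute trajectory error is below
    the threshold; with threshold_mm=5.0 this corresponds to SR-5.
    `errors_mm` is a sequence of per-frame translation errors in mm."""
    return sum(e < threshold_mm for e in errors_mm) / len(errors_mm)

# 3 of 4 frames are under 5 mm
print(sr_at_threshold([1.2, 3.4, 4.9, 7.0]))  # → 0.75
```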

📝 Abstract
Vision-based 6-DOF bronchoscopy localization offers a promising solution for accurate and cost-effective interventional guidance. However, existing methods struggle with 1) limited generalization across patient cases due to scarce labeled data, and 2) poor robustness under visual degradation, as bronchoscopy procedures frequently involve artifacts such as occlusions and motion blur that impair visual information. To address these challenges, we propose PANSv2, a generalizable and robust bronchoscopy localization framework. Motivated by PANS, which leverages multiple visual cues for pose likelihood measurement, PANSv2 integrates depth estimation, landmark detection, and centerline constraints into a unified pose optimization framework that evaluates pose probability and solves for the optimal bronchoscope pose. To further enhance generalization capabilities, we leverage the endoscopic foundation model EndoOmni for depth estimation and the video foundation model EndoMamba for landmark detection, incorporating both spatial and temporal analyses. Pretrained on diverse endoscopic datasets, these models provide stable and transferable visual representations, enabling reliable performance across varied bronchoscopy scenarios. Additionally, to improve robustness to visual degradation, we introduce an automatic re-initialization module that detects tracking failures and re-establishes pose using landmark detections once clear views are available. Experimental results on a bronchoscopy dataset encompassing 10 patient cases show that PANSv2 achieves the highest tracking success rate, with an 18.1% improvement in SR-5 (the percentage of frames with absolute trajectory error under 5 mm) compared to existing methods, showing potential for real clinical use.
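The abstract describes evaluating pose probability by combining depth, landmark, and centerline cues in a unified optimization. One way such a fusion could look is a weighted sum of per-cue log-likelihoods maximized over candidate poses; all function names, the weights, and the argmax-over-candidates search below are illustrative assumptions, not the paper's actual formulation:

```python
def pose_log_likelihood(pose, depth_ll, landmark_ll, centerline_ll,
                        weights=(1.0, 1.0, 1.0)):
    """Hypothetical fusion of the three visual cues: each *_ll is a
    caller-supplied function mapping a candidate pose to a
    log-likelihood. Weights are placeholders."""
    w_d, w_l, w_c = weights
    return (w_d * depth_ll(pose)
            + w_l * landmark_ll(pose)
            + w_c * centerline_ll(pose))

def best_pose(candidates, depth_ll, landmark_ll, centerline_ll):
    """Pick the candidate pose with the highest fused log-likelihood."""
    return max(candidates,
               key=lambda p: pose_log_likelihood(
                   p, depth_ll, landmark_ll, centerline_ll))
```

In the paper the search is a continuous pose optimization rather than a discrete argmax; the sketch only shows the probabilistic-fusion structure.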
Problem

Research questions and friction points this paper is trying to address.

Improves generalization in bronchoscopy localization across patient cases
Enhances robustness under visual degradation like occlusions and blur
Integrates foundation models for depth estimation and landmark detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates depth, landmark, and centerline cues into unified pose optimization
Uses the EndoOmni and EndoMamba foundation models for generalization
Automatic re-initialization for robustness to visual degradation
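The re-initialization idea above can be sketched as a simple control loop: track while views are clear, and after a detected failure fall back to landmark-based re-initialization once a clear view returns. All callback names and the degradation check are hypothetical placeholders, not the paper's implementation:

```python
def track_with_reinit(frames, track_step, is_degraded, reinit_from_landmarks):
    """Illustrative failure-detection / re-initialization control flow.
    `track_step(frame, pose)` advances the pose; `is_degraded(frame)`
    flags visual degradation; `reinit_from_landmarks(frame)` returns a
    pose from landmark detections, or None while views are unclear."""
    pose, lost = None, True
    trajectory = []
    for frame in frames:
        if lost:
            pose = reinit_from_landmarks(frame)
            lost = pose is None  # stay lost until a clear view appears
        else:
            pose = track_step(frame, pose)
            lost = is_degraded(frame) or pose is None
        trajectory.append(pose)
    return trajectory
```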
Qingyao Tian
Ph.D. candidate, Institute of Automation, Chinese Academy of Sciences
AI for healthcare · medical imaging · foundation models
Huai Liao
The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
Xinyan Huang
The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
Bingyu Yang
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Hongbin Liu
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Centre for Artificial Intelligence and Robotics, Chinese Academy of Sciences, Hong Kong, China; School of Engineering and Imaging Sciences, King's College London, UK