Learnable Patchmatch and Self-Teaching for Multi-Frame Depth Estimation in Monocular Endoscopy

📅 2022-05-30
📈 Citations: 2 (influential: 0)
🤖 AI Summary
This paper addresses the challenges of unsupervised multi-frame depth estimation from monocular endoscopic videos, specifically the weakly textured regions and inter-frame brightness fluctuations that hinder temporal-consistency modeling. The authors propose a self-supervised framework comprising: (1) a learnable patchmatch module that improves matching robustness in low-texture areas; (2) joint cross-teaching and self-teaching consistency regularization that suppresses artifacts caused by brightness fluctuations; and (3) dynamic multi-frame fusion at test time for improved generalization and depth accuracy. The method achieves state-of-the-art performance on four endoscopic datasets (SCARED, EndoSLAM, Hamlyn, and SERV-CT), outperforming prior approaches across all benchmarks. The authors state that the source code and trained models will be released upon acceptance.
📝 Abstract
This work delves into unsupervised monocular depth estimation in endoscopy, which leverages adjacent frames to establish a supervisory signal during the training phase. For many clinical applications, e.g., surgical navigation, temporally correlated frames are also available at test time. Due to the lack of depth clues, making full use of the temporal correlation among multiple video frames at both phases is crucial for accurate depth estimation. However, several challenges in endoscopic scenes, such as low and homogeneous textures and inter-frame brightness fluctuations, limit the performance gain from the temporal correlation. To fully exploit it, we propose a novel unsupervised multi-frame monocular depth estimation model. The proposed model integrates a learnable patchmatch module to adaptively increase the discriminative ability in regions with low and homogeneous textures, and enforces cross-teaching and self-teaching consistencies to provide efficacious regularization against brightness fluctuations. Furthermore, as a byproduct of the self-teaching paradigm, the proposed model is able to improve its depth predictions when more frames are input at test time. We conduct detailed experiments on multiple datasets, including SCARED, EndoSLAM, Hamlyn, and SERV-CT. The experimental results indicate that our model outperforms state-of-the-art competitors. The source code and trained models will be made publicly available upon acceptance.
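The self-supervision described in the abstract rests on photometric reprojection: predicted depth and relative camera pose are used to warp an adjacent frame onto the target frame, and the intensity difference between the two serves as the training signal. A minimal NumPy sketch of this standard pipeline follows; it illustrates the general paradigm only, not the authors' implementation, and assumes the intrinsics `K` and relative pose `(R, t)` are given:

```python
import numpy as np

def backproject(depth, K):
    """Lift every pixel (u, v) to a 3-D point: X = depth * K^{-1} [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # (3, H*W)
    rays = np.linalg.inv(K) @ pix            # unit-depth camera rays
    return rays * depth.reshape(1, -1)       # (3, H*W) points in camera frame

def project(points, K, R, t):
    """Project 3-D points into the neighbouring view with relative pose (R, t)."""
    cam = R @ points + t.reshape(3, 1)
    pix = K @ cam
    return pix[:2] / np.clip(pix[2:], 1e-6, None)  # (2, H*W) pixel coordinates

def photometric_l1(target, warped, valid_mask):
    """Mean absolute intensity difference over valid (in-bounds) pixels."""
    return np.abs(target - warped)[valid_mask].mean()
```

With an identity pose, each pixel reprojects onto itself, so warping the adjacent frame reproduces the target and the photometric loss is zero; during training, the loss is instead minimized with respect to the depth and pose networks.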
Problem

Research questions and friction points this paper is trying to address.

Unsupervised depth estimation in monocular endoscopy
Exploiting temporal correlation across frames for accurate depth prediction
Addressing low textures and brightness fluctuations in endoscopy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable Patchmatch module
Cross-teaching and self-teaching consistency regularization
Unsupervised multi-frame depth estimation
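The cross-teaching and self-teaching contributions are distillation-style regularizers: one depth prediction supervises another after the monocular scale ambiguity is resolved. A hedged sketch of such a consistency term is below; it shows the general form of a scale-aligned teaching loss, not the paper's exact losses, and `scale_align` via median ratio is an assumed (common) alignment choice:

```python
import numpy as np

def scale_align(pred, ref):
    """Rescale pred so its median depth matches ref's.

    Monocular depth is only defined up to scale, so predictions must be
    aligned before they can supervise one another.
    """
    return pred * (np.median(ref) / np.median(pred))

def teaching_loss(student_depth, teacher_depth):
    """L1 consistency between a student prediction and a teacher prediction.

    The teacher is treated as a constant target (in a real training loop no
    gradient would flow through it, e.g. via detach/stop-gradient).
    """
    teacher = scale_align(teacher_depth, student_depth)
    return np.abs(student_depth - teacher).mean()
```

In a cross-teaching setup the teacher and student would be different networks (e.g. multi-frame teaching single-frame); in self-teaching they would be the same network under different inputs, which is consistent with the abstract's note that predictions improve as more frames are supplied at test time.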
👥 Authors
Shuwei Shao, Shandong University (Computer Vision, 3D Vision, Medical Image Analysis, Surgical Robotics)
Z. Pei, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
Weihai Chen, Beihang University
Xingming Wu, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
Zhong Liu, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China