TiS-TSL: Image-Label Supervised Surgical Video Stereo Matching via Time-Switchable Teacher-Student Learning

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Stereo matching in minimally invasive surgery demands video-level temporal stability, yet anatomical constraints permit only sparse image-level annotations. Existing teacher-student learning (TSL) methods provide only spatial confidence and lack explicit temporal consistency modeling, which leads to severe disparity flickering. This paper proposes TiS-TSL, a time-switchable teacher-student learning framework built on a single unified model that runs in three modes: image prediction, forward video prediction, and backward video prediction. Training follows two stages. An Image-to-Video (I2V) stage transfers sparse image-level knowledge to initialize temporal modeling. A Video-to-Video (V2V) stage then compares forward and backward predictions to compute bidirectional spatio-temporal consistency, which is used to filter noisy video-level pseudo labels and enforce temporal coherence. On two public surgical datasets, TiS-TSL improves TEPE and EPE by at least 2.11% and 4.54%, respectively, over state-of-the-art image-based methods, indicating more temporally stable disparity under sparse supervision.
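TEPE and EPE are not defined on this page. The following is a minimal sketch of both, assuming the convention used in prior dynamic-stereo work: EPE is the mean absolute disparity error within a frame, and TEPE compares the frame-to-frame change in predicted disparity with the change in ground truth. The flattened-list frame representation and the function names are illustrative assumptions, not the paper's evaluation code.

```python
from typing import List

Frame = List[float]  # one disparity map, flattened to a list of pixels (assumption)


def epe(pred: Frame, gt: Frame) -> float:
    """End-point error: mean absolute disparity error over the pixels of one frame."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same size")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def tepe(preds: List[Frame], gts: List[Frame]) -> float:
    """Temporal end-point error (assumed definition): penalises a mismatch between
    the predicted and ground-truth change in disparity between consecutive frames,
    so a prediction that flickers is penalised even when each frame's EPE is small."""
    if len(preds) != len(gts) or len(preds) < 2:
        raise ValueError("need two or more aligned frames")
    errors = []
    for t in range(1, len(preds)):
        for p0, p1, g0, g1 in zip(preds[t - 1], preds[t], gts[t - 1], gts[t]):
            errors.append(abs((p1 - p0) - (g1 - g0)))
    return sum(errors) / len(errors)


# Toy static scene: both predictors are off by 1 px per pixel (same EPE),
# but only the flickering one changes from frame to frame.
gt = [[10.0, 10.0]] * 3
flicker = [[11.0, 9.0], [9.0, 11.0], [11.0, 9.0]]
steady = [[11.0, 11.0]] * 3
print(epe(flicker[0], gt[0]), tepe(flicker, gt))  # 1.0 2.0
print(epe(steady[0], gt[0]), tepe(steady, gt))    # 1.0 0.0
```

The two toy predictors tie on EPE but differ on TEPE, which is why a temporal metric is needed to capture the flickering problem described above.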


📝 Abstract
Stereo matching in minimally invasive surgery (MIS) is essential for next-generation navigation and augmented reality. Yet, dense disparity supervision is nearly impossible due to anatomical constraints, typically limiting annotations to only a few image-level labels acquired before the endoscope enters deep body cavities. Teacher-Student Learning (TSL) offers a promising solution by leveraging a teacher trained on sparse labels to generate pseudo labels and associated confidence maps from abundant unlabeled surgical videos. However, existing TSL methods are confined to image-level supervision, providing only spatial confidence and lacking temporal consistency estimation. This absence of spatio-temporal reliability results in unstable disparity predictions and severe flickering artifacts across video frames. To overcome these challenges, we propose TiS-TSL, a novel time-switchable teacher-student learning framework for video stereo matching under minimal supervision. At its core is a unified model that operates in three distinct modes: Image-Prediction (IP), Forward Video-Prediction (FVP), and Backward Video-Prediction (BVP), enabling flexible temporal modeling within a single architecture. Enabled by this unified model, TiS-TSL adopts a two-stage learning strategy. The Image-to-Video (I2V) stage transfers sparse image-level knowledge to initialize temporal modeling. The subsequent Video-to-Video (V2V) stage refines temporal disparity predictions by comparing forward and backward predictions to calculate bidirectional spatio-temporal consistency. This consistency identifies unreliable regions across frames, filters noisy video-level pseudo labels, and enforces temporal coherence. Experimental results on two public datasets demonstrate that TiS-TSL outperforms state-of-the-art image-based methods, improving TEPE and EPE by at least 2.11% and 4.54%, respectively.
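The V2V filtering idea can be illustrated with a minimal sketch. It assumes the teacher has already produced a forward-pass and a backward-pass disparity map for the same frame; pixels where the two temporal directions disagree by more than a threshold `tau` are treated as unreliable and excluded from the student's loss. The threshold value, the averaging of the two passes into one pseudo label, and the masked L1 loss are hypothetical choices for illustration; the abstract does not specify TiS-TSL's exact formulation.

```python
from typing import List, Optional, Tuple

Frame = List[float]  # one disparity map, flattened to a list of pixels (assumption)


def bidirectional_filter(
    fwd: Frame, bwd: Frame, tau: float = 1.0
) -> Tuple[Frame, List[bool]]:
    """Fuse forward- and backward-pass teacher disparities for one frame.

    Pixels whose two temporal estimates differ by more than `tau` (a hypothetical
    threshold, in pixels) are marked unreliable. The fused pseudo label is the
    mean of the two passes, which is an illustrative assumption.
    """
    if len(fwd) != len(bwd):
        raise ValueError("forward and backward maps must be the same size")
    pseudo = [(f + b) / 2.0 for f, b in zip(fwd, bwd)]
    valid = [abs(f - b) <= tau for f, b in zip(fwd, bwd)]
    return pseudo, valid


def masked_l1(student: Frame, pseudo: Frame, valid: List[bool]) -> Optional[float]:
    """Mean L1 error on reliable pixels only; returns None if no pixel survives."""
    terms = [abs(s - p) for s, p, v in zip(student, pseudo, valid) if v]
    return sum(terms) / len(terms) if terms else None


# The middle pixel disagrees by 4 px between directions, so it is filtered out.
pseudo, valid = bidirectional_filter([10.0, 20.0, 30.0], [10.5, 24.0, 30.0], tau=1.0)
print(pseudo, valid)  # [10.25, 22.0, 30.0] [True, False, True]
print(masked_l1([10.0, 0.0, 31.0], pseudo, valid))  # 0.625
```

Because the unreliable pixel is masked, the student's large error there (0.0 against a label of 22.0) does not contribute to the loss; only the two consistent pixels are supervised.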
Problem


Achieving surgical video stereo matching with minimal supervision using only image-level labels
Addressing temporal inconsistency and flickering artifacts in existing teacher-student learning methods
Developing unified temporal modeling for stable disparity predictions across video frames
Innovation


Time-switchable teacher-student learning for video stereo matching
Unified model with three modes for flexible temporal modeling
Bidirectional spatio-temporal consistency refines disparity predictions
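One way to picture three modes sharing a single architecture is a per-frame step function that optionally receives the previous output as temporal memory: IP passes no memory, FVP carries memory forward in time, and BVP runs the same step over the reversed clip. This is a hypothetical sketch; `predict`, `Mode`, and the toy smoothing step are illustrative names, and the abstract does not describe TiS-TSL's internal architecture.

```python
from enum import Enum
from typing import Callable, List, Optional, TypeVar

F = TypeVar("F")  # one stereo frame (image pair), left abstract here
D = TypeVar("D")  # one disparity map

# Hypothetical per-frame step: takes a frame and the previous output
# (None when there is no temporal context) and returns a disparity estimate.
Step = Callable[[F, Optional[D]], D]


class Mode(Enum):
    IP = "image"      # each frame independently, no temporal memory
    FVP = "forward"   # memory flows from frame t-1 to frame t
    BVP = "backward"  # memory flows from frame t+1 to frame t


def predict(step: Step, frames: List[F], mode: Mode) -> List[D]:
    """Run one shared step function over a clip in the requested mode.
    Outputs are always returned in the original frame order."""
    if mode is Mode.IP:
        return [step(f, None) for f in frames]
    n = len(frames)
    order = range(n) if mode is Mode.FVP else reversed(range(n))
    out: List[Optional[D]] = [None] * n
    prev: Optional[D] = None
    for t in order:
        prev = step(frames[t], prev)
        out[t] = prev
    return out  # type: ignore[return-value]


# Toy step (hypothetical): blend the current frame's value with the memory.
def toy(f: float, prev: Optional[float]) -> float:
    return f if prev is None else 0.5 * f + 0.5 * prev


clip = [0.0, 4.0, 8.0]  # one scalar stands in for each frame's disparity map
print(predict(toy, clip, Mode.IP))   # [0.0, 4.0, 8.0]
print(predict(toy, clip, Mode.FVP))  # [0.0, 2.0, 5.0]
print(predict(toy, clip, Mode.BVP))  # [3.0, 6.0, 8.0]
```

Because FVP and BVP reuse the same step but see different temporal context, comparing their outputs frame by frame yields a forward/backward disagreement signal of the kind a bidirectional consistency check relies on.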
Authors
Rui Wang
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Ying Zhou
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Hao Wang
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Wenwei Zhang
Shanghai AI Laboratory
Qiang Li
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Zhiwei Wang
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology