Self-Supervised Stereo Matching with Multi-Baseline Contrastive Learning

📅 2025-08-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Self-supervised stereo matching suffers from degraded disparity estimation in occluded regions because the photometric consistency assumption fails there. To address this, we propose BaCon-Stereo, a framework that, for the first time, incorporates multi-baseline geometric complementarity into self-supervised learning. It employs a teacher-student architecture whose stereo inputs share a reference view but use different target views (baselines), and uses the teacher's predictions to construct contrastive learning objectives for the student; it introduces an occlusion-aware attention map to explicitly guide disparity completion in occluded areas; and it trains end-to-end without ground-truth disparity labels. Evaluated on KITTI 2012/2015, BaCon-Stereo outperforms state-of-the-art self-supervised methods, improving accuracy in both occluded and non-occluded regions, and shows strong generalization and robustness. Additionally, we synthesize BaCon-20k, a large-scale multi-baseline dataset to support training and occlusion-aware stereo research, which will be released upon paper acceptance.

πŸ“ Abstract
Current self-supervised stereo matching relies on the photometric consistency assumption, which breaks down in occluded regions due to ill-posed correspondences. To address this issue, we propose BaCon-Stereo, a simple yet effective contrastive learning framework for self-supervised stereo network training in both non-occluded and occluded regions. We adopt a teacher-student paradigm with multi-baseline inputs, in which the stereo pairs fed into the teacher and student share the same reference view but differ in target views. Geometrically, regions occluded in the student's target view are often visible in the teacher's, making it easier for the teacher to predict in these regions. The teacher's prediction is rescaled to match the student's baseline and then used to supervise the student. We also introduce an occlusion-aware attention map to better guide the student in learning occlusion completion. To support training, we synthesize a multi-baseline dataset BaCon-20k. Extensive experiments demonstrate that BaCon-Stereo improves prediction in both occluded and non-occluded regions, achieves strong generalization and robustness, and outperforms state-of-the-art self-supervised methods on both KITTI 2015 and 2012 benchmarks. Our code and dataset will be released upon paper acceptance.
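The rescaling step follows from rectified stereo geometry: disparity is d = f·B/Z. Because teacher and student share the reference view, each pixel has the same depth Z in both pairs, so, assuming both target cameras also share the focal length f, disparity scales linearly with the baseline B. Below is a minimal sketch under those assumptions, with disparity maps as nested lists. The function names `rescale_disparity` and `pseudo_label_l1` are hypothetical, and the plain L1 loss is an assumed stand-in for whatever supervision loss the paper actually uses.

```python
from typing import List

DisparityMap = List[List[float]]


def rescale_disparity(teacher_disp: DisparityMap,
                      teacher_baseline: float,
                      student_baseline: float) -> DisparityMap:
    """Convert a teacher disparity map to the student's baseline.

    Rectified stereo gives d = f * B / Z. Teacher and student share the
    reference view, so Z is identical per pixel; assuming equal focal
    length f, disparity scales by student_baseline / teacher_baseline.
    """
    if teacher_baseline <= 0 or student_baseline <= 0:
        raise ValueError("baselines must be positive")
    scale = student_baseline / teacher_baseline
    return [[d * scale for d in row] for row in teacher_disp]


def pseudo_label_l1(student_disp: DisparityMap, target_disp: DisparityMap) -> float:
    """Mean absolute error between the student prediction and the pseudo-label."""
    diffs = [abs(s - t)
             for s_row, t_row in zip(student_disp, target_disp)
             for s, t in zip(s_row, t_row)]
    return sum(diffs) / len(diffs) if diffs else 0.0


if __name__ == "__main__":
    teacher = [[40.0, 10.0]]  # example only: teacher baseline twice the student's
    target = rescale_disparity(teacher, teacher_baseline=2.0, student_baseline=1.0)
    print(target)                                   # [[20.0, 5.0]]
    print(pseudo_label_l1([[21.0, 5.0]], target))   # 0.5
```

Only the baseline ratio matters here; absolute units (metres, millimetres) cancel as long as both baselines use the same one.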
Problem

Research questions and friction points this paper is trying to address.

Addresses photometric consistency breakdown in occluded stereo regions
Proposes contrastive learning for occlusion-aware stereo matching
Enhances prediction accuracy in both occluded and non-occluded areas
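Consider a single rectified scanline to see why the photometric assumption breaks in occluded regions. The reference (left) pixel x is reconstructed by sampling the target (right) scanline at x - d, and the loss penalizes the intensity difference. The sketch below uses hypothetical helper names and assumes linear interpolation with an L1 error for simplicity. It can flag pixels whose match falls outside the target image, but a pixel hidden behind a foreground object still samples some intensity, just from the wrong surface, so the loss supplies a misleading signal there.

```python
import math
from typing import List, Tuple


def warp_scanline(target_row: List[float],
                  disparity_row: List[float]) -> Tuple[List[float], List[bool]]:
    """Sample the target (right) scanline at x - d for each reference pixel x.

    valid[x] is False when x - d leaves the target image, i.e. the pixel is
    out of view. Pixels occluded by foreground objects are NOT detected here:
    they silently receive the intensity of whatever surface is visible.
    """
    width = len(target_row)
    warped: List[float] = []
    valid: List[bool] = []
    for x, d in enumerate(disparity_row):
        xs = x - d
        if xs < 0 or xs > width - 1:
            warped.append(0.0)
            valid.append(False)
            continue
        x0 = int(math.floor(xs))
        x1 = min(x0 + 1, width - 1)
        a = xs - x0  # linear interpolation weight
        warped.append((1.0 - a) * target_row[x0] + a * target_row[x1])
        valid.append(True)
    return warped, valid


def photometric_l1(reference_row: List[float], target_row: List[float],
                   disparity_row: List[float]) -> float:
    """Mean |I_ref(x) - I_tgt(x - d)| over in-view pixels only."""
    warped, valid = warp_scanline(target_row, disparity_row)
    errors = [abs(r - w) for r, w, v in zip(reference_row, warped, valid) if v]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    left = [10.0, 20.0, 30.0, 40.0]
    right = [20.0, 30.0, 40.0, 50.0]  # the same scene shifted by 1 pixel
    disp = [1.0, 1.0, 1.0, 1.0]
    print(warp_scanline(right, disp)[1])      # [False, True, True, True]
    print(photometric_l1(left, right, disp))  # 0.0
```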
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-baseline contrastive learning for occlusion handling
Teacher-student paradigm with cross-baseline supervision
Occlusion-aware attention map for completion guidance
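The abstract does not say how the occlusion-aware attention map is built. One plausible construction, sketched here purely as an assumption, starts from the teacher's disparity after rescaling to the student's baseline. Each reference pixel is projected into the student's target view at u = x - d and marked occluded if it falls outside the image or if a pixel with larger disparity (a closer surface) lands at about the same position. The helper names, the half-pixel tolerance `tol`, and the upweighting factor `gamma` are hypothetical, and the collision test is a simplified z-buffer-style visibility check on one scanline.

```python
from typing import List


def occlusion_mask(disp_row: List[float], tol: float = 0.5) -> List[bool]:
    """Flag reference pixels not visible in the target view (one scanline).

    Pixel x projects to u = x - d. It is occluded if u is outside the image,
    or if another pixel with larger disparity (closer to the camera)
    projects to within `tol` pixels of u.
    """
    width = len(disp_row)
    proj = [x - d for x, d in enumerate(disp_row)]
    mask = []
    for x, (u, d) in enumerate(zip(proj, disp_row)):
        out_of_view = u < 0 or u > width - 1
        hidden = any(
            disp_row[k] > d and abs(proj[k] - u) < tol
            for k in range(width) if k != x
        )
        mask.append(out_of_view or hidden)
    return mask


def attention_weights(mask: List[bool], gamma: float = 1.0) -> List[float]:
    """Hypothetical attention map: upweight occluded pixels by (1 + gamma)."""
    return [1.0 + gamma if m else 1.0 for m in mask]


def weighted_l1(pred: List[float], target: List[float], weights: List[float]) -> float:
    """Attention-weighted mean absolute error against the teacher pseudo-label."""
    num = sum(w * abs(p - t) for p, t, w in zip(pred, target, weights))
    den = sum(weights)
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # Teacher disparity, assumed already rescaled to the student's baseline:
    # background at disparity 1, a foreground object at disparity 3 (pixels 4-5).
    teacher_disp = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 1.0, 1.0]
    mask = occlusion_mask(teacher_disp)
    print(mask)  # [True, False, True, True, False, False, False, False]
    print(attention_weights(mask))
```

In this toy row, pixel 0 is out of view and pixels 2 and 3 are hidden behind the foreground object, so those are the pixels where the student relies most on the teacher's pseudo-label.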
Authors

Peng Xu
College of Information Science and Electronic Engineering, Zhejiang University, China

Zhiyu Xiang
Professor of Information & Electronic Engineering, Zhejiang University
Computer Vision, Robotics

Jingyun Fu
Zhejiang University
Computer Vision

Tianyu Pu
College of Information Science and Electronic Engineering, Zhejiang University, China

Kai Wang
College of Information Science and Electronic Engineering, Zhejiang University, China

Chaojie Ji
College of Information Science and Electronic Engineering, Zhejiang University, China

Tingming Bai
College of Information Science and Electronic Engineering, Zhejiang University, China

Eryun Liu
Zhejiang University
Computer Vision, Image Processing, Biometrics, Fingerprint, Palmprint