Self-Supervised Stereo Matching with Multi-Baseline Contrastive Learning

📅 2025-08-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Self-supervised stereo matching relies on photometric consistency, an assumption that fails in occluded regions and degrades disparity estimates there. BaCon-Stereo addresses this with a contrastive learning framework built on multi-baseline inputs. A teacher and a student network receive stereo pairs that share the same reference view but use different target views, so regions occluded for the student are often visible to the teacher. The teacher's prediction is rescaled to the student's baseline and used to supervise the student, and an occlusion-aware attention map guides the student in learning occlusion completion. Training requires no ground-truth disparity labels. On the KITTI 2012 and 2015 benchmarks, BaCon-Stereo outperforms state-of-the-art self-supervised methods and improves predictions in both occluded and non-occluded regions, with strong generalization and robustness. The authors also synthesize BaCon-20k, a multi-baseline dataset to support training; code and dataset are to be released upon paper acceptance.

📝 Abstract

Current self-supervised stereo matching relies on the photometric consistency assumption, which breaks down in occluded regions due to ill-posed correspondences. To address this issue, we propose BaCon-Stereo, a simple yet effective contrastive learning framework for self-supervised stereo network training in both non-occluded and occluded regions. We adopt a teacher-student paradigm with multi-baseline inputs, in which the stereo pairs fed into the teacher and student share the same reference view but differ in target views. Geometrically, regions occluded in the student's target view are often visible in the teacher's, making it easier for the teacher to predict in these regions. The teacher's prediction is rescaled to match the student's baseline and then used to supervise the student. We also introduce an occlusion-aware attention map to better guide the student in learning occlusion completion. To support training, we synthesize a multi-baseline dataset BaCon-20k. Extensive experiments demonstrate that BaCon-Stereo improves prediction in both occluded and non-occluded regions, achieves strong generalization and robustness, and outperforms state-of-the-art self-supervised methods on both KITTI 2015 and 2012 benchmarks. Our code and dataset will be released upon paper acceptance.
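The rescaling step mentioned in the abstract follows from standard rectified stereo geometry: disparity equals focal length times baseline divided by depth, so for a shared reference view and focal length it scales linearly with the baseline. The sketch below illustrates that relation only and is not the authors' implementation. The function names, the baseline values, and the weighted L1 loss are assumptions made for illustration; the abstract does not specify the loss or how the attention map enters it.

```python
import numpy as np

def rescale_disparity(teacher_disp, teacher_baseline, student_baseline):
    """Rescale a disparity map from the teacher's baseline to the student's.

    Assumes rectified pairs that share the reference view and focal length,
    so disparity is proportional to baseline.
    """
    if teacher_baseline <= 0 or student_baseline <= 0:
        raise ValueError("baselines must be positive")
    scale = student_baseline / teacher_baseline
    return np.asarray(teacher_disp, dtype=np.float64) * scale

def distillation_loss(student_disp, pseudo_label, weight=None):
    """Hypothetical (optionally weighted) mean absolute error against the pseudo-label."""
    err = np.abs(np.asarray(student_disp, dtype=np.float64) - pseudo_label)
    if weight is None:
        return float(err.mean())
    weight = np.asarray(weight, dtype=np.float64)
    return float((weight * err).sum() / max(weight.sum(), 1e-12))

# Toy values (assumed): the student's baseline is twice the teacher's.
teacher_disp = np.array([[10.0, 20.0], [30.0, 40.0]])
pseudo_label = rescale_disparity(teacher_disp, teacher_baseline=0.25, student_baseline=0.5)

student_disp = np.array([[20.0, 42.0], [60.0, 80.0]])
loss = distillation_loss(student_disp, pseudo_label)

# A per-pixel weight map (stand-in for an attention map) emphasizes chosen pixels.
weights = np.array([[1.0, 3.0], [1.0, 1.0]])
weighted_loss = distillation_loss(student_disp, pseudo_label, weights)
```

The optional `weight` argument only shows where a per-pixel map could reweight the supervision; how BaCon-Stereo actually computes and applies its occlusion-aware attention map is not described in the abstract.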

Problem

Research questions and friction points this paper is trying to address.

Addresses photometric consistency breakdown in occluded stereo regions

Proposes contrastive learning for occlusion-aware stereo matching

Enhances prediction accuracy in both occluded and non-occluded areas
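To make the photometric consistency assumption concrete, here is a minimal NumPy sketch of the usual reconstruction loss for a rectified pair: the left view is rebuilt by sampling the right image at column x - d, and the loss is the mean absolute difference. It is a generic illustration, not the loss used in the paper; the function names, the nearest-pixel sampling, and the toy images are assumptions.

```python
import numpy as np

def warp_right_to_left(right, disp):
    """Rebuild the left view by sampling the right image at column x - disp.

    Uses nearest-pixel sampling for simplicity. Returns the reconstruction
    and a mask of left pixels whose source column lies inside the image.
    """
    right = np.asarray(right, dtype=np.float64)
    disp = np.asarray(disp, dtype=np.float64)
    h, w = right.shape
    xs = np.arange(w)[None, :] - np.rint(disp).astype(int)  # source column per pixel
    valid = (xs >= 0) & (xs < w)
    ys = np.broadcast_to(np.arange(h)[:, None], (h, w))
    recon = right[ys, np.clip(xs, 0, w - 1)]
    return recon, valid

def photometric_loss(left, right, disp):
    """Mean absolute difference between the left image and its reconstruction."""
    recon, valid = warp_right_to_left(right, disp)
    diff = np.abs(np.asarray(left, dtype=np.float64) - recon)
    return float(diff[valid].mean())

# Toy pair (assumed): the whole scene has a true disparity of 2 pixels.
left = np.arange(12, dtype=np.float64).reshape(2, 6)
right = np.zeros_like(left)
right[:, :4] = left[:, 2:]  # left pixel x appears at right column x - 2

loss_true = photometric_loss(left, right, np.full(left.shape, 2.0))
loss_wrong = photometric_loss(left, right, np.full(left.shape, 1.0))
```

The validity mask here only excludes pixels that fall outside the image. A pixel occluded in the right view has no true counterpart at all, so minimizing this loss gives no reliable signal there, which is the gap the multi-baseline supervision is meant to fill.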

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-baseline contrastive learning for occlusion handling

Teacher-student paradigm with cross-baseline supervision

Occlusion-aware attention map for completion guidance
