AI Summary
Self-supervised stereo matching suffers from degraded disparity estimation in occluded regions due to the failure of the photometric consistency assumption. To address this, we propose BaCon-Stereo, a novel framework that, for the first time, incorporates multi-baseline geometric complementarity into self-supervised learning. It employs a teacher-student collaborative architecture to fuse multi-view inputs and construct contrastive learning objectives; introduces an occlusion-aware attention map to explicitly model and guide disparity completion in occluded areas; and enables end-to-end training without ground-truth disparity labels. Evaluated on KITTI 2012/2015, BaCon-Stereo outperforms all existing self-supervised methods, achieving significant accuracy gains in both occluded and non-occluded regions. The framework demonstrates strong generalization and robustness across diverse scenes. Additionally, we release BaCon-20k, a large-scale synthetic dataset designed to facilitate research in occlusion-aware stereo matching.
Abstract
Current self-supervised stereo matching relies on the photometric consistency assumption, which breaks down in occluded regions due to ill-posed correspondences. To address this issue, we propose BaCon-Stereo, a simple yet effective contrastive learning framework for self-supervised stereo network training in both non-occluded and occluded regions. We adopt a teacher-student paradigm with multi-baseline inputs, in which the stereo pairs fed into the teacher and student share the same reference view but differ in target views. Geometrically, regions occluded in the student's target view are often visible in the teacher's, making it easier for the teacher to predict in these regions. The teacher's prediction is rescaled to match the student's baseline and then used to supervise the student. We also introduce an occlusion-aware attention map to better guide the student in learning occlusion completion. To support training, we synthesize a multi-baseline dataset BaCon-20k. Extensive experiments demonstrate that BaCon-Stereo improves prediction in both occluded and non-occluded regions, achieves strong generalization and robustness, and outperforms state-of-the-art self-supervised methods on both KITTI 2015 and 2012 benchmarks. Our code and dataset will be released upon paper acceptance.
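The baseline rescaling step can be illustrated with a minimal sketch. For a rectified pair with focal length f, baseline B, and depth Z, disparity is d = f * B / Z, so two pairs that share a reference view have disparities in the ratio of their baselines. The function names `rescale_disparity` and `weighted_l1`, the plain nested-list representation, and the attention-weighted L1 loss are all assumptions made for illustration; the abstract does not specify the actual loss or how the occlusion-aware attention map is computed.

```python
def rescale_disparity(teacher_disp, teacher_baseline, student_baseline):
    """Rescale a teacher disparity map to the student's baseline.

    Assumes rectified pairs with a shared reference view and focal length,
    so disparity is proportional to baseline (d = f * B / Z).
    """
    if teacher_baseline <= 0 or student_baseline <= 0:
        raise ValueError("baselines must be positive")
    scale = student_baseline / teacher_baseline
    return [[d * scale for d in row] for row in teacher_disp]


def weighted_l1(student_disp, target_disp, attention):
    """Attention-weighted mean absolute error between two disparity maps.

    Hypothetical stand-in for the supervision signal: pixels with larger
    attention (e.g. occluded in the student's target view) count more.
    """
    num = 0.0
    den = 0.0
    for s_row, t_row, a_row in zip(student_disp, target_disp, attention):
        for s, t, a in zip(s_row, t_row, a_row):
            num += a * abs(s - t)
            den += a
    return num / den if den > 0 else 0.0


# Toy 1x2 disparity maps; the baselines are in arbitrary but equal units.
teacher_disp = [[8.0, 4.0]]
pseudo_label = rescale_disparity(teacher_disp, teacher_baseline=2.0,
                                 student_baseline=1.0)
student_disp = [[4.5, 2.0]]
attention = [[1.0, 3.0]]  # hypothetical occlusion-aware weights
loss = weighted_l1(student_disp, pseudo_label, attention)
```

Here the teacher's disparities are halved because the student's baseline is half the teacher's. In an actual training loop the pseudo-label would be treated as a fixed target, with no gradient flowing back to the teacher through it.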