MIDAS: Modeling Ground-Truth Distributions with Dark Knowledge for Domain Generalized Stereo Matching

📅 2025-03-06
🤖 AI Summary
Existing domain-generalized stereo matching methods retain domain-specific bias when transferring from synthetic to real data, which limits their generalization. To address this, we propose a framework that combines dark knowledge distillation with multi-modal ground-truth modeling. Similarity and uncertainty knowledge is extracted from a pre-trained network, and a network ensemble is used to separate objective knowledge from domain-specific bias in the Laplace parameter space. The objective knowledge and the original disparity labels are then modeled jointly as a mixture of Laplacians, giving fine-grained ground-truth distributions for both edge and non-edge regions. PCWNet with MIDAS achieves state-of-the-art generalization performance on KITTI 2015 and KITTI 2012, and the method outperforms existing methods in a comprehensive ranking across four real-world datasets.

📝 Abstract
Despite significant advances in domain generalized stereo matching, existing methods still exhibit domain-specific preferences when transferring from synthetic to real domains, hindering their practical application in complex and diverse scenarios. The probability distributions predicted by a stereo network naturally encode rich similarity and uncertainty information. Inspired by this observation, we propose to extract these two types of dark knowledge from a pre-trained network to model intuitive multi-modal ground-truth distributions for both edge and non-edge regions. To mitigate the inherent domain preferences of a single network, we adopt a network ensemble and further distinguish between objective and biased knowledge in the Laplace parameter space. Finally, the objective knowledge and the original disparity labels are jointly modeled as a mixture of Laplacians to provide fine-grained supervision for stereo network training. Extensive experiments demonstrate that: 1) our method is generic and effectively improves the generalization of existing networks; 2) PCWNet with our method achieves state-of-the-art generalization performance on both the KITTI 2015 and KITTI 2012 datasets; 3) our method outperforms existing methods in a comprehensive ranking across four popular real-world datasets.
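The abstract's observation that a predicted disparity distribution carries uncertainty information can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the helper name `laplace_params_from_distribution`, the use of the expected disparity as the location, and the use of the mean absolute deviation as a rough Laplace scale.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the disparity axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def laplace_params_from_distribution(prob, candidates):
    """Summarize each per-pixel distribution as a (location, scale) pair.

    Hypothetical helper: the location is the expected disparity and the
    scale is the mean absolute deviation around it, used here only as a
    rough stand-in for a Laplace scale parameter.
    """
    mu = (prob * candidates).sum(axis=-1)
    b = (prob * np.abs(candidates - mu[..., None])).sum(axis=-1)
    return mu, b

# Two toy "pixels" with 8 disparity candidates each (made-up logits).
candidates = np.arange(8, dtype=np.float64)
logits = np.array([
    [0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0],  # one sharp peak: confident
    [0.0, 0.0, 3.0, 0.0, 0.0, 3.0, 0.0, 0.0],  # two peaks: ambiguous
])
prob = softmax(logits)
mu, b = laplace_params_from_distribution(prob, candidates)
print("location:", mu)
print("scale:", b)
```

In this toy input the two-peak pixel gets a larger scale than the single-peak pixel, which is the sense in which the spread of a predicted distribution can act as an uncertainty signal.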
Problem

Research questions and friction points this paper is trying to address.

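Stereo networks trained on synthetic data keep domain-specific preferences when transferred to real data
How to extract the similarity and uncertainty information encoded in a pre-trained network's predicted distributions
How to reduce the domain-specific bias of a single network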
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts dark knowledge for multi-modal distributions
Uses network ensemble to reduce domain preferences
Models supervision as mixture of Laplacians
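As a rough illustration of the last point, the sketch below builds a discretized mixture of Laplacians over disparity candidates and scores a prediction against it with cross-entropy. The function names, the mode positions, scales, and weights, and the choice of cross-entropy as the loss are all assumptions for this example, not details confirmed by the source.

```python
import numpy as np

def laplacian_mixture(candidates, modes, scales, weights):
    """Discretize a mixture of Laplacians on the candidate grid.

    Each component is w * exp(-|d - mu| / b) / (2b); the sum is then
    renormalized so the values over the candidates add up to 1.
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    mix = np.zeros_like(candidates)
    for mu, b, w in zip(modes, scales, weights):
        mix += w * np.exp(-np.abs(candidates - mu) / b) / (2.0 * b)
    return mix / mix.sum()

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target and a predicted distribution."""
    return float(-(target * np.log(predicted + eps)).sum())

# Hypothetical pixel: a dominant mode at the labeled disparity (4) and a
# weaker, wider second mode (11), e.g. near an object boundary.
candidates = np.arange(16)
target = laplacian_mixture(
    candidates, modes=[4.0, 11.0], scales=[0.8, 1.5], weights=[0.7, 0.3]
)
uniform = np.full(16, 1.0 / 16)

print("target peak at disparity", int(target.argmax()))
print("loss vs. itself :", cross_entropy(target, target))
print("loss vs. uniform:", cross_entropy(target, uniform))
```

A multi-modal target of this kind penalizes a prediction less for placing some mass on the secondary mode than a single-peaked label would, which is one reading of what "fine-grained supervision" means here.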
Peng Xu
College of Information Science and Electronic Engineering, Zhejiang University
Zhiyu Xiang
Professor of Information & Electronic Engineering, Zhejiang University
Computer Vision, Robotics
Jingyun Fu
Zhejiang University
Computer Vision
Tianyu Pu
College of Information Science and Electronic Engineering, Zhejiang University
Hanzhi Zhong
College of Information Science and Electronic Engineering, Zhejiang University
Eryun Liu
Zhejiang University
Computer Vision, Image Processing, Biometrics, Fingerprint, Palmprint