🤖 AI Summary
Stereo matching and monocular depth estimation both lose accuracy in ill-posed regions such as occlusions and textureless surfaces. To address this, we propose a dual-branch framework in which a monocular branch and a stereo branch refine each other iteratively. Our key contributions include: (i) a confidence-guided bidirectional iterative enhancement mechanism that evolves depth priors from object-level coarse structure to pixel-level geometric detail; (ii) confidence-weighted stereo cue fusion; (iii) joint scale-and-shift correction of the monocular depth; and (iv) cross-modal iterative guidance. The method achieves state-of-the-art performance on five major benchmarks (SceneFlow, KITTI 2012, KITTI 2015, Middlebury, and ETH3D), reducing the Bad 1.0 error on ETH3D by up to 49.5% relative to the previous best method. It also demonstrates strong zero-shot generalization across unseen domains and sensor configurations.
📝 Abstract
Stereo matching recovers depth from image correspondences. Existing methods struggle with ill-posed regions that offer limited matching cues, such as occlusions and textureless areas. To address this, we propose MonSter, a novel method that leverages the complementary strengths of monocular depth estimation and stereo matching. MonSter integrates the two into a dual-branch architecture in which each branch iteratively improves the other. Confidence-based guidance adaptively selects reliable stereo cues for monodepth scale-shift recovery. The refined monodepth in turn guides stereo matching effectively in ill-posed regions. Such iterative mutual enhancement enables MonSter to evolve monodepth priors from coarse object-level structures to pixel-level geometry, fully unlocking the potential of stereo matching. As shown in Fig. 1, MonSter ranks 1st on the five most commonly used leaderboards (SceneFlow, KITTI 2012, KITTI 2015, Middlebury, and ETH3D), achieving up to 49.5% improvement (Bad 1.0 on ETH3D) over the previous best method. Comprehensive analysis verifies the effectiveness of MonSter in ill-posed regions. In terms of zero-shot generalization, MonSter significantly and consistently outperforms state-of-the-art methods across the board. The code is publicly available at: https://github.com/Junda24/MonSter.
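To make the idea of confidence-guided scale-shift recovery concrete, the sketch below fits a global scale and shift that align a relative monocular prediction to a stereo disparity map, weighting each pixel by a confidence value. This is only an illustration under stated assumptions, not MonSter's actual module: the closed-form weighted least-squares fit, the function name `weighted_scale_shift`, and the synthetic data are all hypothetical, and the monocular output is assumed to live in a disparity-like (inverse-depth) space. The abstract does not specify how the recovery is implemented.

```python
import numpy as np


def weighted_scale_shift(mono, stereo, conf):
    """Fit stereo ~ s * mono + t by confidence-weighted least squares.

    Hypothetical helper for illustration; not taken from the MonSter code.
    """
    m = np.asarray(mono, dtype=np.float64).ravel()
    d = np.asarray(stereo, dtype=np.float64).ravel()
    w = np.asarray(conf, dtype=np.float64).ravel()

    # Normal equations for minimizing sum_i w_i * (s * m_i + t - d_i)^2.
    A = np.array([[np.sum(w * m * m), np.sum(w * m)],
                  [np.sum(w * m),     np.sum(w)]])
    b = np.array([np.sum(w * m * d), np.sum(w * d)])
    s, t = np.linalg.solve(A, b)
    return s, t


# Synthetic example (assumed data): the true relation is stereo = 2 * mono + 3.
rng = np.random.default_rng(0)
mono = rng.uniform(0.1, 1.0, size=(32, 32))
stereo = 2.0 * mono + 3.0
conf = np.ones_like(mono)

# Simulate an ill-posed region: stereo fails there and confidence is zero,
# so those pixels do not influence the fit.
stereo[:8] = 0.0
conf[:8] = 0.0

scale, shift = weighted_scale_shift(mono, stereo, conf)
aligned = scale * mono + shift  # monocular prior expressed in stereo units
```

Because the unreliable rows carry zero weight, the fit recovers the underlying scale and shift from the remaining pixels, and the aligned monocular map can then serve as a prior in exactly the regions where stereo matching failed.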