MonSter: Marry Monodepth to Stereo Unleashes Power

📅 2025-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Stereo matching and monocular depth estimation both lose accuracy in ill-posed regions such as occlusions and textureless surfaces. To address this, we propose MonSter, a monocular-stereo dual-branch framework in which the two branches iteratively refine each other. Our key contributions include: (i) a confidence-guided bidirectional iterative enhancement mechanism that evolves depth priors from coarse object-level structure to pixel-level geometric detail; (ii) confidence-based selection of reliable stereo cues; (iii) joint scale-and-shift recovery for the monocular depth; and (iv) guidance of the stereo branch by the refined monocular depth. The method ranks first on five major benchmarks (SceneFlow, KITTI 2012, KITTI 2015, Middlebury, and ETH3D), improving Bad 1.0 on ETH3D by up to 49.5% over the previous best method. It also shows strong zero-shot generalization.
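For readers unfamiliar with the metric, the sketch below shows how a "Bad 1.0" score is commonly computed (the percentage of pixels whose disparity error exceeds 1.0 pixel) and how a relative improvement between two scores can be derived. The function names and the toy numbers are illustrative assumptions, not values or code from the paper, and the 49.5% figure is assumed here to be a relative reduction.

```python
from typing import Sequence


def bad_x(pred: Sequence[float], gt: Sequence[float], threshold: float = 1.0) -> float:
    """Percentage of pixels whose absolute disparity error exceeds `threshold`."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    bad = sum(1 for p, g in zip(pred, gt) if abs(p - g) > threshold)
    return 100.0 * bad / len(gt)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative error reduction, in percent, of `new` with respect to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline - new) / baseline


# Toy disparities (made up for illustration): errors are 0.2, 2.5, 0.9, 2.0 px,
# so two of the four pixels exceed the 1.0 px threshold.
toy_pred = [10.0, 12.5, 7.0, 3.0]
toy_gt = [10.2, 10.0, 7.9, 5.0]
toy_score = bad_x(toy_pred, toy_gt)

# Halving a hypothetical error rate from 2.0 to 1.0 is a 50% relative improvement.
toy_gain = relative_improvement(2.0, 1.0)
```

Real benchmarks evaluate only pixels with valid ground truth and may mask occluded regions separately; this sketch omits that bookkeeping.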

📝 Abstract
Stereo matching recovers depth from image correspondences. Existing methods struggle to handle ill-posed regions with limited matching cues, such as occlusions and textureless areas. To address this, we propose MonSter, a novel method that leverages the complementary strengths of monocular depth estimation and stereo matching. MonSter integrates monocular depth and stereo matching into a dual-branch architecture in which the two branches iteratively improve each other. Confidence-based guidance adaptively selects reliable stereo cues for monodepth scale-shift recovery. The refined monodepth in turn guides stereo effectively in ill-posed regions. Such iterative mutual enhancement enables MonSter to evolve monodepth priors from coarse object-level structures to pixel-level geometry, fully unlocking the potential of stereo matching. As shown in Fig. 1, MonSter ranks 1st across the five most commonly used leaderboards (SceneFlow, KITTI 2012, KITTI 2015, Middlebury, and ETH3D), achieving up to a 49.5% improvement (Bad 1.0 on ETH3D) over the previous best method. Comprehensive analysis verifies the effectiveness of MonSter in ill-posed regions. In terms of zero-shot generalization, MonSter significantly and consistently outperforms the state of the art across the board. The code is publicly available at: https://github.com/Junda24/MonSter.
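The scale-shift recovery step mentioned in the abstract can be illustrated with a confidence-weighted least-squares fit. This is a minimal sketch under stated assumptions, not the MonSter implementation: the function name `fit_scale_shift`, the hard confidence threshold, and the flat per-pixel lists are all hypothetical, and both inputs are assumed to be expressed in the same domain (for example, disparity).

```python
from typing import Sequence, Tuple


def fit_scale_shift(
    mono: Sequence[float],
    stereo: Sequence[float],
    confidence: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Fit stereo ~= scale * mono + shift by weighted least squares.

    Pixels whose confidence is below `threshold` get zero weight, so only
    stereo estimates deemed reliable influence the fit.
    """
    # Hypothetical rule: hard gate, then weight by confidence. The selection
    # rule used in the paper may differ.
    weights = [c if c >= threshold else 0.0 for c in confidence]

    sw = sum(weights)
    sm = sum(w * m for w, m in zip(weights, mono))
    sd = sum(w * d for w, d in zip(weights, stereo))
    smm = sum(w * m * m for w, m in zip(weights, mono))
    smd = sum(w * m * d for w, m, d in zip(weights, mono, stereo))

    # Closed-form solution of the 2x2 normal equations.
    det = sw * smm - sm * sm
    if abs(det) < 1e-12:
        raise ValueError("not enough reliable, distinct samples to fit scale and shift")

    scale = (sw * smd - sm * sd) / det
    shift = (smm * sd - sm * smd) / det
    return scale, shift


# Toy data: three reliable pixels follow stereo = 2 * mono + 3; the fourth is
# a low-confidence outlier (e.g. an occluded pixel) and is ignored by the fit.
mono = [0.1, 0.2, 0.4, 0.8]
stereo = [3.2, 3.4, 3.8, 99.0]
confidence = [0.9, 0.8, 0.9, 0.1]
scale, shift = fit_scale_shift(mono, stereo, confidence)
aligned_mono = [scale * m + shift for m in mono]
```

Once aligned, the monocular estimate for the outlier pixel (`aligned_mono[3]`) offers a plausible value where the stereo cue was unreliable, which is the intuition behind letting the refined monodepth guide stereo in ill-posed regions.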
Problem

Research questions and friction points this paper is trying to address.

Stereo Matching
Monocular Depth Prediction
Occluded or Textureless Regions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular Depth Prediction
Stereo Matching
Iterative Optimization