🤖 AI Summary
This work reports on zero-shot generalization in monocular depth estimation on the challenging SYNS-Patches benchmark, which covers both natural and indoor scenes. For this edition, the challenge revised its evaluation protocol to use two-degree-of-freedom least-squares (2-DOF LS) alignment, so that disparity and affine-invariant predictions can be evaluated. It also updated its baselines to include the popular off-the-shelf methods Depth Anything v2 and Marigold. In total, 24 submissions outperformed the baselines on the test set, and 10 of them came with a report describing their approach. Most leading methods relied on affine-invariant predictions. The winners raised the 3D F-Score from 22.58%, the previous edition's best result, to 23.05%.
📝 Abstract
This paper presents the results of the fourth edition of the Monocular Depth Estimation Challenge (MDEC), which focuses on zero-shot generalization to the SYNS-Patches benchmark, a dataset featuring challenging environments in both natural and indoor settings. In this edition, we revised the evaluation protocol to use least-squares alignment with two degrees of freedom to support disparity and affine-invariant predictions. We also revised the baselines and included popular off-the-shelf methods: Depth Anything v2 and Marigold. The challenge received a total of 24 submissions that outperformed the baselines on the test set; 10 of these included a report describing their approach, with most leading methods relying on affine-invariant predictions. The challenge winners improved the 3D F-Score over the previous edition's best result, raising it from 22.58% to 23.05%.
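Two-degree-of-freedom least-squares alignment fits a scale s and a shift t that minimise the sum over valid pixels i of (s·p_i + t − g_i)², where p is the prediction and g is the ground truth. The sketch below shows how that fit works. The function names are hypothetical, and it is not the challenge's official evaluation code. It assumes that disparity predictions are aligned in disparity space (1 / depth) and then inverted back to depth, and that the clipping range is an arbitrary illustrative choice.

```python
import numpy as np


def align_lstsq(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> tuple[float, float]:
    """Fit scale s and shift t minimising sum((s * pred + t - target)**2) over masked pixels."""
    x = pred[mask].astype(np.float64)
    y = target[mask].astype(np.float64)
    # Design matrix [x, 1] so that A @ [s, t] approximates y in the least-squares sense.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(t)


def align_disparity_to_depth(
    pred_disp: np.ndarray,
    gt_depth: np.ndarray,
    min_depth: float = 1e-3,  # hypothetical clipping bounds, not taken from the challenge
    max_depth: float = 100.0,
) -> np.ndarray:
    """Align an affine-invariant disparity map in disparity space, then invert it to depth.

    Assumption: pixels with gt_depth <= 0 are treated as invalid and excluded from the fit.
    """
    valid = gt_depth > 0
    gt_disp = np.zeros_like(gt_depth, dtype=np.float64)
    gt_disp[valid] = 1.0 / gt_depth[valid]
    s, t = align_lstsq(pred_disp, gt_disp, valid)
    aligned_disp = s * pred_disp + t
    # Clip so that the affine fit cannot yield zero or negative disparities before inversion.
    aligned_disp = np.clip(aligned_disp, 1.0 / max_depth, 1.0 / min_depth)
    return 1.0 / aligned_disp
```

For example, a prediction equal to 2 × (1 / depth) + 0.5 is recovered with s = 0.5 and t = −0.25, and inverting the aligned disparity reproduces the ground-truth depth. Because the fit has only two unknowns, it has a closed-form least-squares solution. Predictions that are correct only up to an affine transform are therefore not penalised for their unknown scale and shift.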