Boosting Multi-View Stereo with Depth Foundation Model in the Absence of Real-World Labels

📅 2025-04-16
🤖 AI Summary
This paper addresses the challenging problem of multi-view stereo (MVS) reconstruction in the absence of ground-truth depth labels. To this end, we propose DFM-MVS, a novel framework that pioneers the integration of depth foundation models (DFMs) to generate high-confidence depth priors. Leveraging these priors, we establish a prior-driven pseudo-supervision training paradigm and design a prior-guided error correction module, enabling coarse-to-fine stereo matching optimization and explicit geometric consistency modeling. Crucially, DFM-MVS operates without any real depth supervision, effectively mitigating key bottlenecks in unsupervised MVS, namely severe noise in pseudo-labels and weak geometric constraints. Extensive experiments on DTU and Tanks & Temples benchmarks demonstrate that DFM-MVS consistently outperforms existing unsupervised and self-supervised methods, achieving reconstruction accuracy close to state-of-the-art supervised approaches. These results underscore the pivotal role and strong generalizability of depth priors in weakly supervised MVS.

๐Ÿ“ Abstract
Learning-based Multi-View Stereo (MVS) methods have made remarkable progress in recent years. However, how to train the network effectively without real-world labels remains a challenging problem. In this paper, driven by recent advances in vision foundation models, a novel method, termed DFM-MVS, is proposed that leverages a depth foundation model to generate an effective depth prior, so as to boost MVS in the absence of real-world labels. Specifically, a depth prior-based pseudo-supervised training mechanism is developed to simulate realistic stereo correspondences using the generated depth prior, thereby constructing effective supervision for the MVS network. In addition, a depth prior-guided error correction strategy is presented that uses the depth prior as guidance to mitigate the error propagation problem inherent in the widely used coarse-to-fine network structure. Experimental results on the DTU and Tanks & Temples datasets demonstrate that the proposed DFM-MVS significantly outperforms existing MVS methods that do not use real-world labels.
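
The abstract does not spell out how stereo correspondences are simulated from the depth prior. As a minimal sketch of the geometric ingredient such a mechanism would likely need, the Python below maps a reference pixel and its prior depth to the matching pixel in a second view. The pinhole camera model, the shared intrinsics, and every function and parameter name here are assumptions made for illustration, not the paper's implementation.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole model


def backproject(u: float, v: float, depth: float, intrinsics: Intrinsics) -> Vec3:
    """Lift pixel (u, v) with the given depth to a 3D point in the reference camera frame."""
    fx, fy, cx, cy = intrinsics
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def to_source_frame(point: Vec3,
                    rotation: Sequence[Sequence[float]],
                    translation: Sequence[float]) -> Vec3:
    """Apply the rigid transform X_src = R * X_ref + t."""
    x, y, z = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)


def project(point: Vec3, intrinsics: Intrinsics) -> Tuple[float, float]:
    """Project a 3D point in a camera frame onto that camera's image plane."""
    fx, fy, cx, cy = intrinsics
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the source camera")
    return (fx * x / z + cx, fy * y / z + cy)


def simulated_correspondence(u: float, v: float, prior_depth: float,
                             intrinsics: Intrinsics,
                             rotation: Sequence[Sequence[float]],
                             translation: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: where a reference pixel lands in the source view,
    treating the depth prior as if it were the true depth."""
    p_ref = backproject(u, v, prior_depth, intrinsics)
    p_src = to_source_frame(p_ref, rotation, translation)
    return project(p_src, intrinsics)
```

With an identity rotation and a purely horizontal translation, this reduces to the familiar disparity relation: the pixel shifts by fx * baseline / depth. For fx = 500, a baseline of 0.1, and a prior depth of 2.0, that is a 25-pixel shift.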

Problem

Research questions and friction points this paper is trying to address.

Train MVS networks without real-world depth labels
Generate depth priors using foundation models
Mitigate error propagation in coarse-to-fine MVS
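
To make the error propagation point concrete, consider a common coarse-to-fine design, assumed here and not taken from the paper: each finer stage samples a small set of depth hypotheses in a narrow band centered on the coarse estimate. The sketch below (all names hypothetical) shows that when the coarse estimate is far off, the true depth falls outside the band and no fine hypothesis can recover it.

```python
from typing import List


def fine_hypotheses(coarse_depth: float, interval: float, num_hypotheses: int) -> List[float]:
    """Evenly spaced depth hypotheses centered on the coarse estimate."""
    half = (num_hypotheses - 1) / 2
    return [coarse_depth + (i - half) * interval for i in range(num_hypotheses)]


def best_residual(true_depth: float, coarse_depth: float,
                  interval: float, num_hypotheses: int) -> float:
    """Smallest possible error after refinement, i.e. the distance from the
    true depth to the closest hypothesis the fine stage is allowed to pick."""
    candidates = fine_hypotheses(coarse_depth, interval, num_hypotheses)
    return min(abs(d - true_depth) for d in candidates)


if __name__ == "__main__":
    # Good coarse estimate: the true depth lies inside the search band.
    print(best_residual(true_depth=2.5, coarse_depth=2.49, interval=0.01, num_hypotheses=9))
    # Bad coarse estimate: the band [1.96, 2.04] cannot reach 2.5.
    print(best_residual(true_depth=2.5, coarse_depth=2.0, interval=0.01, num_hypotheses=9))
```

In the second call the best achievable error stays near 0.46 regardless of how well the fine stage matches, which is the failure mode an error correction step has to address.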

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages a depth foundation model to generate depth priors
Uses pseudo-supervised training driven by the depth prior
Applies a depth prior-guided error correction strategy
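
The summary does not describe how the error correction strategy works internally. One simple heuristic in that spirit, offered only as an illustration, is sketched below. It rests on two assumptions that are not from the paper: that the prior is known only up to an unknown scale and shift, and that a mask of trusted pixels is available (for example from a consistency check). The prior is aligned to the coarse depth on the trusted pixels, and any coarse depth that deviates too much from the aligned prior is replaced. All names and the threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(prior: Sequence[float], depth: Sequence[float],
                    trusted: Sequence[bool]) -> Tuple[float, float]:
    """Least-squares fit of depth ~= scale * prior + shift over trusted pixels."""
    xs = [p for p, keep in zip(prior, trusted) if keep]
    ys = [d for d, keep in zip(depth, trusted) if keep]
    if len(xs) < 2:
        raise ValueError("need at least two trusted pixels")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("trusted prior values are constant; scale is undetermined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    scale = cov_xy / var_x
    shift = mean_y - scale * mean_x
    return scale, shift


def correct_with_prior(coarse: Sequence[float], prior: Sequence[float],
                       trusted: Sequence[bool], rel_threshold: float = 0.1) -> List[float]:
    """Replace coarse depths that disagree with the aligned prior by more than
    rel_threshold (relative to the aligned prior); keep the rest unchanged."""
    scale, shift = fit_scale_shift(prior, coarse, trusted)
    corrected = []
    for d, p in zip(coarse, prior):
        aligned = scale * p + shift
        if abs(d - aligned) > rel_threshold * abs(aligned):
            corrected.append(aligned)  # coarse value looks like an outlier
        else:
            corrected.append(d)        # coarse value agrees with the prior
    return corrected
```

For example, with prior values [1, 2, 3, 4, 5], coarse depths [3, 5, 7, 9, 20], and the first four pixels trusted, the fit gives scale 2 and shift 1, so the last coarse depth is replaced by 11 while the others are kept. Corrected depths like these could then serve as the centers of the next stage's search band.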

Jie Zhu
Tianjin University
Bo Peng
Tianjin University
Zhe Zhang
Tianjin University of Commerce
Bingzheng Liu
Tianjin University
Jianjun Lei
Tianjin University
Multimedia, Video Coding, 3D/VR/AR, Artificial Intelligence, Pattern Recognition