🤖 AI Summary
Existing methods struggle to deliver accurate camera poses, reliable depth estimates, high-quality novel view synthesis, and precise 3D surface reconstruction simultaneously from casually captured multi-view RGB images. This work proposes NeVStereo, a framework that tightly integrates NeRF-driven novel view synthesis with multi-view stereo matching. By combining confidence-guided depth estimation, NeRF-coupled bundle adjustment, and an iterative joint refinement stage, NeVStereo optimizes pose, depth, rendering, and geometry together. The approach mitigates surface stacking, artifacts, and pose-depth coupling, and in a zero-shot setting achieves up to 36% lower depth error, 10.4% better pose accuracy, 4.5% higher novel view synthesis quality, and state-of-the-art mesh reconstruction (F1 score of 91.93%, Chamfer distance of 4.35 mm).
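To make "confidence-guided depth estimation" concrete, the sketch below fuses several depth hypotheses for one reference pixel by discarding low-confidence candidates and averaging the rest, weighted by confidence. This is an illustrative assumption, not NeVStereo's actual procedure: the function name `fuse_depths`, the threshold `min_conf`, and the weighted-mean rule are all hypothetical.

```python
from typing import Optional, Sequence


def fuse_depths(
    depths: Sequence[float],
    confidences: Sequence[float],
    min_conf: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> Optional[float]:
    """Confidence-weighted fusion of one pixel's depth hypotheses (illustrative).

    depths[i] is a depth candidate for the same reference pixel (e.g. from
    source view i) and confidences[i] is its score in [0, 1].
    Returns None when no candidate is trustworthy.
    """
    if len(depths) != len(confidences):
        raise ValueError("depths and confidences must have the same length")

    # Keep only positive depths whose confidence clears the threshold.
    kept = [(d, c) for d, c in zip(depths, confidences) if c >= min_conf and d > 0]
    if not kept:
        return None  # leave the pixel empty instead of guessing a depth

    # Weighted mean: more confident hypotheses pull the estimate harder.
    total_weight = sum(c for _, c in kept)
    return sum(d * c for d, c in kept) / total_weight


if __name__ == "__main__":
    # The outlier at 9.0 has low confidence and is ignored.
    print(fuse_depths([2.0, 2.2, 9.0], [0.9, 0.6, 0.1]))
```

Rejecting low-confidence hypotheses outright, rather than merely down-weighting them, keeps a single confident outlier-free set from being skewed by many weak votes; a real system would apply this per pixel over whole depth maps.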
📝 Abstract
In modern dense 3D reconstruction, feed-forward systems (e.g., VGGT, pi3) focus on end-to-end matching and geometry prediction but do not explicitly produce novel view synthesis (NVS). Neural rendering-based approaches offer high-fidelity NVS and detailed geometry from posed images, yet they typically assume fixed camera poses and can be sensitive to pose errors. As a result, obtaining a single framework that delivers accurate poses, reliable depth, high-quality rendering, and accurate 3D surfaces from casually captured views remains non-trivial. We present NeVStereo, a NeRF-driven NVS-stereo architecture that aims to jointly deliver camera poses, multi-view depth, novel view synthesis, and surface reconstruction from multi-view RGB-only inputs. NeVStereo combines NeRF-based NVS for stereo-friendly renderings, confidence-guided multi-view depth estimation, NeRF-coupled bundle adjustment for pose refinement, and an iterative refinement stage that updates both depth and the radiance field to improve geometric consistency. This design mitigates common NeRF-based issues such as surface stacking, artifacts, and pose-depth coupling. Across indoor, outdoor, tabletop, and aerial benchmarks, our experiments indicate that NeVStereo achieves consistently strong zero-shot performance, with up to 36% lower depth error, 10.4% better pose accuracy, 4.5% higher NVS fidelity, and state-of-the-art mesh quality (F1 91.93%, Chamfer 4.35 mm) compared with existing state-of-the-art methods.
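Mesh quality above is reported as F1 and Chamfer distance. As a rough illustration, the sketch below computes both between points sampled from a predicted surface and a ground-truth surface, assuming brute-force nearest neighbours, Chamfer distance as the mean of accuracy (prediction to ground truth) and completeness (ground truth to prediction), and a hypothetical distance threshold `tau` for F1. Benchmarks differ in sampling density, threshold, and outlier handling, and the abstract does not state which protocol produced 91.93% and 4.35 mm, so treat this as a generic definition rather than the paper's evaluation code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _nn_dists(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Distance from each point in src to its nearest point in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_and_f1(
    pred: Sequence[Point], gt: Sequence[Point], tau: float
) -> Tuple[float, float, float, float]:
    """Return (chamfer, precision, recall, f1) for two point sets.

    tau is the distance threshold for F1, in the same units as the points.
    """
    if not pred or not gt:
        raise ValueError("both point sets must be non-empty")

    acc = _nn_dists(pred, gt)   # accuracy: predicted -> ground truth
    comp = _nn_dists(gt, pred)  # completeness: ground truth -> predicted

    # Chamfer distance as the mean of mean accuracy and mean completeness.
    chamfer = 0.5 * (sum(acc) / len(acc) + sum(comp) / len(comp))

    # Precision: share of predicted points close to the ground truth.
    precision = sum(d < tau for d in acc) / len(acc)
    # Recall: share of ground-truth points covered by the prediction.
    recall = sum(d < tau for d in comp) / len(comp)

    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return chamfer, precision, recall, f1


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]  # one GT point is missed
    print(chamfer_and_f1(pred, gt, tau=0.5))
```

If the points are in millimetres, the Chamfer value is in millimetres too. The brute-force search costs O(N·M) distance evaluations, so practical evaluations typically use a spatial index such as a KD-tree for the nearest-neighbour queries.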