🤖 AI Summary
Existing neural implicit surface reconstruction methods perform well under controlled conditions (uniform illumination, no occlusions) but produce severe geometric distortions on in-the-wild image collections with illumination variation, transient occlusions, and inconsistent appearance. NeRF-based approaches are more robust to photometric variation, but their lack of explicit surface constraints limits geometric fidelity. To address this, we propose an implicit surface optimization framework that combines several geometric constraints: sparse SfM points that refine the signed distance function, with displacement compensation for noise in those points, and normal priors from a normal prediction network, made more robust by edge-prior filtering and multi-view consistency constraints. Evaluated on Heritage-Recon and other datasets, the method reconstructs surfaces with higher accuracy and finer geometric detail than existing techniques, which makes it applicable to tasks such as the digital preservation of cultural heritage sites.
📝 Abstract
Neural implicit surface reconstruction using volume rendering techniques has recently achieved significant advancements in creating high-fidelity surfaces from multiple 2D images. However, current methods primarily target scenes with consistent illumination and struggle to accurately reconstruct 3D geometry in uncontrolled environments with transient occlusions or varying appearances. While some neural radiance field (NeRF)-based variants can better manage photometric variations and transient objects in complex scenes, they are designed for novel view synthesis rather than precise surface reconstruction due to limited surface constraints. To overcome this limitation, we introduce a novel approach that applies multiple geometric constraints to the implicit surface optimization process, enabling more accurate reconstructions from unconstrained image collections. First, we utilize sparse 3D points from structure-from-motion (SfM) to refine the signed distance function estimation for the reconstructed surface, with a displacement compensation to accommodate noise in the sparse points. Additionally, we employ robust normal priors derived from a normal predictor, enhanced by edge prior filtering and multi-view consistency constraints, to improve alignment with the actual surface geometry. Extensive testing on the Heritage-Recon benchmark and other datasets has shown that the proposed method can accurately reconstruct surfaces from in-the-wild images, yielding geometries with superior accuracy and granularity compared to existing techniques. Our approach enables high-quality 3D reconstruction of various landmarks, making it applicable to diverse scenarios such as digital preservation of cultural heritage sites.
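The two geometric constraints described above can be illustrated with a minimal NumPy sketch. This is one plausible reading of the abstract, not the authors' formulation: the abstract gives no loss definitions, so the function names, the L2 penalty on displacements, the `reg_weight` value, and the choice to exclude edge pixels from the normal term are all assumptions made for illustration. An analytic sphere SDF stands in for the learned network.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Analytic SDF of an origin-centred sphere; a stand-in for a learned SDF network."""
    return np.linalg.norm(points, axis=-1) - radius

def sparse_point_sdf_loss(sdf_fn, points, displacements, reg_weight=0.1):
    """Assumed form: mean |SDF| at displaced SfM points plus an L2 penalty on displacements.

    The displacement lets a noisy point move onto the surface instead of
    dragging the surface toward the noise; the penalty keeps that move small.
    """
    sdf_values = sdf_fn(points + displacements)
    data_term = np.abs(sdf_values).mean()
    reg_term = reg_weight * np.square(displacements).sum(axis=-1).mean()
    return float(data_term + reg_term)

def masked_normal_loss(rendered_normals, prior_normals, edge_mask):
    """Assumed form: mean (1 - cosine similarity) over pixels not flagged as edges."""
    rendered = rendered_normals / np.linalg.norm(rendered_normals, axis=-1, keepdims=True)
    prior = prior_normals / np.linalg.norm(prior_normals, axis=-1, keepdims=True)
    cosine = np.sum(rendered * prior, axis=-1)
    keep = ~edge_mask  # edge_mask is a boolean array, True where an edge was detected
    if not keep.any():
        return 0.0
    return float((1.0 - cosine[keep]).mean())

# Toy data: SfM-like points that sit 0.05 outside the unit sphere.
rng = np.random.default_rng(0)
directions = rng.normal(size=(100, 3))
directions /= np.linalg.norm(directions, axis=-1, keepdims=True)
noisy_points = 1.05 * directions

no_shift = np.zeros_like(noisy_points)
ideal_shift = -0.05 * directions  # moves every point back onto the sphere

loss_without = sparse_point_sdf_loss(sphere_sdf, noisy_points, no_shift)
loss_with = sparse_point_sdf_loss(sphere_sdf, noisy_points, ideal_shift)
```

In this toy setup, `loss_with` comes out smaller than `loss_without`, because a small displacement absorbs the point noise at little regularization cost. In a real pipeline the displacements would be optimized jointly with the SDF network, and a multi-view consistency check would further filter which normal priors are trusted; neither is shown here.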