🤖 AI Summary
Existing neural rendering-based multi-view surface reconstruction methods rely on post-hoc mesh extraction, which often introduces aliasing artifacts and geometric distortions, hindering downstream editing and applications. This paper proposes an end-to-end explicit mesh reconstruction and rendering framework. The key contributions are: (1) a decoupled geometry-appearance model built on a neural deformation field that incorporates global geometric context; (2) a regularization on geometric features to enhance surface fidelity and shading consistency; and (3) baking view-invariant diffuse components directly onto mesh vertices for efficient, differentiable rasterization. Experiments show the method trains in just 4.84 minutes and renders each frame in 0.023 seconds, achieving reconstruction quality competitive with state-of-the-art methods while enabling practical operations such as mesh editing and texture recoloring. The framework thus combines high efficiency, strong reconstruction fidelity, and editability within a unified explicit mesh representation.
📝 Abstract
This paper addresses a limitation of neural rendering-based multi-view surface reconstruction methods: they require an additional mesh extraction step that is inconvenient and often produces poor-quality surfaces with aliasing artifacts, restricting downstream applications. Building on an explicit mesh representation and a differentiable rasterization framework, this work proposes an efficient solution that preserves the high efficiency of this framework while significantly improving reconstruction quality and versatility. Specifically, we introduce a disentangled geometry and appearance model that does not rely on deep networks, simplifying learning and broadening applicability. A neural deformation field is constructed to incorporate global geometric context, enhancing geometry learning, while a novel regularization constrains the geometric features passed to a neural shader, ensuring their accuracy and improving shading quality. For appearance, a view-invariant diffuse term is separated and baked into mesh vertices, further improving rendering efficiency. Experimental results demonstrate that the proposed method achieves state-of-the-art training (4.84 minutes) and rendering (0.023 seconds per frame) speeds, with reconstruction quality competitive with top-performing methods. Moreover, the method enables practical applications such as mesh and texture editing, demonstrating its versatility and application potential. This combination of efficiency, competitive quality, and broad applicability makes our approach a valuable contribution to multi-view surface reconstruction and rendering.
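To make the vertex-baking idea concrete: once a view-invariant diffuse color has been baked into each mesh vertex, shading a covered pixel reduces to barycentric interpolation of the vertex colors, which is exactly the per-pixel attribute interpolation a differentiable rasterizer performs. The sketch below is illustrative only and assumes nothing about the paper's actual implementation; the function names (`barycentric`, `shade_pixel`) are hypothetical.

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p w.r.t. triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - v - w, v, w])

def shade_pixel(p, tri_xy, vertex_diffuse):
    """Interpolate baked per-vertex diffuse colors at pixel location p."""
    bary = barycentric(p, *tri_xy)
    return bary @ vertex_diffuse  # weighted sum -> (3,) RGB

# One screen-space triangle with a baked diffuse color per vertex.
tri = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])  # per-vertex baked diffuse (RGB)
center = np.array([1 / 3, 1 / 3])
print(shade_pixel(center, tri, colors))  # equal mix at the centroid
```

Because the interpolation is a smooth function of vertex positions and colors, gradients from a rendering loss can flow back to both, which is what makes this baked representation compatible with end-to-end optimization.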