🤖 AI Summary
To address the challenge of efficient reconstruction for large-scale, complex scenes (e.g., city-scale environments), this paper proposes an implicit neural representation method that requires no explicit geometric priors. The core innovation lies in introducing light field probes to construct a multi-scale implicit scene representation: sparse image inputs and densely sampled depth information jointly encode the scene, while lightweight probes decouple rendering cost from scene complexity. This enables high-fidelity, real-time novel view synthesis, overcoming the traditional trade-off among scalability, fidelity, and efficiency in neural rendering. Experiments demonstrate photo-realistic reconstruction and streaming of city-scale scenes with significantly reduced 3D data maintenance overhead. The method is well suited to real-time applications such as VR and AR.
📝 Abstract
Reconstructing photo-realistic large-scale scenes from images, for example at city scale, is a long-standing problem in computer graphics. Neural rendering is an emerging technique that enables photo-realistic image synthesis from previously unobserved viewpoints; however, state-of-the-art neural rendering methods struggle to efficiently render highly complex large-scale scenes because they typically trade off scene size, fidelity, and rendering speed against one another. Another stream of techniques relies on explicit scene geometry for reconstruction, but the cost of building and maintaining a large set of geometry data grows with scene size. Our work explores novel view synthesis methods that efficiently reconstruct complex scenes without explicit use of scene geometry. Specifically, given sparse images of the scene (captured from the real world), we reconstruct intermediate, multi-scale, implicit representations of scene geometry. In this way, our method avoids relying on explicit scene geometry, significantly reducing the computational cost of maintaining large 3D data. Unlike current methods, we reconstruct the scene using a probe data structure. Probe data hold highly accurate depth information for densely sampled points, enabling the reconstruction of highly complex scenes. Because the scene is rendered from probe data, the rendering cost is independent of scene complexity. As such, our approach combines geometry reconstruction and novel view synthesis. Moreover, when rendering large-scale scenes, compressing and streaming probe data is more efficient than using explicit scene geometry. Therefore, our neural representation approach can potentially be applied to virtual reality (VR) and augmented reality (AR) applications.
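To make the probe idea concrete, below is a minimal Python sketch of how such a light field probe structure might look. Everything here is a hypothetical illustration of the abstract's description (per-direction radiance plus dense depth, with per-ray cost independent of scene complexity), not the paper's actual implementation: the `LightFieldProbe` class, the lat-long parameterization, and the nearest-probe shading rule are all assumptions made for this sketch.

```python
import numpy as np

class LightFieldProbe:
    """One probe: per-direction radiance and depth samples around a point.

    Hypothetical layout. The paper only states that probes hold dense,
    accurate depth; the lat-long grid used here is an assumption.
    """
    def __init__(self, position, resolution=64):
        self.position = np.asarray(position, dtype=np.float32)
        # RGB radiance and depth stored per direction on a lat-long grid.
        self.radiance = np.zeros((resolution, resolution, 3), dtype=np.float32)
        self.depth = np.zeros((resolution, resolution), dtype=np.float32)
        self.resolution = resolution

    def _dir_to_texel(self, d):
        # Map a unit direction to integer lat-long grid coordinates.
        theta = np.arccos(np.clip(d[2], -1.0, 1.0))   # polar angle in [0, pi]
        phi = np.arctan2(d[1], d[0]) % (2.0 * np.pi)  # azimuth in [0, 2*pi)
        v = int(theta / np.pi * (self.resolution - 1))
        u = int(phi / (2.0 * np.pi) * (self.resolution - 1))
        return v, u

    def lookup(self, direction):
        """Return the (radiance, depth) stored for a unit direction."""
        v, u = self._dir_to_texel(direction)
        return self.radiance[v, u], self.depth[v, u]


def shade_ray(origin, direction, probes):
    """Shade one camera ray by querying the nearest probe.

    Cost is O(#probes) to pick the nearest probe (O(1) with a spatial grid)
    plus an O(1) texel lookup, so it does not grow with the amount of scene
    geometry -- the property the abstract attributes to probe-based rendering.
    """
    direction = direction / np.linalg.norm(direction)
    nearest = min(probes, key=lambda p: np.linalg.norm(p.position - origin))
    return nearest.lookup(direction)


if __name__ == "__main__":
    # Toy usage: two empty probes; the lookup returns the stored (zero) values.
    probes = [LightFieldProbe(p) for p in [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]]
    rgb, depth = shade_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), probes)
    print(rgb, depth)
```

A real system would presumably blend several nearby probes and use the stored depth to resolve occlusion between them; the single-probe lookup above is only meant to show why the per-ray cost stays constant as scene geometry grows.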