🤖 AI Summary
This work addresses the trade-off between the limited rendering quality of 3D Gaussian Splatting (3DGS) and the slow rendering speed of Neural Radiance Fields (NeRF) in novel view synthesis for street scenes. To bridge this gap, the authors use images rendered by a pre-trained street-scene NeRF as training data for 3DGS. The NeRF renderings serve two purposes: removing transient objects from the street-level input views and synthesizing bird’s-eye views as additional training views. A diffusion-based enhancement step then improves the image quality of those additional views. The strategy combines the real-time rendering of 3DGS with the higher rendering quality of NeRF. Evaluated on one synthetic and two real-world street-scene datasets, the method improves rendering quality while preserving the rendering speed of 3DGS.
📝 Abstract
Neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS) are two mainstream approaches for novel view synthesis. They often show complementary performance: 3DGS renders faster, while NeRF achieves higher rendering quality. Motivated by this, we propose leveraging NeRF-rendered images for 3DGS training. Specifically, we target street scenes and use a pre-trained street-specific NeRF method to produce training images for a target 3DGS method. In our 3DGS training, NeRF-rendered images are used to remove transient objects from street-level input views and to generate bird's-eye views as additional views, so that 3DGS inherits the higher rendering quality of NeRF. We further incorporate diffusion-based image enhancement to improve the image quality of the additional views. Experimental results on one synthetic and two real datasets demonstrate that our proposed method improves street-scene rendering while preserving the speed of 3DGS and the quality of NeRF.
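The abstract does not say how the bird's-eye viewpoints are chosen, so the following is only an illustrative sketch of one simple option: lifting each street-level camera position by a fixed height and pointing the camera straight down. The function name `birds_eye_poses`, the `height` parameter, the z-up world frame, and the OpenGL-style camera convention are all assumptions made for this example and are not taken from the paper.

```python
def birds_eye_poses(street_positions, height=20.0):
    """Build 4x4 camera-to-world matrices that look straight down.

    Assumptions (hypothetical, not from the paper):
    - the world z-axis points up;
    - cameras follow the OpenGL-style convention, where the camera
      looks along its own -z axis, with x to the right and y up.
    """
    poses = []
    for x, y, z in street_positions:
        # The rotation part is the identity: the camera z-axis coincides
        # with world up, so the viewing direction (-z) points at the ground.
        # The translation column lifts the camera by `height`.
        poses.append([
            [1.0, 0.0, 0.0, float(x)],
            [0.0, 1.0, 0.0, float(y)],
            [0.0, 0.0, 1.0, float(z) + height],
            [0.0, 0.0, 0.0, 1.0],
        ])
    return poses


# Two street-level camera positions, 1.5 units above the road.
poses = birds_eye_poses([(0.0, 0.0, 1.5), (10.0, 0.0, 1.5)], height=20.0)
```

In a pipeline of the kind the abstract describes, poses like these would be passed to the pre-trained NeRF to render the additional views, which would then be enhanced and added to the 3DGS training set. Those steps depend on the specific NeRF and diffusion models and are not sketched here.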