🤖 AI Summary
Existing 3D Gaussian Splatting (3DGS) methods rely on adaptive density control, which often introduces floating artifacts and wastes computational resources. To address this, we propose a “pre-densification” strategy: leveraging sparse LiDAR scans fused with monocular depth estimation to generate high-fidelity initial point clouds, and introducing an ROI-aware, content-driven sampling mechanism that prioritizes Gaussian primitive placement in semantically and geometrically critical regions—thereby significantly suppressing redundancy and overlap. Our method jointly optimizes Gaussian positions, scales, and opacities during initialization, enhancing reconstruction fidelity and training efficiency. Experiments on four newly collected datasets demonstrate that our approach achieves rendering quality on par with state-of-the-art methods while reducing training time by 27–41% and GPU memory consumption by 33–52%. Notably, it improves structural integrity and visual consistency in regions of interest within complex scenes.
📝 Abstract
This paper addresses the limitations of existing 3D Gaussian Splatting (3DGS) methods, particularly their reliance on adaptive density control, which can lead to floating artifacts and inefficient resource usage. We propose a novel densify-beforehand approach that enhances the initialization of 3D scenes by combining sparse LiDAR data with monocular depth estimation from corresponding RGB images. Our ROI-aware sampling scheme prioritizes semantically and geometrically important regions, yielding a dense point cloud that improves both visual fidelity and computational efficiency. By bypassing the adaptive density control of the original pipeline, which can introduce redundant Gaussians, this densify-beforehand strategy lets the optimization focus on the remaining attributes of the 3D Gaussian primitives, reducing overlap while enhancing visual quality. Our method matches state-of-the-art techniques in rendering quality while significantly lowering resource consumption and training time. We validate the approach through extensive comparisons and ablation studies on four newly collected datasets, demonstrating its effectiveness at preserving regions of interest in complex scenes.
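To make the densify-beforehand idea concrete, here is a minimal sketch of the pipeline stages the abstract describes: aligning monocular depth to sparse LiDAR returns, sampling pixels with higher probability inside an ROI mask, and back-projecting the sampled depths into an initial point cloud. This is purely illustrative; the function names, the least-squares scale-and-shift alignment, and the fixed ROI weighting are our assumptions, not the paper's published implementation.

```python
import numpy as np

def align_mono_depth(mono_depth, lidar_depth, lidar_mask):
    """Fit a global scale and shift (least squares) so that monocular
    depth agrees with the sparse LiDAR depth at the LiDAR pixels.
    Assumption: a single affine correction suffices for the sketch."""
    m = mono_depth[lidar_mask]
    l = lidar_depth[lidar_mask]
    A = np.stack([m, np.ones_like(m)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, l, rcond=None)
    return scale * mono_depth + shift

def roi_aware_sample(depth, roi_mask, n_points, roi_weight=4.0, rng=None):
    """Sample pixel coordinates without replacement, with pixels inside
    the ROI mask weighted `roi_weight` times higher (hypothetical
    content-driven weighting; the paper's actual scheme may differ)."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.where(roi_mask.ravel(), roi_weight, 1.0)
    idx = rng.choice(depth.size, size=n_points, replace=False, p=w / w.sum())
    return np.unravel_index(idx, depth.shape)

def backproject(depth, K, vs, us):
    """Back-project sampled pixels (vs, us) through intrinsics K into
    3D camera-frame points for the initial Gaussian positions."""
    z = depth[vs, us]
    x = (us - K[0, 2]) * z / K[0, 0]
    y = (vs - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)
```

The resulting points would then seed the Gaussian primitives directly, so the usual clone/split densification pass can be skipped during optimization.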