🤖 AI Summary
To address the low geometric accuracy, poor training efficiency, and weak generalization of 3D Gaussian Splatting (3DGS) in large-scale indoor whole-scene reconstruction, this paper proposes FreeSplat++, a generalizable 3DGS framework tailored for holistic scene modeling. The method introduces three key innovations: (1) a low-cost cross-view feature aggregation mechanism to mitigate geometric ambiguity under sparse-view conditions; (2) pixel-wise triplet fusion coupled with weighted floater removal to enforce explicit geometric consistency; and (3) depth-regularized per-scene fine-tuning to enhance generalization and robustness. Experiments on indoor benchmarks—including ScanNet—demonstrate significant improvements in both geometric fidelity and novel-view synthesis quality. Training time is reduced by over 40% compared to baseline 3DGS, while maintaining high scalability and rendering efficiency. The proposed framework thus offers an effective, reliable, and computationally efficient alternative for large-scale indoor 3D modeling.
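The weighted floater removal mentioned above can be illustrated with a minimal numpy sketch. All names, the confidence weights, and the relative-deviation thresholding rule here are hypothetical illustrations, not the paper's actual formulation: the idea is to cull "floater" Gaussians whose depth, weighted by a per-Gaussian confidence (e.g. opacity), deviates too far from a fused reference depth.

```python
import numpy as np

def remove_floaters(gauss_depth, ref_depth, weights, tol=0.1):
    """Keep a Gaussian only if its confidence-weighted relative depth
    deviation from the fused reference depth stays under `tol`
    (hypothetical culling rule for illustration)."""
    # relative deviation between each Gaussian's depth and the fused depth
    deviation = np.abs(gauss_depth - ref_depth) / np.maximum(ref_depth, 1e-6)
    # weight deviations by per-Gaussian confidence before thresholding
    keep = weights * deviation < tol
    return keep

# toy example: three Gaussians, the third one floating far off the surface
keep = remove_floaters(
    gauss_depth=np.array([2.0, 2.05, 3.5]),
    ref_depth=np.array([2.0, 2.0, 2.0]),
    weights=np.array([0.9, 0.8, 0.9]),
)
# → [True, True, False]: the off-surface Gaussian is culled
```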
📝 Abstract
Recently, the integration of the efficient feed-forward scheme into 3D Gaussian Splatting (3DGS) has been actively explored. However, most existing methods focus on sparse-view reconstruction of small regions and cannot produce satisfactory whole-scene reconstruction results in terms of either quality or efficiency. In this paper, we propose FreeSplat++, which extends generalizable 3DGS into an alternative approach to large-scale indoor whole-scene reconstruction, with the potential to significantly accelerate reconstruction and improve geometric accuracy. To facilitate whole-scene reconstruction, we first propose the Low-cost Cross-View Aggregation framework to efficiently process extremely long input sequences. Subsequently, we introduce a carefully designed pixel-wise triplet fusion method to incrementally aggregate the overlapping 3D Gaussian primitives from multiple views, adaptively reducing their redundancy. Furthermore, we propose a weighted floater removal strategy that effectively reduces floaters and serves as an explicit depth fusion approach, which is crucial in whole-scene reconstruction. After the feed-forward reconstruction of 3DGS primitives, we investigate a depth-regularized per-scene fine-tuning process. Leveraging the dense, multi-view-consistent depth maps obtained during the feed-forward prediction phase as an extra constraint, we refine the entire scene's 3DGS primitives to enhance rendering quality while preserving geometric accuracy. Extensive experiments confirm that our FreeSplat++ significantly outperforms existing generalizable 3DGS methods, especially in whole-scene reconstruction. Compared to conventional per-scene optimized 3DGS approaches, our method with depth-regularized per-scene fine-tuning demonstrates substantial improvements in reconstruction accuracy and a notable reduction in training time.
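The depth-regularized per-scene fine-tuning described in the abstract can be sketched as a composite objective: a photometric term against ground-truth images plus a depth term against the feed-forward depth prior. The function name, the L1 form of both terms, and the weighting factor `lambda_depth` are assumptions for illustration, not the paper's actual loss.

```python
import numpy as np

def depth_regularized_loss(rendered_rgb, gt_rgb, rendered_depth, prior_depth,
                           lambda_depth=0.5):
    """Hypothetical fine-tuning objective: photometric L1 plus an L1 depth
    term anchored to the feed-forward depth prediction, so rendering quality
    improves without drifting from the reconstructed geometry."""
    photo = np.abs(rendered_rgb - gt_rgb).mean()        # image fidelity
    depth = np.abs(rendered_depth - prior_depth).mean() # geometric anchor
    return photo + lambda_depth * depth

# toy 2x2 example with exact float values
loss = depth_regularized_loss(
    rendered_rgb=np.full((2, 2, 3), 0.5),
    gt_rgb=np.full((2, 2, 3), 0.25),
    rendered_depth=np.full((2, 2), 2.5),
    prior_depth=np.full((2, 2), 2.0),
)
# → 0.25 + 0.5 * 0.5 = 0.5
```

In an actual fine-tuning loop this scalar would be minimized over the scene's Gaussian parameters with a gradient-based optimizer; the numpy version here only shows the shape of the objective.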