GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address severe GPU memory overflow during 3D Gaussian Splatting (3DGS) training on large-scale scenes, this paper proposes GS-Scale, a fast and memory-efficient training system that keeps all Gaussians in host memory and transfers only the needed subset to the GPU on demand for each forward and backward pass. The approach introduces three system-level optimizations: (1) selective offloading of geometric parameters for fast CPU frustum culling; (2) parameter forwarding, which pipelines CPU optimizer updates with GPU computation; and (3) deferred optimizer updates that skip memory accesses for Gaussians with zero gradients. GS-Scale reduces peak GPU memory consumption by 3.3–5.6× while sustaining training speeds comparable to GPU-only training. It scales from 4 million to 18 million Gaussians on a consumer-grade RTX 4070 Mobile GPU, a 4.5× improvement, yielding 23–35% LPIPS gains and significantly expanding the practical applicability of 3DGS to large-scale scene reconstruction.
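The core host-offloading idea in the summary can be illustrated with a minimal sketch. This is a hypothetical, simplified stand-in (the class name `HostGaussianStore` and its methods are invented for illustration; the real system manages pinned host buffers and actual GPU transfers):

```python
# Minimal sketch of host offloading with on-demand loading.
# Assumption: a plain Python dict stands in for GPU memory; the actual
# system performs real host-to-device transfers of Gaussian parameters.
class HostGaussianStore:
    def __init__(self, params):
        self.host = list(params)      # all Gaussians live in host memory
        self.device = {}              # "GPU" cache for the current view

    def load_visible(self, indices):
        # Transfer only the culled subset needed for this forward/backward pass.
        self.device = {i: self.host[i] for i in indices}
        return self.device

    def write_back(self, updates):
        # After the backward pass, write updated parameters back to host memory.
        for i, p in updates.items():
            self.host[i] = p
```

Because only the visible subset ever resides on the device, peak GPU memory tracks the per-view working set rather than the full scene.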

📝 Abstract
The advent of 3D Gaussian Splatting has revolutionized graphics rendering by delivering high visual quality and fast rendering speeds. However, training large-scale scenes at high quality remains challenging due to the substantial memory demands required to store parameters, gradients, and optimizer states, which can quickly overwhelm GPU memory. To address these limitations, we propose GS-Scale, a fast and memory-efficient training system for 3D Gaussian Splatting. GS-Scale stores all Gaussians in host memory, transferring only a subset to the GPU on demand for each forward and backward pass. While this dramatically reduces GPU memory usage, it requires frustum culling and optimizer updates to be executed on the CPU, introducing slowdowns due to CPU's limited compute and memory bandwidth. To mitigate this, GS-Scale employs three system-level optimizations: (1) selective offloading of geometric parameters for fast frustum culling, (2) parameter forwarding to pipeline CPU optimizer updates with GPU computation, and (3) deferred optimizer update to minimize unnecessary memory accesses for Gaussians with zero gradients. Our extensive evaluations on large-scale datasets demonstrate that GS-Scale significantly lowers GPU memory demands by 3.3-5.6x, while achieving training speeds comparable to GPU without host offloading. This enables large-scale 3D Gaussian Splatting training on consumer-grade GPUs; for instance, GS-Scale can scale the number of Gaussians from 4 million to 18 million on an RTX 4070 Mobile GPU, leading to 23-35% LPIPS (learned perceptual image patch similarity) improvement.
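The abstract's first optimization, CPU frustum culling over host-resident geometric parameters, can be sketched as follows. This is a simplified illustration using a view cone rather than a full six-plane frustum, and the function name and signature are assumptions, not the paper's API:

```python
import math

def cull_frustum(positions, cam_pos, cam_dir, fov_deg):
    """Return indices of Gaussians inside a simple view cone.

    Hypothetical stand-in for the paper's CPU frustum culling: only the
    geometric parameters (here, 3D positions) are read on the host, so the
    full attribute set never needs to leave host memory for this step.
    """
    cos_half = math.cos(math.radians(fov_deg / 2))
    norm = math.sqrt(sum(c * c for c in cam_dir))
    d = [c / norm for c in cam_dir]  # normalized view direction
    visible = []
    for i, p in enumerate(positions):
        v = [p[k] - cam_pos[k] for k in range(3)]
        dist = math.sqrt(sum(c * c for c in v)) or 1e-9
        # Keep the point if the angle to the view axis is within the half-FOV.
        if sum(v[k] * d[k] for k in range(3)) / dist >= cos_half:
            visible.append(i)
    return visible
```

Selectively offloading only positions (rather than all per-Gaussian attributes) keeps the CPU's memory traffic for culling small, which is why the paper can afford to run this step on the host every iteration.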
Problem

Research questions and friction points this paper is trying to address.

Reducing GPU memory demands for large-scale 3D Gaussian Splatting training
Enabling high-quality scene training on consumer-grade GPUs
Overcoming memory constraints for storing parameters and optimizer states
Innovation

Methods, ideas, or system contributions that make the work stand out.

Host memory storage for all Gaussians with on-demand GPU transfer
Selective offloading of geometric parameters for fast CPU frustum culling
Parameter forwarding to pipeline CPU optimizer updates with GPU computation
Deferred optimizer updates skipping Gaussians with zero gradients
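The deferred-update idea from the abstract can be sketched with a minimal Adam-style step. This is an assumption-laden illustration (the real mechanism's bookkeeping for later catch-up of skipped state is not specified here); the point is simply that zero-gradient entries incur no read-modify-write of optimizer state:

```python
def deferred_adam_step(params, grads, m, v, t,
                       lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Sketch of a deferred optimizer update over scalar parameters.

    Gaussians whose gradient is exactly zero (e.g. culled from the current
    view) are skipped entirely, avoiding unnecessary host-memory accesses.
    """
    for i, g in enumerate(grads):
        if g == 0.0:
            continue  # deferred: no touch of params or optimizer state
        m[i] = b1 * m[i] + (1 - b1) * g
        v[i] = b2 * v[i] + (1 - b2) * g * g
        mhat = m[i] / (1 - b1 ** t)   # bias-corrected first moment
        vhat = v[i] / (1 - b2 ** t)   # bias-corrected second moment
        params[i] -= lr * mhat / (vhat ** 0.5 + eps)
    return params
```

Since only a small fraction of Gaussians fall inside any one view frustum, most entries have zero gradient each step, so skipping them removes the bulk of the CPU optimizer's memory traffic.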