Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling

📅 2024-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address CUDA kernel load imbalance and low rendering efficiency in 3D Gaussian Splatting (3DGS) training, both caused by non-uniform pixel-to-Gaussian assignments, this paper proposes a Gaussian-granularity parallel rendering framework with fine-grained dynamic scheduling. The method makes three contributions: (1) a Gaussian-level parallel rendering mechanism, which the authors describe as the first of its kind; (2) SM-level dynamic load mapping coupled with low-divergence intra-warp parallelization; and (3) fine-grained tiling-based scheduling with runtime load-aware adaptive kernel switching. Experiments show up to a 7.52× speedup of the forward-rendering CUDA kernel, which reduces per-iteration training time. The approach mitigates GPU resource idleness and long-tail latency, and offers an efficient, scalable foundation for real-time 3DGS training.
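To make the idea of Gaussian-level parallelism more concrete, here is a minimal Python sketch of why ordered alpha compositing can be split across Gaussians at all. It is not the paper's kernel: the function names are hypothetical, colors are single-channel scalars for brevity, and the prefix-product formulation is only one plausible way to decouple per-Gaussian work. It relies on the standard front-to-back compositing identity, where the transmittance in front of Gaussian i is the product of (1 - alpha) over all earlier Gaussians.

```python
from itertools import accumulate
import operator


def blend_sequential(colors, alphas):
    """Front-to-back compositing as a per-pixel loop over depth-sorted Gaussians."""
    transmittance = 1.0
    pixel = 0.0
    for color, alpha in zip(colors, alphas):
        pixel += color * alpha * transmittance
        transmittance *= 1.0 - alpha
    return pixel


def blend_gaussian_wise(colors, alphas):
    """Same result, written so each Gaussian's contribution is independent.

    Once the exclusive prefix product of (1 - alpha) is known, every
    contribution can be computed separately and summed, which is the
    shape of work that parallel lanes could share (assumption, not the
    paper's stated design).
    """
    survive = [1.0 - a for a in alphas]
    inclusive = list(accumulate(survive, operator.mul))
    exclusive = [1.0] + inclusive[:-1]  # transmittance in front of each Gaussian
    return sum(c * a * t for c, a, t in zip(colors, alphas, exclusive))


colors = [1.0, 0.5, 0.2]
alphas = [0.5, 0.5, 0.5]
print(blend_sequential(colors, alphas), blend_gaussian_wise(colors, alphas))
```

Both functions agree on the same inputs, which is the property a Gaussian-wise formulation has to preserve. The sketch leaves out early termination once transmittance becomes negligible, which real renderers typically apply.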

📝 Abstract
3D Gaussian Splatting (3DGS) is attracting growing attention in both academia and industry owing to its superior visual quality and rendering speed. However, training a 3DGS model remains time-intensive, especially in load-imbalanced scenarios, where workload diversity among pixels and Gaussian spheres causes poor renderCUDA kernel performance. We introduce Balanced 3DGS, a Gaussian-wise parallel rendering approach with fine-grained tiling for the 3DGS training process that addresses these load-imbalance issues. First, we introduce an inter-block dynamic workload distribution technique that dynamically maps workloads to Streaming Multiprocessor (SM) resources within a single GPU; this is the foundation of load balancing. Second, we are the first to propose a Gaussian-wise parallel rendering technique that significantly reduces workload divergence inside a warp, a critical component in addressing load imbalance. Building on these two methods, we put forward a fine-grained combined load balancing technique that distributes workload uniformly across all SMs and boosts forward renderCUDA kernel performance by up to 7.52x. In addition, we present a self-adaptive render kernel selection strategy that chooses a kernel during 3DGS training according to the current load-balance situation, which improves training efficiency.
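The benefit of dynamic workload distribution can be illustrated with a toy scheduling model. The sketch below is an assumption-laden simulation, not the paper's implementation: "workers" stand in for SMs, each tile cost stands in for the number of Gaussians a tile must process, and the function names are hypothetical. It compares a fixed round-robin assignment with a shared work queue in which each worker takes the next tile as soon as it becomes free.

```python
import heapq


def static_makespan(tile_costs, num_workers):
    """Finish time when tiles are assigned round-robin ahead of time."""
    loads = [0] * num_workers
    for index, cost in enumerate(tile_costs):
        loads[index % num_workers] += cost
    return max(loads)


def dynamic_makespan(tile_costs, num_workers):
    """Finish time when the earliest-free worker always takes the next tile."""
    finish_times = [0] * num_workers
    heapq.heapify(finish_times)
    for cost in tile_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Two heavy tiles among many light ones (hypothetical workload).
tile_costs = [100, 1, 1, 1, 100, 1, 1, 1]
print(static_makespan(tile_costs, 4))   # 200: both heavy tiles land on worker 0
print(dynamic_makespan(tile_costs, 4))  # 101: heavy tiles end up on different workers
```

On a GPU, a shared queue of this kind is often realized with an atomic counter that thread blocks increment to claim the next unit of work; whether Balanced 3DGS does exactly this is not stated in the abstract.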
Problem

Research questions and friction points this paper is trying to address.

3D Gaussian Splatting
Model Training
Load Balancing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Task Allocation
Gaussian Sphere Independent Operation
Optimal Rendering Strategy Selection
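
As a rough illustration of load-aware kernel selection, the following sketch picks between two render paths based on how uneven the per-tile workload is. Everything here is hypothetical: the max-over-mean imbalance metric, the threshold of 2.0, and the kernel labels "baseline" and "balanced" are assumptions chosen for the example, since the summary above does not specify the paper's actual selection criterion.

```python
def imbalance_ratio(tile_loads):
    """Maximum tile load divided by mean tile load; 1.0 means perfectly even."""
    if not tile_loads:
        return 1.0
    mean = sum(tile_loads) / len(tile_loads)
    return max(tile_loads) / mean if mean > 0 else 1.0


def select_kernel(tile_loads, threshold=2.0):
    """Choose the load-balanced path only when the imbalance justifies it.

    The threshold is an illustrative value, not one taken from the paper.
    """
    return "balanced" if imbalance_ratio(tile_loads) > threshold else "baseline"


print(select_kernel([10, 10, 10, 10]))  # even load
print(select_kernel([100, 1, 1, 1]))    # one tile dominates
```

The design intuition is that a load-balancing kernel carries scheduling overhead, so it pays off only when the workload is skewed enough; an even workload can stay on the simpler path.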
Hao Gui
Li Auto, Shanghai, China
Lin Hu
NVIDIA, Beijing, China
Rui Chen
NVIDIA, Beijing, China
Mingxiao Huang
Li Auto, Shanghai, China
Yuxin Yin
Unknown affiliation
Jin Yang
Li Auto, Shanghai, China
Yong Wu
Li Auto, Shanghai, China