🤖 AI Summary
To address the high computational overhead, poor real-time performance, and low energy efficiency of 3D Gaussian Splatting (3DGS) neural rendering on mobile SoCs, this paper proposes an algorithm–hardware co-optimization framework. Methodologically, it introduces three key innovations: (1) S², the first temporal redundancy compression algorithm that explicitly models and eliminates inter-frame redundancy in Gaussian parameters; (2) a Radiance Cache (RC) mechanism that, for the first time, decouples rasterization frequency from color integration; and (3) LuminCore, a dedicated rasterization accelerator. Evaluated on both real-world and synthetic scenes, the design achieves a 4.5× speedup and a 5.3× energy-efficiency improvement over a mobile Volta GPU, with negligible PSNR degradation (<0.2 dB). This work marks the first demonstration of real-time 3DGS neural rendering on mobile platforms.
📝 Abstract
3D Gaussian Splatting (3DGS) has substantially advanced neural rendering, but it remains computationally demanding on today's mobile SoCs. To address this challenge, we propose Lumina, a hardware-algorithm co-designed system that integrates two principal optimizations: a novel algorithm, S², and a radiance caching mechanism, RC, to improve the efficiency of neural rendering. The S² algorithm exploits temporal coherence across rendered frames to reduce computational overhead, while RC leverages the color integration process of 3DGS to decrease the frequency of intensive rasterization computations. Coupled with these techniques, we propose an accelerator architecture, LuminCore, to further accelerate cache lookup and address the fundamental inefficiencies of rasterization. We show that Lumina achieves a 4.5× speedup and a 5.3× energy reduction over a mobile Volta GPU, with marginal quality loss (<0.2 dB reduction in peak signal-to-noise ratio) across synthetic and real-world datasets.
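The radiance caching idea, reusing previously computed tile colors so the expensive rasterization step runs less often, can be illustrated with a toy sketch. This is not the paper's implementation: the pose-distance test, the tile-keyed cache, and all names (`RadianceCache`, `render_tile`, `pose_tolerance`) are illustrative assumptions, shown only to convey how cache lookups can gate rasterization frequency.

```python
import math

class RadianceCache:
    """Toy sketch of a radiance cache (illustrative, not the paper's design):
    reuse a previously rasterized tile color when the camera pose has
    barely moved, instead of re-running rasterization every frame."""

    def __init__(self, pose_tolerance=1e-2):
        self.pose_tolerance = pose_tolerance
        self.cache = {}  # tile_id -> (pose, color)

    def lookup(self, tile_id, pose):
        """Return a cached color if the stored pose is close enough, else None."""
        entry = self.cache.get(tile_id)
        if entry is None:
            return None
        cached_pose, color = entry
        if math.dist(cached_pose, pose) <= self.pose_tolerance:
            return color  # cache hit: skip rasterization for this tile
        return None

    def store(self, tile_id, pose, color):
        self.cache[tile_id] = (pose, color)


def render_tile(tile_id, pose, cache, rasterize):
    """Return a tile color, invoking the rasterizer only on a cache miss."""
    color = cache.lookup(tile_id, pose)
    if color is None:
        color = rasterize(tile_id, pose)
        cache.store(tile_id, pose, color)
    return color
```

Under this sketch, consecutive frames with near-identical camera poses hit the cache and reuse the stored color, which is the sense in which rasterization frequency is decoupled from per-frame color output.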