Lumina: Real-Time Mobile Neural Rendering by Exploiting Computational Redundancy

📅 2025-06-06
🤖 AI Summary
To address the high computational cost of 3D Gaussian Splatting (3DGS) neural rendering on mobile SoCs, this paper proposes Lumina, an algorithm-hardware co-designed system with three components: (1) S², an algorithm that exploits temporal coherence across frames to reduce computation; (2) a Radiance Cache (RC) mechanism that leverages the color integration process of 3DGS to reduce how often the expensive rasterization step runs; and (3) LuminCore, a dedicated accelerator that speeds up cache lookup and addresses inefficiencies in rasterization. Evaluated on both real-world and synthetic datasets, Lumina achieves a 4.5× speedup and a 5.3× energy reduction over a mobile Volta GPU, with a marginal quality loss (<0.2 dB PSNR).

📝 Abstract
3D Gaussian Splatting (3DGS) has vastly advanced the pace of neural rendering, but it remains computationally demanding on today's mobile SoCs. To address this challenge, we propose Lumina, a hardware-algorithm co-designed system that integrates two principal optimizations to improve the efficiency of neural rendering: a novel algorithm, S^2, and a radiance caching mechanism, RC. The S^2 algorithm exploits temporal coherence in rendering to reduce the computational overhead, while RC leverages the color integration process of 3DGS to decrease the frequency of intensive rasterization computations. Coupled with these techniques, we propose an accelerator architecture, LuminCore, to further accelerate cache lookup and address the fundamental inefficiencies in rasterization. We show that Lumina achieves a 4.5x speedup and a 5.3x energy reduction against a mobile Volta GPU, with a marginal quality loss (<0.2 dB peak signal-to-noise ratio reduction) across synthetic and real-world datasets.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational demands of 3DGS on mobile SoCs
Exploiting temporal coherence to lower rendering overhead
Decreasing rasterization frequency via radiance caching
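To make the temporal-coherence idea concrete, here is a minimal sketch of one way consecutive frames can share work: the front-to-back depth order of the Gaussians is reused while the camera has barely moved. This is an illustration only, not the paper's S^2 algorithm; `depth_order`, `SortReuser`, and the `max_move` threshold are hypothetical names and values chosen for this example.

```python
import math


def depth_order(centers, cam_pos, view_dir):
    """Return Gaussian indices sorted front to back along view_dir."""
    def depth(i):
        c = centers[i]
        return sum((c[k] - cam_pos[k]) * view_dir[k] for k in range(3))
    return sorted(range(len(centers)), key=depth)


class SortReuser:
    """Hypothetical helper: re-sort only when the camera has moved enough.

    Assumption for this sketch: a small camera translation leaves the
    depth order nearly unchanged, so the previous order is reused as is.
    Changes in view direction are ignored to keep the example short.
    """

    def __init__(self, max_move=0.05):
        self.max_move = max_move
        self.last_pos = None
        self.order = None
        self.sorts_done = 0

    def order_for(self, centers, cam_pos, view_dir):
        if self.last_pos is None:
            moved = math.inf
        else:
            moved = math.dist(cam_pos, self.last_pos)
        if moved > self.max_move:
            # Camera moved too far: pay for a fresh sort.
            self.order = depth_order(centers, cam_pos, view_dir)
            self.last_pos = cam_pos
            self.sorts_done += 1
        return self.order
```

With a slowly moving camera, most frames skip the sort entirely; the threshold trades a small ordering error for that saved work.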
Innovation

Methods, ideas, or system contributions that make the work stand out.

S^2 algorithm exploits temporal coherence to reduce computational overhead
Radiance caching reduces rasterization computation frequency
LuminCore accelerator speeds up cache lookup and addresses rasterization inefficiencies
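The radiance-caching idea can be sketched on top of the standard front-to-back color integration used in 3DGS, where a pixel's color is the sum of each Gaussian's color weighted by its opacity and the transmittance remaining in front of it. The sketch below is a simplified illustration, not the paper's RC design: the `RadianceCache` class, the choice to key the cache on the IDs of the first `k` Gaussians along a ray, and the per-Gaussian `(rgb, alpha)` representation are all assumptions made for this example.

```python
def integrate(gaussians, order, cutoff=1e-4):
    """Front-to-back alpha compositing over the Gaussians listed in order.

    gaussians[i] is an (rgb, alpha) pair; integration stops early once the
    remaining transmittance drops below cutoff.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for i in order:
        rgb, alpha = gaussians[i]
        weight = transmittance * alpha
        color = [c + weight * ch for c, ch in zip(color, rgb)]
        transmittance *= 1.0 - alpha
        if transmittance < cutoff:
            break
    return tuple(color)


class RadianceCache:
    """Hypothetical cache keyed on the first k Gaussian IDs along a ray.

    Assumption for this sketch: rays that share the same leading Gaussians
    end up with similar colors, so a cached color can stand in for a full
    integration.
    """

    def __init__(self, k=2):
        self.k = k
        self.table = {}
        self.hits = 0

    def shade(self, gaussians, order):
        key = tuple(order[:self.k])
        if key in self.table:
            self.hits += 1  # cache hit: skip the integration loop
            return self.table[key]
        color = integrate(gaussians, order)
        self.table[key] = color
        return color
```

Every cache hit replaces a full integration loop with a single table lookup, which is the kind of frequency reduction the bullet above refers to; a real design would also have to bound the error introduced by reusing colors.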
Authors

Yu Feng
Shanghai Jiao Tong University, Shanghai Qi Zhi Institute, Shanghai, China
Weikai Lin
University of Rochester
Computer Science
Yuge Cheng
Shanghai Jiao Tong University, Shanghai, China
Zihan Liu
Shanghai Jiao Tong University, Shanghai Qi Zhi Institute, Shanghai, China
Jingwen Leng
Professor, Shanghai Jiao Tong University
Computer Architecture
Minyi Guo
IEEE Fellow, Chair Professor, Shanghai Jiao Tong University
Parallel Computing, Compiler Optimization, Cloud Computing, Networking, Big Data
Chen Chen
Shanghai Jiao Tong University, Shanghai, China
Shixuan Sun
Shanghai Jiao Tong University, Shanghai, China
Yuhao Zhu
University of Rochester, Rochester, NY, USA