A Single Atlas is All You Need: Decoder-Side Gaussian Splatting for Immersive Video

📅 2026-05-16
📈 Citations: 0
Influential: 0
📄 PDF

career value

230K/year
🤖 AI Summary
This work addresses the challenges of immersive video transmission under bandwidth constraints, where conventional decoder-side depth estimation struggles with complex geometry, view flickering, and non-Lambertian reflectance. While existing 3D Gaussian splatting methods offer high visual quality, their substantial bandwidth overhead limits practical deployment. To overcome this, we propose the first feed-forward 3D Gaussian splatting inference integrated directly into the decoder, enabling voxel scene reconstruction using only compressed texture and lightweight metadata—fully replacing traditional depth estimation. We further reveal that lossy compression inherently acts as an implicit low-pass filter, enhancing prediction stability. Experiments demonstrate that, using a single 2D atlas containing four views, our method achieves a 5.79 dB gain in BD-PSNR and a 0.054 improvement in BD-SSIM over the DSDE baseline, reduces the maximum inter-view Delta IV-PSNR from 17.2 dB to 6.4 dB, and attains a tenfold increase in compression efficiency.
📝 Abstract
Immersive video delivery is bottlenecked by pixel-rate constraints, making the transmission of high-resolution depth maps or explicit 3D volumetric data expensive. Decoder-Side Depth Estimation (DSDE) shifts depth computation to the client, but struggles with complex geometries, inter-view flickering, and non-Lambertian reflections. Conversely, 3D Gaussian Splatting (3DGS) offers state-of-the-art view synthesis, but transmitting splats (or their projected 2D maps) incurs prohibitive bandwidth costs and is poorly aligned with standard video codecs. We propose Decoder-Side Gaussian Splatting (DSGS), a framework that natively replaces the depth-estimation stage of DSDE with feed-forward 3DGS inference, optimizing volumetric scenes entirely on the decoder side from compressed textures and metadata. A central, counterintuitive finding is that lossy compression acts as an implicit low-pass filter stabilizing feed-forward splat prediction: compressed bitstreams exceed lossless quality while shrinking tenfold. Under extreme view sparsity (one 2D atlas comprising 4 input views), DSGS achieves a +5.79 dB BD-PSNR and +0.054 BD-SSIM gain over the DSDE anchor while reducing maximum inter-view Delta IV-PSNR from 17.2 dB to 6.4 dB, minimizing the domain shift between transmitted and virtual viewports.
Problem

Research questions and friction points this paper is trying to address.

Immersive video
Decoder-Side Depth Estimation
3D Gaussian Splatting
Bandwidth efficiency
View synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoder-Side Gaussian Splatting
Immersive Video Compression
3D Gaussian Splatting
View Synthesis
Lossy Compression as Regularization
🔎 Similar Papers
No similar papers found.