🤖 AI Summary
Implicit neural video representations (e.g., NeRV) suffer from slow encoding/decoding and high GPU memory consumption. To address these limitations, this work proposes an efficient video representation and compression framework based on deformable 2D Gaussian splatting. The key innovation is the first introduction of dynamic, time-varying 2D Gaussian deformation modeling, enabled by a multi-plane spatiotemporal encoder and a lightweight decoder. Crucially, temporal gradients drive the prediction of Gaussian parameters (position, shape, and color), explicitly exploiting inter-frame redundancy. Experiments demonstrate that the method reduces GPU memory usage by up to 78.4% and accelerates training and decoding by 5.5x and 12.5x, respectively, compared to NeRV, while maintaining competitive reconstruction quality. This yields a superior trade-off between fidelity and efficiency, significantly enhancing scalability and practical applicability for real-world video processing tasks.
📝 Abstract
Implicit Neural Representation for Videos (NeRV) has introduced a novel paradigm for video representation and compression, outperforming traditional codecs. As model size grows, however, slow encoding and decoding and high memory consumption hinder its practical application. To address these limitations, we propose a new video representation and compression method based on 2D Gaussian Splatting to handle video data efficiently. Our deformable 2D Gaussian Splatting dynamically adapts the transformation of the 2D Gaussians at each frame, significantly reducing memory cost. Equipped with a multi-plane spatiotemporal encoder and a lightweight decoder, it predicts the changes in color, coordinates, and shape of the initialized Gaussians at a given time step. By leveraging temporal gradients, our model captures temporal redundancy at negligible cost, significantly enhancing video representation efficiency. Our method reduces GPU memory usage by up to 78.4% and significantly expedites video processing, achieving 5.5x faster training and 12.5x faster decoding compared to state-of-the-art NeRV methods.
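To make the deformation pipeline concrete, here is a minimal numpy sketch of the general idea: three learned feature planes (xy, xt, yt) are sampled bilinearly at a Gaussian's position and time step, the features are fused, and a small decoder maps them to per-Gaussian deltas for position, shape, and color. All names, the plane resolution, the channel count, the multiplicative fusion, and the single linear decoder layer are assumptions for illustration; the paper's actual encoder/decoder architecture and parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear(plane, u, v):
    """Bilinearly sample an (H, W, C) feature plane at continuous coords u, v in [0, 1]."""
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * plane[y0, x0] + fx * (1 - fy) * plane[y0, x1]
            + (1 - fx) * fy * plane[y1, x0] + fx * fy * plane[y1, x1])

C = 8  # feature channels per plane (hypothetical)
# Multi-plane spatiotemporal encoder: one low-res feature grid per axis pair.
planes = {k: rng.normal(size=(16, 16, C)) for k in ("xy", "xt", "yt")}
# Lightweight "decoder" stand-in: a single linear map from fused features
# to 8 deltas (2 position, 3 shape, 3 color). A real model would train an MLP.
W_dec = rng.normal(size=(C, 8)) * 0.01

def deform(gaussian, t):
    """Predict a time-conditioned deformation of one base 2D Gaussian."""
    x, y = gaussian["pos"]
    # Fuse the three plane features multiplicatively (one common choice).
    feat = (bilinear(planes["xy"], x, y)
            * bilinear(planes["xt"], x, t)
            * bilinear(planes["yt"], y, t))
    d = feat @ W_dec
    return {"pos": gaussian["pos"] + d[:2],
            "shape": gaussian["shape"] + d[2:5],
            "color": gaussian["color"] + d[5:8]}

# One initialized Gaussian, deformed to time step t = 0.25.
g0 = {"pos": np.array([0.5, 0.5]), "shape": np.zeros(3),
      "color": np.array([0.2, 0.4, 0.6])}
g_t = deform(g0, t=0.25)
```

Because only the shared planes and the decoder vary with time, each frame is represented by small per-Gaussian deltas rather than a full per-frame model, which is where the memory savings over per-frame implicit representations come from.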