🤖 AI Summary
To address the large data volume, complex motion modeling, and poor editability of dynamic volumetric video streaming, this paper proposes 4D-MoDe, a hierarchical motion-decoupled 4D Gaussian representation. The scene is decomposed into a static background and a dynamic foreground; adaptive background keyframe insertion balances temporal consistency against compression efficiency, and the foreground can be encoded and streamed independently. A multi-resolution motion estimation grid, a lightweight shared MLP, entropy-aware training, and range coding combined with KD-tree compression are integrated under a rate-distortion objective. Evaluated on multiple datasets, the method reaches storage costs as low as 11.4 KB per frame while remaining competitive with state-of-the-art reconstruction quality. It also supports editing operations such as background replacement and foreground-only streaming, improving the editability, scalability, and streaming efficiency of volumetric video.
📝 Abstract
Volumetric video has emerged as a key medium for immersive telepresence and augmented/virtual reality, enabling six-degrees-of-freedom (6DoF) navigation and realistic spatial interactions. However, delivering high-quality dynamic volumetric content at scale remains challenging due to massive data volume, complex motion, and limited editability of existing representations. In this paper, we present 4D-MoDe, a motion-decoupled 4D Gaussian compression framework designed for scalable and editable volumetric video streaming. Our method introduces a layered representation that explicitly separates static backgrounds from dynamic foregrounds using a lookahead-based motion decomposition strategy, significantly reducing temporal redundancy and enabling selective background/foreground streaming. To capture continuous motion trajectories, we employ a multi-resolution motion estimation grid and a lightweight shared MLP, complemented by a dynamic Gaussian compensation mechanism to model emergent content. An adaptive grouping scheme dynamically inserts background keyframes to balance temporal consistency and compression efficiency. Furthermore, an entropy-aware training pipeline jointly optimizes the motion fields and Gaussian parameters under a rate-distortion (RD) objective, while employing range-based and KD-tree compression to minimize storage overhead. Extensive experiments on multiple datasets demonstrate that 4D-MoDe consistently achieves competitive reconstruction quality with an order of magnitude lower storage cost (e.g., as low as 11.4 KB/frame) compared to state-of-the-art methods, while supporting practical applications such as background replacement and foreground-only streaming.
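The abstract does not spell out how the lookahead-based decomposition decides which Gaussians are static. As a rough illustration only, the sketch below assumes one simple rule: a Gaussian is treated as dynamic if its center moves more than a threshold within a short window of upcoming frames. The function name `split_static_dynamic`, the `(T, N, 3)` input layout, and the `lookahead` and `threshold` parameters are all hypothetical and not taken from the paper.

```python
import numpy as np


def split_static_dynamic(centers, lookahead=3, threshold=0.05):
    """Label Gaussians as dynamic or static from their future displacement.

    centers:   (T, N, 3) array of per-frame Gaussian centers (assumed layout).
    lookahead: number of future frames inspected after frame 0 (assumed).
    threshold: displacement above which a Gaussian counts as dynamic (assumed).
    Returns a boolean array of shape (N,), True where the Gaussian is dynamic.
    """
    centers = np.asarray(centers, dtype=np.float64)
    num_frames = centers.shape[0]
    # Frame 0 plus up to `lookahead` future frames.
    window = centers[: min(lookahead + 1, num_frames)]
    # Euclidean displacement of every Gaussian relative to frame 0: shape (W, N).
    displacement = np.linalg.norm(window - window[0], axis=-1)
    # Dynamic if the largest displacement in the window exceeds the threshold.
    return displacement.max(axis=0) > threshold


# Toy scene: Gaussian 0 never moves, Gaussian 1 drifts 0.1 units per frame along x.
centers = np.zeros((4, 2, 3))
centers[:, 1, 0] = 0.1 * np.arange(4)

dynamic_mask = split_static_dynamic(centers)
print(dynamic_mask.tolist())  # [False, True]
```

Under this assumed rule, Gaussians with a `False` mask would go to the background layer, which is refreshed only at keyframes, while the rest would form the foreground layer that is streamed per frame. The actual criterion in 4D-MoDe may differ.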