🤖 AI Summary
Existing 3D Gaussian Splatting (3DGS)-based video codecs suffer from prominent visual artifacts and relatively low compression ratios. To address this, we propose GFix, a perception-driven, content-adaptive enhancement framework that uses a single-step diffusion model as an off-the-shelf neural enhancer, coupled with a modulated LoRA scheme: the low-rank decompositions are frozen while the intermediate hidden states are modulated, which adapts the diffusion backbone efficiently and keeps the updates highly compressible. The approach rests on the assumption that artifacts from 3DGS rendering and quantization resemble the noisy latents sampled during diffusion training, so a diffusion prior can correct them. Experiments show that GFix delivers strong perceptual quality enhancement: compared to GSVC, it achieves up to 72.1% BD-rate savings in LPIPS and 21.4% in FID.
📝 Abstract
3D Gaussian Splatting (3DGS) enhances 3D scene reconstruction through explicit representation and fast rendering, demonstrating potential benefits for various low-level vision tasks, including video compression. However, existing 3DGS-based video codecs generally exhibit more noticeable visual artifacts and relatively low compression ratios. In this paper, we specifically target the perceptual enhancement of 3DGS-based video compression, based on the assumption that artifacts from 3DGS rendering and quantization resemble noisy latents sampled during diffusion training. Building on this premise, we propose a content-adaptive framework, GFix, comprising a streamlined, single-step diffusion model that serves as an off-the-shelf neural enhancer. Moreover, to increase compression efficiency, we propose a modulated LoRA scheme that freezes the low-rank decompositions and modulates the intermediate hidden states, thereby achieving efficient adaptation of the diffusion backbone with highly compressible updates. Experimental results show that GFix delivers strong perceptual quality enhancement, outperforming GSVC with up to 72.1% BD-rate savings in LPIPS and 21.4% in FID.
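The modulated LoRA idea described above can be illustrated with a minimal sketch. The abstract does not specify the exact form of the modulation, so everything below is an assumption for illustration: the class name `ModulatedLoRALinear`, the random initialization of the frozen factors `A` and `B`, and the choice of an elementwise scale `m` applied to the rank-r hidden state are all hypothetical and not taken from the paper. The point is only to show why such a scheme could give compact updates: per-content adaptation touches `rank` numbers instead of the full low-rank factors.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class ModulatedLoRALinear:
    """Hypothetical linear layer: y = W x + B (m * (A x)).

    W, A and B are frozen; only the modulation vector m (length = rank)
    would be adapted per content and transmitted. This is an assumed
    form of "modulating the intermediate hidden states", not the
    paper's confirmed design.
    """

    def __init__(self, W, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained weight, shape d_out x d_in
        # Frozen low-rank factors (random here purely for illustration).
        self.A = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(d_out)]
        # Modulation starts at zero, so the layer initially equals the base layer.
        self.m = [0.0] * rank

    def forward(self, x):
        h = matvec(self.A, x)                       # rank-r hidden state
        h = [mi * hi for mi, hi in zip(self.m, h)]  # elementwise modulation
        delta = matvec(self.B, h)                   # project back to d_out
        base = matvec(self.W, x)
        return [b + d for b, d in zip(base, delta)]

    def num_update_params(self):
        """Number of values that change per content under this sketch."""
        return len(self.m)


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
    layer = ModulatedLoRALinear(W, rank=2)
    x = [1.0, 2.0, 3.0]
    print(layer.forward(x))   # equals W x while m is all zeros
    layer.m = [1.0, -0.5]     # a per-content update of only `rank` numbers
    print(layer.forward(x))
    print(layer.num_update_params())
```

Under these assumptions, a standard LoRA update for this layer would carry `rank * (d_in + d_out)` values, whereas the modulated variant carries only `rank`, which is one plausible reading of why the abstract calls the updates highly compressible.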