🤖 AI Summary
Standard 3D Gaussian Splatting (3DGS) is trained with pixel-level losses alone, which constrain only aggregate reconstruction error. Under sparse-view settings this leads to loss of high-frequency detail, structural artifacts, and oversmoothing. To address this, the work proposes KC-3DGS, a wavelet-domain supervision mechanism that constrains the frequency statistics of rendered images through three components: a multi-scale wavelet coefficient alignment loss, a supervised kurtosis concentration loss, and a cross-band covariance penalty. The accompanying theoretical analysis shows that pixel-space losses cannot distinguish a family of perturbations that redistribute error across wavelet bands, and that the joint objective excludes these degenerate solutions. Experiments on MipNeRF360, Tanks&Temples, MVImgNet, DeepBlending, and WRIVA-ULTRRA show consistent improvements in perceptual quality, including a 9.48% improvement in DreamSim on WRIVA-ULTRRA and up to a 0.5 dB PSNR gain on MipNeRF360 with only 12 training images.
📝 Abstract
3D Gaussian Splatting (3DGS) enables real-time novel view synthesis by representing scenes as collections of anisotropic Gaussians optimized via differentiable rasterization. However, standard pixel-space losses (L1, SSIM) constrain only aggregate reconstruction error, permitting the optimization to redistribute error across frequency scales. This leads to oversmoothing and structural artifacts, particularly in sparse-view settings where supervision is limited. We propose KC-3DGS, which augments 3DGS training with wavelet-domain supervision based on natural image statistics. Our method combines three components: (1) a multi-scale wavelet coefficient alignment loss that explicitly penalizes missing high-frequency detail, (2) a supervised kurtosis concentration loss that encourages rendered images to match the heavy-tailed frequency statistics of ground-truth images, and (3) a cross-band covariance penalty that promotes frequency specialization. We provide theoretical analysis showing that pixel-space losses admit a family of indistinguishable perturbations under wavelet redistribution, and that our joint objective excludes degenerate solutions. Experiments across MipNeRF360, Tanks&Temples, MVImgNet, DeepBlending, and WRIVA-ULTRRA demonstrate consistent improvements in perceptual quality. On the challenging WRIVA-ULTRRA outdoor dataset, KC-3DGS achieves a 9.48% improvement in DreamSim while also improving PSNR, SSIM, and LPIPS. In sparse-view settings with only 12 training images, our method improves PSNR by up to 0.5 dB on MipNeRF360 while maintaining perceptual quality. The approach integrates seamlessly into existing 3DGS pipelines as a plug-and-play regularization strategy.
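The abstract names the three loss components but does not give their formulas. The following is a minimal NumPy sketch of what such terms could look like, not the authors' implementation. It assumes a Haar wavelet, an L1 distance on detail coefficients, an absolute difference of per-band kurtosis between render and ground truth, and a squared off-diagonal covariance across the three detail bands of one scale. All function names and the weights `w_align`, `w_kurt`, and `w_cov` are hypothetical.

```python
import numpy as np

def haar_dwt2(img):
    """One level of an orthonormal 2D Haar transform; img is (H, W) with even H, W."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-pass approximation
    d1 = (a + b - c - d) / 2.0   # difference between adjacent rows
    d2 = (a - b + c - d) / 2.0   # difference between adjacent columns
    d3 = (a - b - c + d) / 2.0   # diagonal difference
    return ll, (d1, d2, d3)

def detail_bands(img, levels=2):
    """Collect detail bands over several scales (H, W divisible by 2**levels)."""
    bands, ll = [], img
    for _ in range(levels):
        ll, details = haar_dwt2(ll)
        bands.extend(details)
    return bands

def kurtosis(x, eps=1e-8):
    """Fourth standardized moment; large for heavy-tailed coefficient distributions."""
    x = x.ravel()
    centered = x - x.mean()
    return (centered ** 4).mean() / (x.var() ** 2 + eps)

def alignment_loss(render, target, levels=2):
    """Assumed form: mean absolute difference of detail coefficients, summed over bands."""
    pairs = zip(detail_bands(render, levels), detail_bands(target, levels))
    return sum(np.abs(r - t).mean() for r, t in pairs)

def kurtosis_loss(render, target, levels=2):
    """Assumed form: per-band kurtosis of the render should match the ground truth."""
    pairs = zip(detail_bands(render, levels), detail_bands(target, levels))
    return sum(abs(kurtosis(r) - kurtosis(t)) for r, t in pairs)

def cross_band_covariance(render):
    """Assumed form: squared off-diagonal covariance among the finest detail bands."""
    _, details = haar_dwt2(render)
    cov = np.cov(np.stack([band.ravel() for band in details]))
    off_diag = cov - np.diag(np.diag(cov))
    return (off_diag ** 2).sum()

def wavelet_regularizer(render, target, w_align=1.0, w_kurt=0.1, w_cov=0.1):
    """Weighted sum of the three terms; the weights are placeholders, not paper values."""
    return (w_align * alignment_loss(render, target)
            + w_kurt * kurtosis_loss(render, target)
            + w_cov * cross_band_covariance(render))
```

In an actual 3DGS training loop these terms would need to be written in an autodiff framework so that gradients reach the Gaussian parameters, and the result would be added to the usual L1 and SSIM losses. Because the Haar transform here is orthonormal, it preserves total squared error and only reveals how that error is distributed across bands, which is the degree of freedom the abstract says pixel-space losses leave unconstrained.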