🤖 AI Summary
To address critical issues in sparse voxel rasterization for scene reconstruction—including low-frequency content underfitting, high GPU memory consumption, and fragile pruning strategies—this paper proposes an adaptive training framework. Methodologically: (1) an inverse-Sobel reweighting scheme combined with a gamma-ramp mechanism enhances low-frequency perception; (2) depth-based, quantile-driven dynamic pruning coupled with exponential moving average (EMA)-guided hysteresis protection improves pruning robustness; and (3) ray-footprint-driven adaptive voxel subdivision and maximum-blending-weight quantile pruning optimize spatial resolution and sparsity. Evaluated on Mip-NeRF 360 and Tanks & Temples, the method achieves PSNR/SSIM competitive with strong baselines, maintains comparable training speed and rendering frame rate, reduces peak GPU memory by 40–60%, and significantly improves the fidelity of low-frequency detail and the stability of geometric boundaries.
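The inverse-Sobel reweighting with a mid-training gamma ramp described in (1) could be sketched as follows. All function names, the ramp schedule, and constants (`gamma_max`, the ramp window) are illustrative assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grayscale image via 3x3 Sobel filters
    (naive edge-padded convolution; a minimal stand-in for a library call)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx = np.zeros(img.shape, dtype=np.float64)
    gy = np.zeros(img.shape, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)

def inverse_sobel_weights(img, step, ramp_start, ramp_end,
                          gamma_max=2.0, eps=1e-6):
    """Per-pixel loss weights that emphasize flat (low-gradient) regions.

    gamma ramps linearly from 0 (uniform weighting) to gamma_max over the
    [ramp_start, ramp_end] training window, so the low-frequency emphasis
    only kicks in after geometry has had time to stabilize.
    """
    g = sobel_magnitude(img)
    g = g / (g.max() + eps)                       # normalize gradients to [0, 1]
    t = np.clip((step - ramp_start) / max(ramp_end - ramp_start, 1), 0.0, 1.0)
    gamma = gamma_max * t                         # mid-training gamma-ramp
    w = (1.0 - g + eps) ** gamma                  # inverse-Sobel: flat pixels -> weight ~1
    return w / w.mean()                           # keep the overall loss scale constant
```

Before the ramp starts the weights are uniform (gamma = 0), so early training behaves exactly like the unweighted loss; by the end of the ramp, high-gradient (edge) pixels are strongly down-weighted relative to flat regions.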
📝 Abstract
Sparse-voxel (SV) rasterization is a fast, differentiable alternative for optimization-based scene reconstruction, but it tends to underfit low-frequency content, depends on brittle pruning heuristics, and can overgrow in ways that inflate VRAM. We introduce LiteVoxel, a self-tuning training pipeline that makes SV rasterization both steadier and lighter. Our loss is made low-frequency aware via an inverse-Sobel reweighting with a mid-training gamma-ramp, shifting gradient budget to flat regions only after geometry stabilizes. Adaptation replaces fixed thresholds with depth-quantile pruning logic on maximum blending weight, stabilized by EMA-hysteresis guards, and refines structure through ray-footprint-based, priority-driven subdivision under an explicit growth budget. Ablations and full-system results across the Mip-NeRF 360 (6 scenes) and Tanks & Temples (3 scenes) datasets show mitigation of low-frequency-region errors and boundary instability while keeping PSNR/SSIM, training time, and FPS comparable to a strong SVRaster pipeline. Crucially, LiteVoxel reduces peak VRAM by ~40–60% and preserves low-frequency detail that prior setups miss, enabling more predictable, memory-efficient training without sacrificing perceptual quality.
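The quantile-threshold pruning with EMA-hysteresis guards could be sketched roughly as below. This is a minimal illustration under stated assumptions: the constants (`alpha`, `q`, `hysteresis`) are placeholders, and the per-depth-bin handling of the quantile threshold mentioned in the abstract is omitted for brevity:

```python
import numpy as np

def prune_step(w_max, ema, keep, alpha=0.9, q=0.10, hysteresis=1.5):
    """One adaptive pruning update (illustrative, not the paper's exact rule).

    w_max : (N,) per-voxel maximum blending weight from the last render pass
    ema   : (N,) exponential moving average of w_max (smooths noisy frames)
    keep  : (N,) boolean mask of currently-kept voxels
    """
    ema = alpha * ema + (1 - alpha) * w_max    # EMA-smoothed per-voxel importance
    tau = np.quantile(ema[keep], q)            # data-driven quantile threshold,
                                               # computed over kept voxels only
    prune = ema < tau                          # candidates falling below tau
    protect = ema >= hysteresis * tau          # hysteresis band: a voxel must rise
    keep = (keep & ~prune) | protect           # well above tau to be re-kept,
    return ema, keep                           # which prevents keep/prune flicker
```

The hysteresis margin is what replaces a single brittle fixed threshold: voxels near the boundary are neither pruned nor revived on every iteration, so the sparsity pattern changes monotonically rather than oscillating.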