🤖 AI Summary
Existing feed-forward 3D Gaussian Splatting (3DGS) methods predict pixel-aligned primitives per view, so the Gaussian primitive count grows uncontrolled under dense multi-view settings, with no explicit way to bound it at inference. Method: We propose the first feed-forward 3DGS framework enabling explicit specification of the Gaussian count at inference time. Our approach combines a pixel-aligned initial prediction network with an importance-aware fine-tuning mechanism that jointly performs importance ranking and adaptive parameter rescaling to achieve controllable sparsification; multi-view geometric constraints further enforce geometric consistency. Contribution/Results: Under strict constraints on the total number of Gaussians, our method significantly outperforms state-of-the-art approaches in novel-view synthesis quality while maintaining high reconstruction efficiency and rendering fidelity. This establishes a new paradigm for real-time 3D reconstruction in resource-constrained scenarios.
📝 Abstract
Feed-forward 3D Gaussian Splatting (3DGS) enables efficient one-pass scene reconstruction, providing 3D representations for novel view synthesis without per-scene optimization. However, existing methods typically predict pixel-aligned primitives per view, producing an excessive number of primitives in dense-view settings and offering no explicit control over the number of predicted Gaussians. To address this, we propose EcoSplat, the first efficiency-controllable feed-forward 3DGS framework that adaptively predicts the 3D representation for any given target primitive count at inference time. EcoSplat adopts a two-stage optimization process. In the first stage, Pixel-aligned Gaussian Training (PGT), the model learns initial primitive prediction. In the second stage, Importance-aware Gaussian Finetuning (IGF), the model learns to rank primitives and adaptively adjust their parameters based on the target primitive count. Extensive experiments across multiple dense-view settings show that EcoSplat is robust and outperforms state-of-the-art methods under strict primitive-count constraints, making it well-suited for flexible downstream rendering tasks.
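To make the "importance ranking plus adaptive parameter adjustment" idea concrete, here is a minimal NumPy sketch of controllable sparsification: rank primitives by a score, keep the top-k for a given target count, and rescale the survivors' opacities. The score (opacity × volume) and the mass-preserving rescale are illustrative assumptions for this sketch only; in EcoSplat both are learned by the IGF stage rather than hand-crafted.

```python
import numpy as np

def sparsify_gaussians(opacity, scale, k):
    """Keep the k most "important" Gaussians and rescale their opacities.

    opacity: (N,) per-primitive opacity in [0, 1]
    scale:   (N, 3) per-axis Gaussian scales
    k:       target primitive count at inference time
    Returns the kept indices and their adjusted opacities.
    """
    # Heuristic importance: opacity weighted by a volume proxy.
    # (A stand-in for the learned importance ranking.)
    volume = np.prod(scale, axis=1)
    score = opacity * volume

    # Rank primitives and keep the top-k.
    keep = np.argsort(score)[::-1][:k]

    # Adaptive adjustment: boost kept opacities so the total weighted
    # "mass" of the scene is roughly preserved after pruning.
    factor = score.sum() / max(score[keep].sum(), 1e-8)
    new_opacity = np.clip(opacity[keep] * factor, 0.0, 1.0)
    return keep, new_opacity
```

Because the target count k is an argument, the same predicted primitive set can be sparsified to any budget at inference time, which is the controllability the abstract describes.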