🤖 AI Summary
To address texture degradation and geometric inconsistency in feed-forward 3D Gaussian Splatting (3DGS) for novel view synthesis (NVS) under wide-baseline sparse-view settings, this paper proposes ProSplat, a two-stage feed-forward framework: the first stage generates 3D Gaussian primitives, and the second stage refines the views rendered from them with an improvement model built on a one-step diffusion model. The paper introduces two key components: (1) Maximum Overlap Reference View Injection (MORI), which supplements missing texture and color by selecting the reference view with maximum viewpoint overlap; and (2) Distance-Weighted Epipolar Attention (DWEA), which enforces geometric consistency across views using epipolar constraints. A divide-and-conquer training strategy additionally aligns the data distributions of the two stages through joint optimization. Evaluated on RealEstate10K and DL3DV-10K under wide-baseline settings, the method achieves an average PSNR gain of 1 dB over recent state-of-the-art methods.
📝 Abstract
Feed-forward 3D Gaussian Splatting (3DGS) has recently demonstrated promising results for novel view synthesis (NVS) from sparse input views, particularly under narrow-baseline conditions. However, its performance degrades significantly in wide-baseline scenarios due to limited texture details and geometric inconsistencies across views. To address these challenges, we propose ProSplat, a two-stage feed-forward framework designed for high-fidelity rendering under wide-baseline conditions. The first stage generates 3D Gaussian primitives via a 3DGS generator. In the second stage, views rendered from these primitives are enhanced by an improvement model. This improvement model is based on a one-step diffusion model, further optimized by our proposed Maximum Overlap Reference View Injection (MORI) and Distance-Weighted Epipolar Attention (DWEA). MORI supplements missing texture and color by selecting the reference view with maximum viewpoint overlap, while DWEA enforces geometric consistency using epipolar constraints. Additionally, we introduce a divide-and-conquer training strategy that aligns data distributions between the two stages through joint optimization. We evaluate ProSplat on the RealEstate10K and DL3DV-10K datasets under wide-baseline settings. Experimental results demonstrate that ProSplat achieves an average improvement of 1 dB in PSNR over recent state-of-the-art methods.
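The abstract does not specify how DWEA turns epipolar constraints into attention weights, so the following is only a minimal NumPy sketch of the general idea, not ProSplat's implementation. It assumes a pinhole camera model, a Gaussian falloff on the pixel distance to the epipolar line, and a hypothetical bandwidth `sigma`; all function names are illustrative.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x so that skew(t) @ x == np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K_src, K_ref, R, t):
    """F with x_ref^T F x_src = 0, for the relative pose X_ref = R @ X_src + t."""
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K_ref).T @ E @ np.linalg.inv(K_src)

def epipolar_distances(F, pts_src, pts_ref):
    """Pixel distance from each reference point to each source point's epipolar line.

    pts_src: (N, 2), pts_ref: (M, 2) -> distances of shape (N, M).
    """
    src_h = np.concatenate([pts_src, np.ones((len(pts_src), 1))], axis=1)
    ref_h = np.concatenate([pts_ref, np.ones((len(pts_ref), 1))], axis=1)
    lines = src_h @ F.T                      # row i is the line l_i = F @ x_i
    num = np.abs(lines @ ref_h.T)            # |l_i . x'_j|
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    return num / np.maximum(den, 1e-12)

def distance_weighted_attention(q, k, v, dist, sigma=2.0):
    """Scaled dot-product attention with an assumed Gaussian epipolar bias.

    Keys far from the query's epipolar line receive a lower logit
    (assumption: log-weight = -dist^2 / (2 * sigma^2)).
    """
    logits = q @ k.T / np.sqrt(q.shape[-1]) - dist**2 / (2.0 * sigma**2)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

For a pure sideways translation with identity intrinsics, the epipolar lines are horizontal, so the distance reduces to the difference in image rows; a reference pixel on the same row as the query then receives the largest weight when the feature similarities are equal.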