ProSplat: Improved Feed-Forward 3D Gaussian Splatting for Wide-Baseline Sparse Views

📅 2025-06-09
🤖 AI Summary
To address texture degradation and geometric inconsistency in feed-forward 3D Gaussian Splatting (3DGS) for novel view synthesis (NVS) under wide-baseline sparse-view settings, this paper proposes ProSplat, a two-stage feed-forward framework: the first stage generates 3D Gaussian primitives, and the second stage refines the views rendered from them with a one-step diffusion model. The refinement model adds two components: (1) Maximum Overlap Reference view Injection (MORI), which supplements missing texture and color by injecting the reference view with maximum viewpoint overlap; and (2) Distance-Weighted Epipolar Attention (DWEA), which enforces geometric consistency using epipolar constraints. A divide-and-conquer training strategy aligns the data distributions of the two stages through joint optimization. On RealEstate10K and DL3DV-10K under wide-baseline settings, the method achieves an average PSNR gain of 1 dB over recent state-of-the-art methods.
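To put the reported 1 dB PSNR gain in perspective, the short sketch below uses the standard PSNR definition to show how a PSNR gain translates into a reduction in mean squared error (MSE). The function names and the baseline MSE value are illustrative assumptions, not figures from the paper.

```python
import math


def psnr(mse: float, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which the MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


# Hypothetical baseline error on images scaled to [0, 1]; not from the paper.
baseline_mse = 0.004
improved_mse = baseline_mse * mse_ratio_for_gain(1.0)

print(f"baseline PSNR: {psnr(baseline_mse):.2f} dB")
print(f"improved PSNR: {psnr(improved_mse):.2f} dB")
print(f"MSE reduction: {(1.0 - mse_ratio_for_gain(1.0)) * 100:.1f}%")
```

Because PSNR is logarithmic in the MSE, a 1 dB gain corresponds to roughly a 20.6% lower MSE regardless of the baseline value chosen.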

📝 Abstract
Feed-forward 3D Gaussian Splatting (3DGS) has recently demonstrated promising results for novel view synthesis (NVS) from sparse input views, particularly under narrow-baseline conditions. However, its performance significantly degrades in wide-baseline scenarios due to limited texture details and geometric inconsistencies across views. To address these challenges, in this paper, we propose ProSplat, a two-stage feed-forward framework designed for high-fidelity rendering under wide-baseline conditions. The first stage involves generating 3D Gaussian primitives via a 3DGS generator. In the second stage, rendered views from these primitives are enhanced through an improvement model. Specifically, this improvement model is based on a one-step diffusion model, further optimized by our proposed Maximum Overlap Reference view Injection (MORI) and Distance-Weighted Epipolar Attention (DWEA). MORI supplements missing texture and color by strategically selecting a reference view with maximum viewpoint overlap, while DWEA enforces geometric consistency using epipolar constraints. Additionally, we introduce a divide-and-conquer training strategy that aligns data distributions between the two stages through joint optimization. We evaluate ProSplat on the RealEstate10K and DL3DV-10K datasets under wide-baseline settings. Experimental results demonstrate that ProSplat achieves an average improvement of 1 dB in PSNR compared to recent SOTA methods.
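The abstract states that MORI selects the reference view with maximum viewpoint overlap but does not say how overlap is measured. The sketch below is one possible illustration only: it assumes a simple overlap proxy (cosine similarity of viewing directions minus a penalty on camera distance). The proxy, `overlap_proxy`, `select_reference_view`, and `distance_weight` are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
View = Tuple[Vec3, Vec3]  # (camera center, forward direction)


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def overlap_proxy(a: View, b: View, distance_weight: float = 0.1) -> float:
    """Hypothetical overlap score: higher when two cameras look the same
    way and sit close together. This is an assumed stand-in, not the
    overlap measure used by ProSplat."""
    fa, fb = _normalize(a[1]), _normalize(b[1])
    cosine = sum(x * y for x, y in zip(fa, fb))
    return cosine - distance_weight * math.dist(a[0], b[0])


def select_reference_view(target: View, inputs: Sequence[View]) -> Tuple[int, List[float]]:
    """Return the index of the input view with the highest proxy score."""
    scores = [overlap_proxy(target, view) for view in inputs]
    best = max(range(len(inputs)), key=scores.__getitem__)
    return best, scores


# Toy cameras (made-up values): the target looks down +z from the origin.
target_view: View = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
input_views: List[View] = [
    ((2.0, 0.0, 0.0), (-0.5, 0.0, 1.0)),  # far away, turned inward
    ((0.3, 0.0, 0.0), (0.1, 0.0, 1.0)),   # close, nearly parallel
    ((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),  # looking sideways
]
best_index, view_scores = select_reference_view(target_view, input_views)
print(best_index, [round(s, 3) for s in view_scores])
```

In this toy setup the second input view (index 1) scores highest, since it is both closest to the target camera and most closely aligned with its viewing direction.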
Problem

Research questions and friction points this paper is trying to address.

Feed-forward 3DGS degrades significantly under wide-baseline sparse views
Rendered views lack texture detail in wide-baseline scenarios
Geometry is inconsistent across views
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage feed-forward framework for wide-baseline rendering
One-step diffusion model with MORI and DWEA enhancements
Divide-and-conquer training strategy for joint optimization
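To make the idea behind Distance-Weighted Epipolar Attention more concrete, the sketch below computes the distance from a reference-view pixel to the epipolar line of a target-view pixel and turns it into an attention weight. It is a minimal illustration under stated assumptions: a shared pinhole camera with equal focal lengths, a relative pose given as `x_ref = R @ x_tgt + t`, and a Gaussian falloff with width `sigma_px`. The falloff, the helper names, and the way the weight biases the attention logits are assumptions for illustration; the summary above does not specify the paper's exact formulation.

```python
import math


def cross_matrix(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec3(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def epipolar_distance_px(pixel_tgt, pixel_ref, R, t, focal, center):
    """Distance (in pixels) from pixel_ref to the epipolar line of pixel_tgt."""
    cx, cy = center
    p = [(pixel_tgt[0] - cx) / focal, (pixel_tgt[1] - cy) / focal, 1.0]
    q = [(pixel_ref[0] - cx) / focal, (pixel_ref[1] - cy) / focal, 1.0]
    essential = matmul3(cross_matrix(t), R)   # E = [t]_x R
    a, b, c = matvec3(essential, p)           # line a*u + b*v + c = 0
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("epipolar line is undefined for this pixel")
    # Normalized-plane distance scaled back to pixels by the focal length.
    return focal * abs(a * q[0] + b * q[1] + c) / norm


def distance_weight(distance_px, sigma_px=10.0):
    """Assumed Gaussian falloff: 1 on the epipolar line, smaller farther away."""
    return math.exp(-(distance_px ** 2) / (2.0 * sigma_px ** 2))


def weighted_attention(logits, weights):
    """Softmax over logits biased by log-weights, which is equivalent to
    scaling each softmax probability by its weight and renormalizing."""
    biased = [s + math.log(max(w, 1e-12)) for s, w in zip(logits, weights)]
    m = max(biased)
    exps = [math.exp(x - m) for x in biased]
    total = sum(exps)
    return [e / total for e in exps]


# Toy setup (made-up values): the reference camera is shifted sideways, so
# epipolar lines are horizontal image rows.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = (1.0, 0.0, 0.0)
candidates = [(100.0, 240.0), (400.0, 250.0), (320.0, 300.0)]
distances = [epipolar_distance_px((320.0, 240.0), q, identity, shift,
                                  focal=500.0, center=(320.0, 240.0))
             for q in candidates]
weights = [distance_weight(d) for d in distances]
attention = weighted_attention([0.0, 0.0, 0.0], weights)
print([round(d, 2) for d in distances], [round(a, 3) for a in attention])
```

With equal raw logits, the candidate lying on the epipolar line receives the largest share of attention, and candidates farther from the line are suppressed accordingly.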
Xiaohan Lu
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University, Beijing 100871, China
Jiaye Fu
State Key Laboratory of Multimedia Information Processing, School of Electronic and Computer Engineering, Peking University, Shenzhen, 518055, China, and also with the National Engineering Research Center of Visual Technology, Peking University, Beijing 100871, China
Jiaqi Zhang
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University, Beijing 100871, China
Zetian Song
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University, Beijing 100871, China
Chuanmin Jia
Peking University
Video Coding, Multimedia, Data Compression
Siwei Ma
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University, Beijing 100871, China