Splat2Real: Novel-view Scaling for Physical AI with 3D Gaussian Splatting

📅 2026-03-11
🤖 AI Summary
This work addresses the degraded robustness of monocular RGB-to-3D perception in physical AI systems under viewpoint shift between training and deployment. The authors propose a digital-twin-guided, imitation-learning-style framework: 3D Gaussian Splatting synthesizes novel views at scale, and a student depth network is trained on metric depth and visibility supervision that an expert renders from a scene mesh. The key contributions are CN-Coverage, a curriculum that greedily selects high-value viewpoints by trading geometry gain against an extrapolation penalty, and a GOL-Gated, quality-aware fallback that keeps training stable when teacher signals are unreliable. Experiments on 20 TUM RGB-D sequences show that naive view scaling is unstable, that CN-Coverage mitigates worst-case regressions, and that the GOL-Gated variant gives the strongest stability at medium to high budgets with the lowest high-novelty tail error. Downstream control-proxy results indicate relevance for embodied tasks.

📝 Abstract
Physical AI faces viewpoint shift between training and deployment, and novel-view robustness is essential for monocular RGB-to-3D perception. We cast Real2Render2Real monocular depth pretraining as imitation-learning-style supervision from a digital twin oracle: a student depth network imitates expert metric depth/visibility rendered from a scene mesh, while 3DGS supplies scalable novel-view observations. We present Splat2Real, centered on novel-view scaling: performance depends more on which views are added than on raw view count. We introduce CN-Coverage, a coverage+novelty curriculum that greedily selects views by geometry gain and an extrapolation penalty, plus a quality-aware guardrail fallback for low-reliability teachers. Across 20 TUM RGB-D sequences with step-matched budgets (N = 0 to 2000 additional rendered views, with at most 500 unique views, N_unique <= 500, and resampling for larger budgets), naive scaling is unstable; CN-Coverage mitigates worst-case regressions relative to Robot/Coverage policies, and GOL-Gated CN-Coverage provides the strongest medium-high-budget stability with the lowest high-novelty tail error. Downstream control-proxy results versus N provide embodied-relevance evidence by shifting safety/progress trade-offs under viewpoint shift.
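
The greedy selection and budget resampling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how geometry gain or the extrapolation penalty is computed, so this sketch assumes gain is the number of newly covered scene cells and the penalty grows linearly once a view's novelty exceeds a limit. The names `greedy_cn_select`, `fill_budget`, `penalty_weight`, and `novelty_limit` are hypothetical.

```python
import random


def greedy_cn_select(candidate_cells, novelty, k,
                     penalty_weight=2.0, novelty_limit=1.0):
    """Greedily pick up to k candidate views.

    candidate_cells: list of sets; the scene cells visible from each view
                     (assumed stand-in for "geometry").
    novelty:         list of floats; distance of each view from the training
                     trajectory (assumed stand-in for extrapolation risk).
    Score = newly covered cells - penalty_weight * novelty above the limit.
    """
    covered = set()
    chosen = []
    remaining = set(range(len(candidate_cells)))

    def score(i):
        gain = len(candidate_cells[i] - covered)            # marginal coverage
        excess = max(0.0, novelty[i] - novelty_limit)       # extrapolation
        return gain - penalty_weight * excess

    while remaining and len(chosen) < k:
        # Ties are broken toward the lowest view index.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        covered |= candidate_cells[best]
        remaining.remove(best)
    return chosen


def fill_budget(chosen, budget, seed=0):
    """Reach a view budget larger than the unique pool by resampling
    the selected views with replacement (assumed uniform resampling)."""
    if budget <= len(chosen):
        return list(chosen[:budget])
    rng = random.Random(seed)
    extra = rng.choices(chosen, k=budget - len(chosen))
    return list(chosen) + extra


if __name__ == "__main__":
    cells = [{0, 1, 2}, {2, 3}, {4, 5, 6, 7}, {0, 1}]
    nov = [0.2, 0.5, 3.0, 0.1]
    print(greedy_cn_select(cells, nov, k=2))                      # [0, 1]
    print(greedy_cn_select(cells, nov, k=2, penalty_weight=0.0))  # [2, 0]
    print(len(fill_budget([0, 1], budget=5)))                     # 5
```

In this toy example the far-away view 2 covers the most new cells, so a pure coverage policy picks it first. Once the penalty is on, the selection prefers nearer views, which mirrors the abstract's point that which views are added matters more than how many.
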

Problem

Research questions and friction points this paper is trying to address.

viewpoint shift
novel-view robustness
monocular RGB-to-3D perception
Physical AI
3D perception

Innovation

Methods, ideas, or system contributions that make the work stand out.

novel-view scaling
3D Gaussian Splatting
imitation learning
coverage curriculum
Physical AI