Extrapolated Urban View Synthesis Benchmark

📅 2024-12-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing novel view synthesis (NVS) methods perform well under interpolation—i.e., when test views are close to training views—but exhibit severe generalization failure in extrapolation scenarios involving large viewpoint deviations (e.g., rare driving angles in autonomous vehicles). To address this gap, we introduce EUVS, the first extrapolation-oriented NVS benchmark tailored for urban scenes, constructed from multi-vehicle, multi-trip, multi-camera real-world autonomous vehicle (AV) data. EUVS systematically evaluates NVS robustness under extreme viewpoint shifts. Through quantitative and qualitative analysis, we reveal that state-of-the-art radiance field methods—including 3D Gaussian Splatting, NeRF variants, and diffusion-guided rendering—suffer from severe overfitting; neither geometric optimization nor diffusion priors effectively mitigate large-angle distortions. Crucially, we demonstrate that scaling and diversifying training data is key to improving extrapolation performance. EUVS establishes a new standard for evaluating NVS generalization and advancing robust simulation systems.
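The summary above hinges on how far a held-out view sits from the nearest training view. As a minimal sketch, assuming a hypothetical `Frame` record and illustrative thresholds (not the EUVS release's actual data format or protocol), an extrapolation split over multi-traversal AV logs could look like this:

```python
# Illustrative sketch of an extrapolation-style split in the spirit of EUVS:
# train on one traversal and hold out frames from other traversals whose
# camera poses deviate strongly from every training view. The Frame record,
# camera convention, and thresholds are assumptions, not the benchmark's API.
from dataclasses import dataclass
import numpy as np

@dataclass
class Frame:
    image_path: str
    position: np.ndarray  # (3,) camera center in the world frame
    rotation: np.ndarray  # (3, 3) world-from-camera rotation

def min_pose_deviation(frame: Frame, train: list[Frame]) -> tuple[float, float]:
    """Distance (m) and viewing-angle difference (deg) to the positionally
    nearest training view (camera assumed to look along its +z axis)."""
    dists = np.array([np.linalg.norm(frame.position - t.position) for t in train])
    i = int(np.argmin(dists))
    cos = np.clip(frame.rotation[:, 2] @ train[i].rotation[:, 2], -1.0, 1.0)
    return float(dists[i]), float(np.degrees(np.arccos(cos)))

def split_extrapolation(traversals: dict[str, list[Frame]],
                        train_id: str,
                        min_dist_m: float = 2.0,
                        min_angle_deg: float = 30.0):
    """Train on one traversal; a frame from any other traversal counts as
    'extrapolated' only if it is far from the nearest training view in
    position or viewing direction."""
    train = traversals[train_id]
    test = []
    for tid, frames in traversals.items():
        if tid == train_id:
            continue
        for f in frames:
            dist, angle = min_pose_deviation(f, train)
            if dist >= min_dist_m or angle >= min_angle_deg:
                test.append(f)
    return train, test
```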

📝 Abstract
Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes. However, their performance is commonly evaluated using an interpolated setup with highly correlated training and test views. In contrast, extrapolation, where test views largely deviate from training views, remains underexplored, limiting progress in generalizable simulation technology. To address this gap, we leverage publicly available AV datasets with multiple traversals, multiple vehicles, and multiple cameras to build the first Extrapolated Urban View Synthesis (EUVS) benchmark. We further conduct both quantitative and qualitative evaluations of state-of-the-art NVS methods across different evaluation settings. Our results show that current NVS methods are prone to overfitting to training views. Moreover, incorporating diffusion priors and improving geometry cannot fundamentally improve NVS under large view changes, highlighting the need for more robust approaches and large-scale training. We will release the data to help advance self-driving and urban robotics simulation technology.
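One concrete way to read the abstract's central claim: render an interpolated and an extrapolated held-out set with the same trained model and report the quality gap, which widens when the model overfits. Below is a minimal sketch using standard PSNR; the `render` callable and the (pose, image) test sets are assumptions for illustration, not the paper's released evaluation code:

```python
# Minimal sketch (my framing, not the paper's evaluation code) of the
# interpolation-vs-extrapolation comparison described in the abstract.
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = float(np.mean((pred - gt) ** 2))
    return float("inf") if mse == 0.0 else 10.0 * np.log10(max_val**2 / mse)

def extrapolation_gap(render, interp_set, extrap_set) -> float:
    """Mean PSNR drop from interpolated to extrapolated test views.

    `render` is a hypothetical callable mapping a camera pose to a rendered
    image; each test set is an iterable of (pose, ground_truth_image) pairs.
    A large positive gap indicates overfitting to the training views."""
    interp = np.mean([psnr(render(pose), gt) for pose, gt in interp_set])
    extrap = np.mean([psnr(render(pose), gt) for pose, gt in extrap_set])
    return float(interp - extrap)
```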
Problem

Research questions and friction points this paper is trying to address.

NVS methods are evaluated almost exclusively under interpolation, where test views are highly correlated with training views.
Under extrapolation, with test views far from training views, current NVS techniques overfit and degrade sharply.
No urban-scale benchmark exists for measuring this failure mode, limiting progress toward robust AV simulation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Built the first Extrapolated Urban View Synthesis (EUVS) benchmark from multi-vehicle, multi-trip, multi-camera AV data
Leveraged multiple public AV datasets to obtain diverse, weakly correlated viewpoints
Evaluated state-of-the-art NVS methods for robustness to viewpoint extrapolation and for overfitting