The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses novel view synthesis from sparse, unposed 2D images: no camera-pose annotations, no 3D priors, and no explicit geometric representations (e.g., NeRF or 3D Gaussian Splatting). The authors propose the first purely 2D end-to-end neural rendering framework, which learns implicit 3D awareness via large-scale self-supervised image reconstruction, entirely eliminating 3D inductive biases and pose dependencies. The key contribution is the empirical discovery of an inverse relationship: *the less a method relies on 3D knowledge, the more it gains from data scaling*, establishing a new "de-3Dified", data-centric paradigm for view synthesis. Remarkably, the approach produces high-fidelity, geometrically consistent novel views without any input pose information, reaching performance comparable to state-of-the-art methods that require precise camera poses. This constitutes the first rigorous empirical validation of fully data-driven, pose-free novel view synthesis.
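
To make the "purely 2D, pose-free" design concrete, here is a minimal illustrative sketch (not the authors' released code; all class names, hyperparameters, and design details below are assumptions). Unposed input views are patchified into tokens, concatenated with learnable target queries, and processed by a plain transformer; there is no pose input and no NeRF/3DGS-style module, so any multi-view reasoning must emerge from attention during large-scale training.

```python
# Hypothetical sketch of a pose-free, geometry-free NVS model.
# Names and hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn

class PoseFree2DNVS(nn.Module):
    """Plain transformer over patch tokens of unposed input views.

    No camera poses and no explicit 3D module: the only structure is
    attention over 2D patch tokens plus learnable target-view queries.
    """
    def __init__(self, patch=16, dim=768, depth=12, heads=12, n_query=196):
        super().__init__()
        self.patch = patch
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Learnable queries stand in for the target view's patch grid.
        self.target_queries = nn.Parameter(torch.randn(1, n_query, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim,
                                           batch_first=True, norm_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)
        self.to_rgb = nn.Linear(dim, patch * patch * 3)
        # NOTE: per-patch and per-view positional embeddings are omitted for
        # brevity; a real model needs them to tell patches and views apart.

    def forward(self, views):                      # views: (B, V, 3, H, W)
        B, V, C, H, W = views.shape
        tok = self.patchify(views.flatten(0, 1))   # (B*V, dim, H/p, W/p)
        tok = tok.flatten(2).transpose(1, 2)       # (B*V, hw, dim)
        tok = tok.reshape(B, -1, tok.shape[-1])    # (B, V*hw, dim)
        q = self.target_queries.expand(B, -1, -1)  # (B, n_query, dim)
        out = self.backbone(torch.cat([tok, q], dim=1))  # joint attention
        out = out[:, -q.shape[1]:]                 # keep target-query tokens
        rgb = self.to_rgb(out)                     # (B, n_query, p*p*3)
        g = int(q.shape[1] ** 0.5)                 # query grid side length
        rgb = rgb.reshape(B, g, g, self.patch, self.patch, 3)
        rgb = rgb.permute(0, 5, 1, 3, 2, 4)        # (B, 3, g, p, g, p)
        return rgb.reshape(B, 3, g * self.patch, g * self.patch)
```

In this sketch the target view is represented only by learnable queries; how the target is actually specified without a pose is a central design question for any pose-free framework, and the paper's answer may differ from this simplification.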

📝 Abstract
We consider the problem of generalizable novel view synthesis (NVS), which aims to generate photorealistic novel views from sparse or even unposed 2D images without per-scene optimization. This task remains fundamentally challenging, as it requires inferring 3D structure from incomplete and ambiguous 2D observations. Early approaches typically rely on strong 3D knowledge, including architectural 3D inductive biases (e.g., embedding explicit 3D representations, such as NeRF or 3DGS, into the network design) and ground-truth camera poses for both input and target views. While recent efforts have sought to reduce the 3D inductive bias or the dependence on known input-view camera poses, critical questions about the role of 3D knowledge, and whether it is necessary to avoid it, remain under-explored. In this work, we conduct a systematic analysis of 3D knowledge and uncover a critical trend: methods that require less 3D knowledge improve faster as data scales, eventually reaching performance on par with their 3D-knowledge-driven counterparts, which highlights the growing importance of reducing dependence on 3D knowledge in the era of large-scale data. Motivated by this trend, we propose a novel NVS framework that minimizes 3D inductive bias and pose dependence for both input and target views. By eliminating this 3D knowledge, our method fully leverages data scaling and learns implicit 3D awareness directly from sparse 2D images, without any 3D inductive bias or pose annotation during training. Extensive experiments demonstrate that our model generates photorealistic and 3D-consistent novel views, achieving performance comparable even to methods that rely on posed inputs, thereby validating the feasibility and effectiveness of our data-centric paradigm. Project page: https://pku-vcl-geometry.github.io/Less3Depend/ .
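
"Without any ... pose annotation during training" implies a purely photometric objective. The following hedged sketch (hypothetical, building on the model sketch above) shows what such a self-supervised step could look like: hold out one view of a multi-view set, predict it from the remaining unposed views, and backpropagate a plain 2D reconstruction loss. How the held-out target is identified without a pose (e.g., via a learned query or a partial target hint) is a key design choice that this sketch glosses over.

```python
# Hypothetical training step for the pose-free, self-supervised setup above.
# `PoseFree2DNVS` refers to the illustrative model from the earlier sketch,
# not the authors' implementation.
import torch.nn.functional as F

def train_step(model, optimizer, views):
    """views: (B, V, 3, H, W) multi-view images with NO pose annotations."""
    context, target = views[:, :-1], views[:, -1]  # hold out the last view
    pred = model(context)                          # (B, 3, H, W) prediction
    loss = F.mse_loss(pred, target)                # purely 2D photometric loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```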
Problem

Research questions and friction points this paper is trying to address.

Synthesizing novel views from sparse, unposed 2D images
Reducing dependence on 3D knowledge and pose annotations
Achieving 3D-consistent results without explicit 3D representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimizes 3D inductive bias and pose dependence
Learns implicit 3D awareness from sparse 2D images
Matches the performance of posed-input methods without any 3D knowledge or pose annotations