Emergent Extreme-View Geometry in 3D Foundation Models

📅 2025-11-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates how well 3D foundation models (3DFMs) understand geometry under extreme, non-overlapping viewpoints, a setting they were never explicitly trained for. We find that their internal representations nonetheless encode extreme-view geometry. To strengthen this property, we propose a lightweight alignment method that fine-tunes only a small subset of backbone bias terms while the depth, point, and pose prediction heads stay frozen. For systematic evaluation, we introduce MegaUnScene, a benchmark of Internet scenes unseen by existing 3DFMs, with dedicated test splits for relative pose estimation and dense 3D reconstruction. The adaptation substantially improves relative pose estimation under extreme viewpoints without degrading per-image depth or point quality.

📝 Abstract

3D foundation models (3DFMs) have recently transformed 3D vision, enabling joint prediction of depths, poses, and point maps directly from images. Yet their ability to reason under extreme, non-overlapping views remains largely unexplored. In this work, we study their internal representations and find that 3DFMs exhibit an emergent understanding of extreme-view geometry, despite never being trained for such conditions. To further enhance these capabilities, we introduce a lightweight alignment scheme that refines their internal 3D representation by tuning only a small subset of backbone bias terms, leaving all decoder heads frozen. This targeted adaptation substantially improves relative pose estimation under extreme viewpoints without degrading per-image depth or point quality. Additionally, we contribute MegaUnScene, a new benchmark of Internet scenes unseen by existing 3DFMs, with dedicated test splits for both relative pose estimation and dense 3D reconstruction. All code and data will be released.

Problem

Research questions and friction points this paper is trying to address.

Enhancing 3D foundation models for extreme-view geometry reasoning

Improving relative pose estimation under non-overlapping viewpoints

Introducing a benchmark of unseen Internet scenes for relative pose estimation and dense 3D reconstruction
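
To make the relative pose objective above concrete, here is a minimal sketch of one common way to score a predicted relative rotation against ground truth: the geodesic angle between the two rotation matrices. The page does not state which metric the paper uses, so this is an illustrative assumption, and the helper names `rot_z` and `rotation_error_deg` are hypothetical.

```python
import math

def rot_z(deg):
    """3x3 rotation about the z-axis by `deg` degrees (illustrative helper)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def rotation_error_deg(r_pred, r_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices.

    Uses trace(r_pred^T r_gt) = sum_ij r_pred[i][j] * r_gt[i][j] and
    angle = arccos((trace - 1) / 2).
    """
    trace = sum(r_pred[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] so floating-point drift cannot break acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

# Assumed example: prediction is a 10 degree turn, ground truth a 40 degree turn.
error = rotation_error_deg(rot_z(10.0), rot_z(40.0))
```

In this example the two rotations share an axis, so the error is simply the difference of the two angles.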

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight alignment scheme tunes backbone bias terms

Frozen decoder heads maintain depth and point quality

MegaUnScene benchmark tests extreme-view geometry
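
The bias-only adaptation listed above can be illustrated with a toy sketch. This is not the paper's model or training code: `ToyModel` is a hypothetical two-stage scalar network (an affine "backbone" followed by an affine "head"), and the loss, learning rate, and data are assumptions chosen only to show the idea of updating a backbone bias while every other parameter stays fixed.

```python
from dataclasses import dataclass

@dataclass
class ToyModel:
    # Hypothetical "backbone": h = w_backbone * x + b_backbone
    w_backbone: float
    b_backbone: float
    # Hypothetical "head", kept frozen: y = w_head * h + b_head
    w_head: float
    b_head: float

    def predict(self, x):
        h = self.w_backbone * x + self.b_backbone
        return self.w_head * h + self.b_head

def mse(model, data):
    """Mean squared error over (input, target) pairs."""
    return sum((model.predict(x) - t) ** 2 for x, t in data) / len(data)

def tune_backbone_bias(model, data, lr=0.05, steps=200):
    """Gradient descent on b_backbone only; no other parameter is modified."""
    for _ in range(steps):
        # Chain rule: dL/db_backbone = mean(2 * (y - t) * w_head)
        grad = sum(2.0 * (model.predict(x) - t) * model.w_head
                   for x, t in data) / len(data)
        model.b_backbone -= lr * grad
    return model

model = ToyModel(w_backbone=1.0, b_backbone=0.0, w_head=2.0, b_head=0.5)
# Assumed targets: the same network with its backbone bias shifted to 0.3.
data = [(x, 2.0 * (1.0 * x + 0.3) + 0.5) for x in (-1.0, 0.0, 1.0, 2.0)]

loss_before = mse(model, data)
tune_backbone_bias(model, data)
loss_after = mse(model, data)
```

Because the head is never updated, any change in the output comes only from the shifted backbone representation, which is the intuition behind keeping the decoder heads frozen.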

🔎 Similar Papers

Survey on Modeling of Human-made Articulated Objects