PanoSplatt3R: Leveraging Perspective Pretraining for Generalized Unposed Wide-Baseline Panorama Reconstruction

📅 2025-07-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing wide-baseline panorama reconstruction methods rely heavily on accurate camera poses, which are costly to acquire and sensitive to noise in real-world scenarios, limiting their practical applicability. To address this, the paper proposes PanoSplatt3R, an unposed wide-baseline panorama reconstruction method. It transfers pretrained perspective-domain reconstruction models to the panoramic domain and introduces RoPE rolling, which spreads rolled coordinates in Rotary Position Embedding (RoPE) across attention heads to model the horizontal periodicity of panoramic images while changing RoPE itself as little as possible. Experiments show that, even without pose information, the method significantly outperforms state-of-the-art approaches in both novel-view synthesis quality and depth estimation accuracy.

📝 Abstract
Wide-baseline panorama reconstruction has emerged as a highly effective and pivotal approach for not only achieving geometric reconstruction of the surrounding 3D environment, but also generating highly realistic and immersive novel views. Although existing methods have shown remarkable performance across various benchmarks, they are predominantly reliant on accurate pose information. In real-world scenarios, the acquisition of precise pose often requires additional computational resources and is highly susceptible to noise. These limitations hinder the broad applicability and practicality of such methods. In this paper, we present PanoSplatt3R, an unposed wide-baseline panorama reconstruction method. We extend and adapt the foundational reconstruction pretrainings from the perspective domain to the panoramic domain, thus enabling powerful generalization capabilities. To ensure a seamless and efficient domain-transfer process, we introduce RoPE rolling that spans rolled coordinates in rotary positional embeddings across different attention heads, maintaining a minimal modification to RoPE's mechanism, while modeling the horizontal periodicity of panorama images. Comprehensive experiments demonstrate that PanoSplatt3R, even in the absence of pose information, significantly outperforms current state-of-the-art methods. This superiority is evident in both the generation of high-quality novel views and the accuracy of depth estimation, thereby showcasing its great potential for practical applications. Project page: https://npucvr.github.io/PanoSplatt3R
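The abstract describes RoPE rolling only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that each attention head uses horizontal token coordinates rolled by a different offset (here `h * width / num_heads`, a hypothetical choice), so that the left/right image seam falls at a different column for each head. The function names, the `base` value, and the restriction to a single row of tokens are all illustrative assumptions.

```python
import math

def rope_rotate(vec, pos, base=100.0):
    """Standard 1D RoPE: rotate consecutive feature pairs by angles proportional to pos."""
    d = len(vec)
    assert d % 2 == 0, "feature dimension must be even"
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)  # later pairs rotate more slowly
        c, s = math.cos(angle), math.sin(angle)
        out += [vec[i] * c - vec[i + 1] * s, vec[i] * s + vec[i + 1] * c]
    return out

def rolled_columns(width, num_heads):
    """Assumed scheme: head h shifts the column origin by h * width / num_heads, wrapping around."""
    return [[(col + (h * width) // num_heads) % width for col in range(width)]
            for h in range(num_heads)]

def rope_rolling(tokens):
    """tokens[h][col] is a feature vector; each head is rotated with its own rolled coordinates."""
    num_heads, width = len(tokens), len(tokens[0])
    cols = rolled_columns(width, num_heads)
    return [[rope_rotate(tokens[h][c], cols[h][c]) for c in range(width)]
            for h in range(num_heads)]

cols = rolled_columns(width=8, num_heads=4)
# Columns 7 and 0 are neighbours on the panorama but far apart in head 0's coordinates.
seam_gap_head0 = abs(cols[0][7] - cols[0][0])  # 7
# Head 2 shifts the origin by half the width, so the same two columns are adjacent.
seam_gap_head2 = abs(cols[2][7] - cols[2][0])  # 1
```

Under this assumed scheme, no single head sees the panorama as truly periodic, but every pair of physically adjacent columns is adjacent in the coordinates of most heads, and the per-head rotation is still plain RoPE, which is consistent with the abstract's claim of minimal modification to RoPE's mechanism.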
Problem

Research questions and friction points this paper is trying to address.

Reconstructing panoramas without precise pose information
Enhancing generalization from perspective to panoramic domains
Improving novel view generation and depth estimation accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages perspective pretraining for panorama reconstruction
Introduces RoPE rolling for seamless domain-transfer
Eliminates need for precise pose information
Jiahui Ren
School of Electronics and Information, Northwestern Polytechnical University and Shaanxi Key Laboratory of Information Acquisition and Processing, Xi’an, Shaanxi, China
Mochu Xiang
Northwestern Polytechnical University
Monocular Depth Estimation
Jiajun Zhu
Zhejiang University
geometric deep learning, large multimodal model, trustworthy machine learning
Yuchao Dai
School of Electronics and Information, Northwestern Polytechnical University and Shaanxi Key Laboratory of Information Acquisition and Processing, Xi’an, Shaanxi, China