AI Summary
Existing wide-baseline panoramic reconstruction methods rely heavily on high-precision camera poses, whose acquisition is costly and highly sensitive to noise in real-world scenarios, severely limiting their practical applicability. To address this, we propose an unposed wide-baseline panorama reconstruction framework. Our approach transfers pre-trained perspective-domain reconstruction models to the panoramic domain, introducing Rotary Position Embedding (RoPE) with a coordinate-rolling strategy applied across attention heads to explicitly model the horizontal periodicity inherent in panoramic images, achieving efficient domain adaptation and strong generalization without pose supervision. Extensive experiments on multiple benchmarks demonstrate that our method significantly outperforms state-of-the-art approaches in both novel-view synthesis quality and depth estimation accuracy, validating its effectiveness and robustness under challenging real-world conditions.
Abstract
Wide-baseline panorama reconstruction has emerged as a highly effective and pivotal approach, not only for geometric reconstruction of the surrounding 3D environment but also for generating highly realistic and immersive novel views. Although existing methods have shown remarkable performance across various benchmarks, they predominantly rely on accurate pose information. In real-world scenarios, acquiring precise poses often requires additional computational resources and is highly susceptible to noise. These limitations hinder the broad applicability and practicality of such methods. In this paper, we present PanoSplatt3R, an unposed wide-baseline panorama reconstruction method. We extend and adapt foundational reconstruction pretraining from the perspective domain to the panoramic domain, enabling powerful generalization. To ensure a seamless and efficient domain transfer, we introduce RoPE rolling, which spans rolled coordinates in rotary positional embeddings across different attention heads, keeping the modification to RoPE's mechanism minimal while modeling the horizontal periodicity of panorama images. Comprehensive experiments demonstrate that PanoSplatt3R, even in the absence of pose information, significantly outperforms current state-of-the-art methods, both in generating high-quality novel views and in the accuracy of depth estimation, showcasing its great potential for practical applications. Project page: https://npucvr.github.io/PanoSplatt3R
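To make the RoPE-rolling idea concrete, below is a minimal NumPy sketch, not the authors' implementation. It assumes standard 1D RoPE over a token's horizontal coordinate and illustrates one plausible reading of "rolled coordinates across attention heads": each head sees the horizontal coordinates shifted by a different offset and wrapped modulo the panorama width, so the left/right seam falls at a different place per head. The function names and the uniform per-head offset schedule are illustrative assumptions.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE: each position gets rotation angles at geometric frequencies.
    freqs = base ** (-np.arange(0, dim, 2) / dim)         # (dim/2,)
    return positions[:, None] * freqs[None, :]            # (n_tokens, dim/2)

def rolled_horizontal_angles(x_coords, width, dim, n_heads):
    # Hypothetical "RoPE rolling": each attention head sees horizontal
    # coordinates shifted by a different offset, wrapped modulo the
    # panorama width, reflecting the panorama's horizontal periodicity.
    angles = []
    for h in range(n_heads):
        shift = h * width // n_heads                      # per-head roll offset (assumed schedule)
        rolled = (x_coords + shift) % width               # wrap around the seam
        angles.append(rope_angles(rolled, dim))
    return np.stack(angles)                               # (n_heads, n_tokens, dim/2)

def apply_rope(q, angles):
    # Rotate feature pairs (even, odd channels) by the per-head angles.
    cos, sin = np.cos(angles), np.sin(angles)
    q1, q2 = q[..., 0::2], q[..., 1::2]
    out = np.empty_like(q)
    out[..., 0::2] = q1 * cos - q2 * sin
    out[..., 1::2] = q1 * sin + q2 * cos
    return out

# Usage: 8 tokens along a panorama row of width 8, 4 heads, head dim 4.
x = np.arange(8, dtype=float)
ang = rolled_horizontal_angles(x, width=8, dim=4, n_heads=4)  # (4, 8, 2)
q = np.random.randn(4, 8, 4)
q_rot = apply_rope(q, ang)                                    # same shape as q
```

Because each head only shifts where the wrap-around occurs, relative positions within any head are unchanged, which is why this keeps the modification to RoPE's mechanism minimal while still encoding horizontal periodicity.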