G2G: Exploiting Intra-Group Geometry for Inter-Group Pose Estimation

📅 2026-06-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses six-degree-of-freedom (6-DoF) relative pose estimation between two image groups, with applications in cross-sequence relocalization and multi-camera rig odometry. On top of a frozen, pretrained multi-view backbone, the authors add three lightweight trainable modules: a perceiver resampler, a cross-group bridge with merged self-attention, and a multi-frame pose head. Trained with relative pose supervision only, the method exploits the known intra-group geometry for cross-group inference without fine-tuning the backbone. On four datasets covering indoor and outdoor simulation, real-world cross-season capture, and zero-shot sim-to-real transfer, it reports state-of-the-art accuracy on both tasks with about 32M trainable parameters, under 6% of the full model.
📝 Abstract
Recovering the relative 6-DoF pose between two image groups underlies cross-sequence relocalization and multi-camera rig odometry. Each group carries known intra-group geometry from visual odometry or rig calibration, and pretrained multi-view backbones already fuse such geometry into visual features. Yet current models treat all views as an unstructured set, leaving cross-group reasoning as the missing piece. We introduce G2G, which keeps the foundation model entirely frozen and adds three lightweight trainable modules to bridge the two groups: a perceiver resampler, a cross-group bridge with merged self-attention, and a multi-frame pose head. The trainable footprint totals about 32M parameters, under 6% of the full model, and is supervised only by relative poses. Across four datasets that span indoor and outdoor simulation, real-world cross-season capture, and zero-shot sim-to-real transfer, G2G attains state-of-the-art accuracy on both tasks, while every baseline is retrained with its full original supervision. Code is available at https://github.com/WeiYuFei0217/G2G.
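To make the role of known intra-group geometry concrete, the sketch below shows a standard rigid-body identity rather than the paper's network: once the relative pose between one anchor frame of each group is known, the intra-group poses determine the pose between any cross-group frame pair. All names here (`pose_yaw`, `invert_rigid`, `cross_group_pose`, and the `T_x_y` convention meaning "pose of frame y expressed in frame x") are assumptions chosen for illustration and do not come from the G2G code base.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 homogeneous transforms stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def pose_yaw(yaw, x, y, z):
    """Hypothetical helper: a pose with rotation about z and translation (x, y, z)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def cross_group_pose(T_a0_ai, T_a0_b0, T_b0_bj):
    """Pose of frame bj in frame ai, from one inter-group pose (a0 -> b0)
    and the two known intra-group poses:
        T_ai_bj = inv(T_a0_ai) @ T_a0_b0 @ T_b0_bj
    """
    return matmul(matmul(invert_rigid(T_a0_ai), T_a0_b0), T_b0_bj)


# Usage: group A frame i, group B frame j, with made-up intra-group geometry.
T_a0_ai = pose_yaw(0.2, 1.0, 0.0, 0.0)   # e.g. from visual odometry
T_b0_bj = pose_yaw(-0.1, 0.0, 0.5, 0.0)  # e.g. from rig calibration
T_a0_b0 = pose_yaw(1.0, 3.0, -2.0, 0.1)  # the inter-group pose to be estimated
T_ai_bj = cross_group_pose(T_a0_ai, T_a0_b0, T_b0_bj)
```

Under this identity a single 6-DoF inter-group estimate is sufficient in principle, which is one way to read why the abstract frames cross-group reasoning, rather than per-view pose recovery, as the missing piece.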
Problem

Research questions and friction points this paper is trying to address.

pose estimation
intra-group geometry
inter-group reasoning
6-DoF pose
multi-view
Innovation

Methods, ideas, or system contributions that make the work stand out.

intra-group geometry
cross-group pose estimation
frozen foundation model
lightweight trainable modules
6-DoF pose
Yufei Wei
State Key Laboratory of Industrial Control and Technology, Zhejiang University, Hangzhou, China
Shuhao Ye
State Key Laboratory of Industrial Control and Technology, Zhejiang University, Hangzhou, China
Chenxiao Hu
State Key Laboratory of Industrial Control and Technology, Zhejiang University, Hangzhou, China
Yiyuan Pan
Carnegie Mellon University
Robot Learning, Multimodal Learning, Reinforcement Learning
Dongyu Feng
Pacific Northwest National Laboratory
Hydrology, Hydrodynamic modeling, contaminant transport modeling, coastal processes, oil spills
Rong Xiong
Zhejiang University
Robotics
Yue Wang
Zhejiang University
Robot Learning, Navigation, Manipulation
Yanmei Jiao
Hangzhou Normal University
visual localization