PoseFM: Relative Camera Pose Estimation Through Flow Matching

📅 2026-04-24

🤖 AI Summary
This work addresses the instability of monocular visual odometry (VO) in low-texture or low-light environments and the lack of uncertainty modeling in most existing deep learning approaches. It is presented as the first framework to apply flow matching to monocular frame-to-frame VO, reframing inter-frame pose estimation as a generative task. Using continuous-time ordinary differential equations, the method transforms noise samples into a distribution over plausible relative poses, which gives a principled basis for uncertainty-aware inference. On the TartanAir, KITTI, and TUM-RGBD benchmarks, it is competitive with the best frame-to-frame monocular VO methods and attains the lowest absolute trajectory error (ATE) on some trajectories. The approach is aimed at robust motion inference under challenging visual conditions.
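
For readers unfamiliar with the ATE metric mentioned above, the following is a minimal sketch of one common way to compute it: the root-mean-square position error after aligning the estimated trajectory to the ground truth with a similarity transform (Umeyama alignment), since monocular VO does not recover metric scale. Whether the paper uses exactly this alignment protocol is an assumption; the function names `align_similarity` and `ate_rmse` are hypothetical and chosen only for illustration.

```python
import numpy as np


def align_similarity(est, gt):
    """Umeyama alignment: scale s, rotation R, translation t such that
    s * R @ est_i + t is as close as possible to gt_i in the least-squares sense.
    Both inputs are (N, 3) arrays of camera positions."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    cov = g.T @ e / est.shape[0]          # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                    # avoid a reflection
    R = U @ S @ Vt
    scale = np.trace(np.diag(D) @ S) / (e ** 2).sum(axis=1).mean()
    t = mu_g - scale * R @ mu_e
    return scale, R, t


def ate_rmse(est, gt):
    """RMSE of position error after similarity alignment."""
    scale, R, t = align_similarity(est, gt)
    aligned = scale * est @ R.T + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))


# Usage with synthetic data (illustrative only, not results from the paper).
rng = np.random.default_rng(0)
gt_positions = np.cumsum(rng.standard_normal((50, 3)), axis=0)
est_positions = 0.5 * gt_positions + 0.05 * rng.standard_normal((50, 3))
print("ATE (RMSE):", ate_rmse(est_positions, gt_positions))
```

Because the alignment absorbs global scale, rotation, and translation, the remaining error reflects drift and local inaccuracies in the estimated trajectory.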

📝 Abstract
Monocular visual odometry (VO) is a fundamental computer vision problem with applications in autonomous navigation, augmented reality and more. While deep learning-based methods have recently shown superior accuracy compared to traditional geometric pipelines, particularly in environments where handcrafted features struggle due to poor structure or lighting conditions, most rely on deterministic regression, which lacks the uncertainty awareness required for robust applications. We propose PoseFM, the first framework to reformulate monocular frame-to-frame VO as a generative task using Flow Matching (FM). By leveraging FM, we model camera motion as a distribution rather than a point estimate, learning to transform noise into realistic pose predictions via continuous-time ODEs. This approach provides a principled mechanism for uncertainty estimation and enables robust motion inference under challenging visual conditions. In our evaluations, PoseFM performs strongly on the TartanAir, KITTI and TUM-RGBD benchmarks: it attains the lowest absolute trajectory error (ATE) on some of the trajectories and is competitive overall with the best frame-to-frame monocular VO methods. Code and model checkpoints will be made available at https://github.com/helsinki-sda-group/posefm.
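
The idea of transforming noise into pose samples with a continuous-time ODE, and reading uncertainty off the spread of those samples, can be illustrated with a small sketch. This is not the PoseFM implementation: the learned, image-conditioned velocity network is replaced here by a hypothetical closed-form field (`toy_velocity`) that transports standard Gaussian noise to a fixed Gaussian over a 6-DoF pose vector. The 6-vector parameterization, the Euler integrator, the sample count, and all names and numbers are assumptions made for illustration.

```python
import numpy as np

# Assumed toy target: a Gaussian over a 6-DoF relative pose
# (3 translation + 3 rotation components). In a real system this
# distribution would be implied by a trained network and the image pair.
TARGET_MEAN = np.array([0.10, 0.00, 0.50, 0.00, 0.02, 0.00])
TARGET_STD = np.array([0.05, 0.05, 0.10, 0.05, 0.05, 0.05])


def toy_velocity(x, t):
    """Closed-form marginal velocity for the linear path
    x_t = (1 - t) * noise + t * pose, with noise ~ N(0, I) and
    pose ~ N(TARGET_MEAN, TARGET_STD^2). Stand-in for a learned model."""
    var_t = (1.0 - t) ** 2 + (t * TARGET_STD) ** 2
    gain = (t * TARGET_STD ** 2 - (1.0 - t)) / var_t
    return TARGET_MEAN + gain * (x - t * TARGET_MEAN)


def sample_poses(velocity_fn, num_samples=2000, num_steps=200, dim=6, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (pose) with Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_samples, dim))  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


samples = sample_poses(toy_velocity)
pose_estimate = samples.mean(axis=0)      # point estimate from the samples
pose_uncertainty = samples.std(axis=0)    # per-component spread as uncertainty
print("estimate:   ", np.round(pose_estimate, 3))
print("uncertainty:", np.round(pose_uncertainty, 3))
```

Drawing several samples per frame pair and summarizing them is one simple way to obtain both a pose estimate and an uncertainty measure from a generative model; a deterministic regressor yields only the former.
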
Problem

Research questions and friction points this paper is trying to address.

monocular visual odometry
uncertainty estimation
camera pose estimation
robust motion inference
challenging visual conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow Matching
Visual Odometry
Uncertainty Estimation
Generative Modeling
Monocular Pose Estimation