Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry

📅 2024-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pose estimation errors in visual odometry (VO) accumulate over long videos and in complex scenes because optical flow matching is noisy and temporally inconsistent. To address this, the paper proposes STVO, a deep neural architecture that models multiple frames jointly in space and time. Its core contributions are: (1) a temporal propagation module that transfers motion cues across frames to improve temporal consistency; and (2) a geometry-guided spatial activation module that uses depth priors to suppress optical flow noise. The method combines depth-aware perception, optical flow prediction, nonlinear bundle adjustment (BA), and depth-map-based noise filtering. Evaluated on the TUM-RGBD, EuRoC MAV, ETH3D, and KITTI Odometry benchmarks, STVO achieves state-of-the-art performance, reducing absolute trajectory error by 77.8% on ETH3D and by 38.9% on KITTI Odometry relative to the previous best methods.
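
For readers unfamiliar with the metric named in the summary, the following is a minimal sketch of how an absolute trajectory error (RMSE) and a relative reduction are commonly computed. It assumes camera positions are given as N×3 arrays and uses a rigid (rotation plus translation) Kabsch alignment. The function names and the alignment choice are illustrative assumptions, not the paper's evaluation code, and the numbers in the usage line are made up.

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rotation R and translation t mapping est onto gt (Kabsch)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """RMSE of position error after rigidly aligning est to gt."""
    R, t = align_rigid(est, gt)
    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))

def relative_reduction(baseline_err, new_err):
    """Percentage by which new_err is lower than baseline_err."""
    return 100.0 * (baseline_err - new_err) / baseline_err

# Illustrative values only (not results from the paper).
print(relative_reduction(0.10, 0.04))
```

A reduction figure such as the ones quoted above compares the error of the new method against the previous best error on the same benchmark, so it depends on which baseline is chosen.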

📝 Abstract
Recent approaches to VO have significantly improved performance by using deep networks to predict optical flow between video frames. However, existing methods still suffer from noisy and inconsistent flow matching, making it difficult to handle challenging scenarios and long-sequence estimation. To overcome these challenges, we introduce Spatio-Temporal Visual Odometry (STVO), a novel deep network architecture that effectively leverages inherent spatio-temporal cues to enhance the accuracy and consistency of multi-frame flow matching. With more accurate and consistent flow matching, STVO can achieve better pose estimation through bundle adjustment (BA). Specifically, STVO introduces two innovative components: 1) the Temporal Propagation Module, which utilizes multi-frame information to extract and propagate temporal cues across adjacent frames, maintaining temporal consistency; 2) the Spatial Activation Module, which utilizes geometric priors from the depth maps to enhance spatial consistency while filtering out excessive noise and incorrect matches. STVO achieves state-of-the-art performance on the TUM-RGBD, EuRoC MAV, ETH3D, and KITTI Odometry benchmarks. Notably, it improves accuracy by 77.8% on the ETH3D benchmark and by 38.9% on the KITTI Odometry benchmark over the previous best methods.
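
To make the idea of using depth as a geometric prior for rejecting bad matches concrete, here is a generic consistency check, written as a sketch under stated assumptions. It is not the Spatial Activation Module, which the abstract does not specify in detail. It assumes a pinhole intrinsic matrix `K` whose last row is `[0, 0, 1]`, a known relative pose `(R, t)` from frame i to frame j, and a static scene. The names `rigid_flow`, `filter_matches`, and `max_residual_px` are hypothetical.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow from frame i to j implied by depth and camera motion (static scene)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels (h, w, 3)
    rays = pix @ np.linalg.inv(K).T                    # back-projected rays
    pts_i = rays * depth[..., None]                    # 3D points in frame i
    pts_j = pts_i @ R.T + t                            # same points in frame j
    z = pts_j[..., 2]
    valid = (depth > 0) & (z > 1e-6)                   # need depth, point in front
    safe_z = np.where(valid, z, 1.0)                   # avoid division by zero
    uv_j = (pts_j @ K.T)[..., :2] / safe_z[..., None]  # reprojected pixel coords
    return uv_j - pix[..., :2], valid

def filter_matches(flow_pred, depth, K, R, t, max_residual_px=2.0):
    """Keep matches whose predicted flow agrees with the depth-induced flow."""
    flow_geo, valid = rigid_flow(depth, K, R, t)
    residual = np.linalg.norm(flow_pred - flow_geo, axis=-1)
    return valid & (residual < max_residual_px)
```

The returned boolean mask could be used, for example, to down-weight or drop correspondences before they enter bundle adjustment. A fixed pixel threshold is the simplest choice here; a learned or confidence-weighted criterion is an equally plausible alternative.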
Problem

Research questions and friction points this paper is trying to address.

Visual Odometry
Long Video
Motion Prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

STVO
Multi-frame Information
Depth-aware Estimation
Authors

Zhaoxing Zhang
Huazhong University of Science and Technology
Visual Odometry, Robotics, Exploration

Junda Cheng
Huazhong University of Science and Technology
Computer Vision

Gangwei Xu
Huazhong University of Science and Technology
Computer Vision, Deep Learning, Stereo Matching

Xiaoxiang Wang
School of Electronic Information and Communications, Huazhong University of Science and Technology

Can Zhang
School of Electronic Information and Communications, Huazhong University of Science and Technology

Xin Yang
School of Electronic Information and Communications, Huazhong University of Science and Technology; Hubei Key Laboratory of Smart Internet, Huazhong University of Science and Technology