MDE-VIO: Enhancing Visual-Inertial Odometry Using Learned Depth Priors

📅 2026-02-11
🤖 AI Summary
This work addresses the challenge of pose estimation failure in monocular visual-inertial odometry under low-texture environments, where sparse features degrade performance. The authors propose the first method to efficiently integrate dense depth priors generated by a Vision Transformer into the VINS-Mono backend. To enhance geometric robustness, they introduce affine-invariant depth consistency and ordinal constraints, along with a variance-gated mechanism to suppress unstable artifacts. This design preserves real-time capability on edge devices while recovering metric scale consistency. Experimental results on the TartanGround and M3ED datasets demonstrate significant improvements in localization accuracy, achieving up to a 28.3% reduction in absolute trajectory error and effectively mitigating trajectory divergence in complex environments.

📝 Abstract
Traditional monocular Visual-Inertial Odometry (VIO) systems struggle in low-texture environments where sparse visual features are insufficient for accurate pose estimation. To address this, dense Monocular Depth Estimation (MDE) has been widely explored as a complementary information source. While recent complex foundation models based on Vision Transformers (ViT) offer dense, geometrically consistent depth, their computational demands typically preclude real-time edge deployment. Our work bridges this gap by integrating learned depth priors directly into the VINS-Mono optimization backend. We propose a novel framework that enforces affine-invariant depth consistency and pairwise ordinal constraints, explicitly filtering unstable artifacts via variance-based gating. This approach strictly adheres to the computational limits of edge devices while robustly recovering metric scale. Extensive experiments on the TartanGround and M3ED datasets demonstrate that our method prevents divergence in challenging scenarios and delivers significant accuracy gains, reducing Absolute Trajectory Error (ATE) by up to 28.3%. Code will be made available.
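The affine-invariant consistency and variance-based gating described in the abstract can be sketched as a closed-form scale/shift fit between the network's relative depth and sparse metric depths from VIO landmarks, followed by a residual sigma-test. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the function names `align_affine_invariant` and `variance_gate` and the `k = 2.0` threshold are hypothetical.

```python
import numpy as np

def align_affine_invariant(pred, metric, valid):
    """Fit scale s and shift t minimizing ||s*pred + t - metric||^2
    over valid landmarks (closed-form least squares). This removes
    the affine ambiguity of a relative-depth prediction."""
    d, z = pred[valid], metric[valid]
    A = np.stack([d, np.ones_like(d)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, z, rcond=None)
    return s * pred + t, s, t

def variance_gate(residuals, k=2.0):
    """Boolean mask keeping residuals within k standard deviations
    of their mean, suppressing unstable depth-prior artifacts."""
    return np.abs(residuals - residuals.mean()) <= k * residuals.std()

# Usage on synthetic landmarks: the "prediction" is an affine-warped
# copy of the metric depths, so the fit should recover s ~ 2, t ~ 0.4.
rng = np.random.default_rng(0)
z = rng.uniform(1.0, 10.0, 200)                  # metric depths at landmarks
pred = 0.5 * z - 0.2 + rng.normal(0, 0.01, 200)  # relative (affine-ambiguous) prediction
aligned, s, t = align_affine_invariant(pred, z, np.ones(200, dtype=bool))

res = aligned - z
res[:5] += 4.0               # inject a few unstable-prior residuals
keep = variance_gate(res)    # gates out the injected outliers
```

In a VIO backend the surviving residuals would enter the optimization as additional depth-consistency factors; the gate keeps large, unstable prior errors from corrupting the pose estimate.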
Problem

Research questions and friction points this paper is trying to address.

Visual-Inertial Odometry
Monocular Depth Estimation
low-texture environments
edge deployment
pose estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual-Inertial Odometry
Learned Depth Priors
Vision Transformer
Affine-Invariant Depth Consistency
Edge Deployment
Arda Alnıak
Dept. of Electrical & Electronics Engineering, Center for Image Analysis (OGAM), METU, Ankara, Türkiye
Sinan Kalkan
Dept. of Computer Eng., Middle East Technical University
Computer Vision · Deep Learning · Robotics
M. Mert Ankaralı
Dept. of Electrical & Electronics Engineering, Center for Image Analysis (OGAM), METU, Ankara, Türkiye
Afsar Saranlı
Dept. of Electrical & Electronics Engineering, Center for Image Analysis (OGAM), METU, Ankara, Türkiye
A. Aydın Alatan
Dept. of Electrical & Electronics Engineering, Center for Image Analysis (OGAM), METU, Ankara, Türkiye