Monocular Depth Estimation with Global-Aware Discretization and Local Context Modeling

📅 2025-08-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Monocular depth estimation is inherently ambiguous because recovering 3D structure from a single view is an ill-posed problem. To address this, we propose a framework that models local and global cues jointly. Specifically, we design a Gated Large Kernel Attention Module (GLKAM) that captures multi-scale local structural cues using large-kernel convolutions with a gated mechanism. We also introduce a Global Bin Prediction Module (GBPM) that estimates the global distribution of depth bins and provides structural guidance for depth regression. On the NYU-V2 and KITTI datasets, the method achieves competitive performance and outperforms existing approaches, and the experiments validate the contribution of each proposed component.

📝 Abstract
Accurate monocular depth estimation remains a challenging problem because recovering 3D structure from a single view is ill-posed: multiple plausible depth configurations can produce identical 2D projections. In this paper, we present a novel depth estimation method that combines local and global cues to improve prediction accuracy. Specifically, we propose the Gated Large Kernel Attention Module (GLKAM) to capture multi-scale local structural information by leveraging large kernel convolutions with a gated mechanism. To further enhance the global perception of the network, we introduce the Global Bin Prediction Module (GBPM), which estimates the global distribution of depth bins and provides structural guidance for depth regression. Extensive experiments on the NYU-V2 and KITTI datasets demonstrate that our method achieves competitive performance and outperforms existing approaches, validating the effectiveness of each proposed component.
Problem

Research questions and friction points this paper is trying to address.

Monocular depth estimation under single-view ambiguity
Improving accuracy with local and global cues
Enhancing global perception via depth bin distribution
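
The single-view ambiguity named above can be made concrete with a pinhole camera: any two 3D points on the same viewing ray project to the same pixel. The following Python sketch is illustrative only; the `project` and `scale_scene` helpers, the unit focal length, and the sample coordinates are assumptions chosen for this example and do not come from the paper.

```python
def project(point, focal=1.0):
    """Pinhole projection of a camera-frame point (X, Y, Z) to image coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal * x / z, focal * y / z)


def scale_scene(point, s):
    """Move a point along its viewing ray by scaling every coordinate by s > 0."""
    return tuple(s * c for c in point)


# Hypothetical sample point (not from the paper) and a copy four times deeper.
near = (0.5, -0.25, 2.0)
far = scale_scene(near, 4.0)

# Both points land on the same image location, so the image alone
# cannot distinguish a depth of 2.0 from a depth of 8.0.
same_pixel = project(near) == project(far)
print(project(near), project(far), same_pixel)
```

Because the projection discards the scale factor, a network has to rely on learned cues such as local structure and the overall depth distribution of the scene to choose among the candidate depths.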
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gated Large Kernel Attention Module captures multi-scale local structures
Global Bin Prediction Module enhances global depth perception
Combines local and global cues for accurate depth estimation
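
The summary states only that GBPM estimates a global distribution of depth bins and uses it to guide depth regression; it does not specify the computation. As a rough sketch of how bin-based depth prediction can work, the Python below follows the adaptive-bins style formulation (image-level bin widths, per-pixel bin probabilities, depth as the weighted mean of bin centers). This is an assumed stand-in, not the paper's confirmed design, and the function names, the four-bin setup, and the 0.1 to 10.0 depth range are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_centers(width_logits, d_min, d_max):
    """Turn image-level width logits into bin centers that tile [d_min, d_max]."""
    widths = softmax(width_logits)  # normalized bin widths, summing to 1
    centers, edge = [], d_min
    for w in widths:
        span = w * (d_max - d_min)
        centers.append(edge + span / 2.0)
        edge += span
    return centers


def pixel_depth(bin_logits, centers):
    """Depth of one pixel as the probability-weighted mean of the bin centers."""
    probs = softmax(bin_logits)
    return sum(p * c for p, c in zip(probs, centers))


# Hypothetical inputs: four equal-width bins over an assumed 0.1 to 10.0 range,
# and one pixel whose logits strongly favor the third bin.
centers = bin_centers([0.0, 0.0, 0.0, 0.0], d_min=0.1, d_max=10.0)
depth = pixel_depth([0.0, 0.0, 5.0, 0.0], centers)
print(centers, depth)
```

In this formulation the bin centers are shared by every pixel in the image, which is one way a global depth distribution can constrain per-pixel predictions made from local features.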
Heng Wu
School of Computer Science and Technology, East China Normal University, Shanghai, China
Qian Zhang
School of Computer Science and Technology, East China Normal University, Shanghai, China
Guixu Zhang
East China Normal University
Image Processing, Deep Learning