AnyDepth: Depth Estimation Made Easy

📅 2026-01-06

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a lightweight, data-centric framework for zero-shot monocular depth estimation that addresses the inefficiency and limited generalization caused by reliance on large-scale datasets and complex decoders. The approach uses DINOv3 as the encoder and introduces a structurally simple, single-path Simple Depth Transformer (SDT) decoder, which cuts the parameter count by approximately 85%–89% relative to DPT. A quality-based sample filtering strategy further refines the training data distribution. Evaluated on five benchmark datasets, the method outperforms DPT in accuracy while substantially lowering computational overhead, showing the value of jointly optimizing model compactness and data quality.
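To make the "single-path" idea concrete, here is a minimal sketch of the data flow only: encoder features are fused once at a common resolution and then upsampled along a single path, rather than being fused across several scales. It is not the authors' SDT implementation. The function names (`fuse`, `upsample2x`, `single_path_decode`) are hypothetical, and element-wise averaging and nearest-neighbour upsampling are stand-ins assumed here in place of the learned layers a real decoder would use.

```python
from typing import List

Grid = List[List[float]]  # a tiny 2D feature map, one channel


def fuse(features: List[Grid]) -> Grid:
    """Fuse same-resolution feature maps with an element-wise mean.

    Assumption: a stand-in for a learned fusion layer.
    """
    n = len(features)
    rows, cols = len(features[0]), len(features[0][0])
    return [
        [sum(f[r][c] for f in features) / n for c in range(cols)]
        for r in range(rows)
    ]


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling (assumed stand-in for learned upsampling)."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def single_path_decode(features: List[Grid], num_upsamples: int) -> Grid:
    """Fuse once, then upsample repeatedly along one path."""
    x = fuse(features)
    for _ in range(num_upsamples):
        x = upsample2x(x)
    return x


# Two hypothetical 1x2 feature maps taken from different encoder layers.
layer_a = [[1.0, 3.0]]
layer_b = [[3.0, 5.0]]
decoded = single_path_decode([layer_a, layer_b], num_upsamples=1)
print(decoded)
```

The point of the sketch is structural: fusion happens a single time before upsampling, so no per-scale fusion blocks are needed, which is one plausible reading of where the parameter savings come from.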

📝 Abstract

Monocular depth estimation aims to recover the depth information of 3D scenes from 2D images. Recent work has made significant progress, but its reliance on large-scale datasets and complex decoders has limited its efficiency and generalization ability. In this paper, we propose a lightweight and data-centric framework for zero-shot monocular depth estimation. First, we adopt DINOv3 as the visual encoder to obtain high-quality dense features. Second, to address the inherent drawbacks of the complex structure of the DPT, we design the Simple Depth Transformer (SDT), a compact transformer-based decoder. Compared to the DPT, it uses a single-path feature fusion and upsampling process to reduce the computational overhead of cross-scale feature fusion, achieving higher accuracy while reducing the number of parameters by approximately 85%-89%. Furthermore, we propose a quality-based filtering strategy to filter out harmful samples, thereby reducing dataset size while improving overall training quality. Extensive experiments on five benchmarks demonstrate that our framework surpasses the DPT in accuracy. This work highlights the importance of balancing model design and data quality for achieving efficient and generalizable zero-shot depth estimation. Code: https://github.com/AIGeeksGroup/AnyDepth. Website: https://aigeeksgroup.github.io/AnyDepth.
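The quality-based filtering step can be pictured as a threshold over per-sample scores. The sketch below is illustrative only: the abstract does not say how quality is measured, so the `valid_ratio` score, the `DepthSample` type, the `filter_by_quality` helper, the file names, and the 0.7 threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DepthSample:
    """One training sample with a hypothetical quality score."""
    path: str
    valid_ratio: float  # assumed metric: fraction of pixels with usable depth, in [0, 1]


def filter_by_quality(samples: List[DepthSample], min_score: float) -> List[DepthSample]:
    """Keep only samples whose quality score reaches the threshold."""
    return [s for s in samples if s.valid_ratio >= min_score]


# Hypothetical samples; scores are made up for illustration.
samples = [
    DepthSample("a.png", 0.98),
    DepthSample("b.png", 0.40),
    DepthSample("c.png", 0.75),
]
kept = filter_by_quality(samples, min_score=0.7)
print([s.path for s in kept])  # ['a.png', 'c.png']
```

Dropping low-scoring samples shrinks the dataset while removing examples that would otherwise pull training in the wrong direction, which matches the trade-off the abstract describes.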

Problem

Research questions and friction points this paper is trying to address.

monocular depth estimation

zero-shot

generalization

efficiency

data quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot depth estimation

Simple Depth Transformer

DINOv3