🤖 AI Summary
Monocular depth estimation is inaccurate in unstructured agricultural environments such as orchards and vineyards, and existing methods trained on urban scenes generalize poorly to them. To address this, the paper proposes OrchardDepth, a metric-scale monocular depth estimation framework tailored to the primary industry. Methodologically, it introduces a consistency regularization between dense depth maps and sparse points, combined with scene-adaptive data modeling and a retraining strategy, to improve robustness under challenging conditions such as variable illumination, scarce texture, and occlusion by foliage. On a real-world orchard dataset, RMSE drops from 1.5337 to 0.6738, a reduction of about 56% relative to the baseline. The work fills a gap in metric monocular depth estimation for agricultural applications and provides a perception foundation for tasks such as intelligent harvesting and precision spraying.
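The relative improvement quoted above can be checked directly from the two reported RMSE values. The snippet below is a minimal sketch; the function name `relative_reduction` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of an error metric relative to its baseline value."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# RMSE values as reported in the summary.
reduction = relative_reduction(1.5337, 0.6738)
print(f"{reduction:.1%}")  # 56.1%
```

The reduction is expressed relative to the baseline error, which is why a drop of about 0.86 in RMSE corresponds to roughly 56%.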
📝 Abstract
Monocular depth estimation is a fundamental task in robotic perception. With the development of more accurate and robust neural network models and more varied datasets, its performance and efficiency have improved significantly in recent years. However, most research in this area concentrates on a narrow set of domains. In particular, most outdoor benchmarks target urban environments and are designed to advance autonomous driving systems. These benchmarks differ greatly from orchard and vineyard environments, so they are of little help to research in the primary industry. We therefore propose OrchardDepth, which fills the gap in metric depth estimation from a monocular camera in orchard and vineyard environments. In addition, we present a new retraining method that improves the training result by monitoring the consistency regularization between dense depth maps and sparse points. Our method reduces the RMSE of depth estimation in the orchard environment from 1.5337 to 0.6738, demonstrating its validity.
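The abstract does not specify the form of the consistency term or the evaluation protocol, so the following is only an illustrative sketch under stated assumptions: the sparse points are assumed to be already projected into the image as per-pixel metric depths, a value of zero or less is assumed to mark an invalid measurement, and the consistency term is assumed to be a mean absolute difference. All names (`sparse_residuals`, `consistency_penalty`, `rmse`) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, col) index into the dense depth map


def sparse_residuals(dense: List[List[float]],
                     sparse: Dict[Pixel, float]) -> List[float]:
    """Differences between predicted and measured depth at valid sparse pixels."""
    # Assumption: depth <= 0 marks an invalid sparse measurement and is skipped.
    return [dense[r][c] - d for (r, c), d in sparse.items() if d > 0]


def consistency_penalty(dense: List[List[float]],
                        sparse: Dict[Pixel, float]) -> float:
    """Assumed L1 consistency term: mean absolute residual at sparse pixels."""
    res = sparse_residuals(dense, sparse)
    if not res:
        raise ValueError("no valid sparse points")
    return sum(abs(e) for e in res) / len(res)


def rmse(dense: List[List[float]], sparse: Dict[Pixel, float]) -> float:
    """Root-mean-square error of the dense map against valid sparse points."""
    res = sparse_residuals(dense, sparse)
    if not res:
        raise ValueError("no valid sparse points")
    return math.sqrt(sum(e * e for e in res) / len(res))


# Toy 2x2 depth map in metres; the point at (0, 1) is invalid and ignored.
dense_map = [[2.0, 2.5],
             [3.0, 4.0]]
sparse_pts = {(0, 0): 2.0, (1, 1): 5.0, (0, 1): 0.0}

penalty = consistency_penalty(dense_map, sparse_pts)
error = rmse(dense_map, sparse_pts)
print(penalty)  # 0.5
```

Masking to valid sparse pixels matters because sparse measurements typically cover only a small fraction of the image, so averaging over every pixel would dilute the signal.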