MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation

📅 2025-12-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Agricultural robots commonly rely on manual operation or rail-based systems for movement, and a single monocular camera limits their 3D spatial perception. Method: This paper proposes MDE-AgriVLN, a Vision-and-Language Navigation (VLN) method for agriculture enhanced with monocular depth estimation. Its core component is a Monocular Depth Estimation (MDE) module that generates depth features from RGB images, which assist the decision-maker in reasoning over the natural language instruction. Contribution/Results: Evaluated on the A2A agricultural VLN benchmark, the method improves Success Rate from 0.23 to 0.32 and reduces Navigation Error from 4.43 m to 4.08 m, which the authors report as state-of-the-art performance in agricultural VLN. Because it needs no stereo or depth sensors, the approach suits agricultural robots equipped with only a single camera.

📝 Abstract

Agricultural robots serve as powerful assistants across a wide range of agricultural tasks, yet they still rely heavily on manual operation or railway systems for movement. The AgriVLN method and the A2A benchmark pioneered the extension of Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position by following a natural language instruction. Unlike humans, who have binocular vision, most agricultural robots are equipped with only a single camera, and the resulting monocular vision limits their spatial perception. To bridge this gap, we present Agricultural Vision-and-Language Navigation with Monocular Depth Estimation (MDE-AgriVLN), in which an MDE module generates depth features from RGB images to assist the decision-maker in reasoning. Evaluated on the A2A benchmark, MDE-AgriVLN increases Success Rate from 0.23 to 0.32 and decreases Navigation Error from 4.43 m to 4.08 m, demonstrating state-of-the-art performance in the agricultural VLN domain. Code: https://github.com/AlexTraveling/MDE-AgriVLN.
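As a rough illustration of how depth features might assist a decision-maker, the sketch below pools a depth map into left, center, and right strips and renders them as a short text hint. It is not the paper's MDE module: the depth-map format, the three-strip pooling, and the function names `strip_depths` and `depth_hint` are all assumptions made for illustration. A real system would obtain the depth map from a monocular depth estimation model run on the RGB frame.

```python
from statistics import mean
from typing import List

# Hypothetical format: rows of per-pixel depth values in metres,
# as might be produced by a monocular depth estimation model.
DepthMap = List[List[float]]


def strip_depths(depth: DepthMap, strips: int = 3) -> List[float]:
    """Return the mean depth of `strips` equal-width vertical strips, left to right."""
    width = len(depth[0])
    # Column boundaries of each strip, e.g. [0, 2, 4, 6] for width 6 and 3 strips.
    bounds = [width * i // strips for i in range(strips + 1)]
    return [
        mean(row[c] for row in depth for c in range(bounds[i], bounds[i + 1]))
        for i in range(strips)
    ]


def depth_hint(values: List[float]) -> str:
    """Format three strip depths as a sentence that could accompany an instruction."""
    names = ["left", "center", "right"]
    parts = [f"{name} {value:.1f} m" for name, value in zip(names, values)]
    return "Estimated mean depth: " + ", ".join(parts) + "."


if __name__ == "__main__":
    # Toy 2x6 depth map: obstacles are nearest on the left, farthest on the right.
    toy_depth = [
        [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
        [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    ]
    print(depth_hint(strip_depths(toy_depth)))
```

Coarse strips keep the hint compact. Whether the paper passes depth to its decision-maker as text, as an image, or as learned features is not stated in the abstract.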

Problem

Research questions and friction points this paper is trying to address.

Agricultural robots with a single monocular camera have limited spatial perception.

MDE-AgriVLN integrates depth estimation to improve navigation accuracy.

It enhances Vision-and-Language Navigation for agricultural tasks.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular depth estimation enhances spatial perception for navigation.

Depth features from RGB images assist decision-making in navigation.

Method improves Success Rate and reduces Navigation Error on the A2A agricultural benchmark.
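To put the reported numbers in perspective, the short calculation below derives the absolute and relative changes from the four values quoted in the abstract (Success Rate 0.23 to 0.32, Navigation Error 4.43 m to 4.08 m). The helper name `relative_change` is hypothetical, and the derived percentages are simple arithmetic on those values, not figures reported by the authors.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Values quoted in the abstract.
sr_before, sr_after = 0.23, 0.32   # Success Rate (higher is better)
ne_before, ne_after = 4.43, 4.08   # Navigation Error in metres (lower is better)

sr_gain_abs = round(sr_after - sr_before, 2)                        # absolute SR gain
sr_gain_rel = round(relative_change(sr_before, sr_after) * 100, 1)  # percent
ne_drop_abs = round(ne_before - ne_after, 2)                        # metres
ne_drop_rel = round(-relative_change(ne_before, ne_after) * 100, 1) # percent

print(f"Success Rate: +{sr_gain_abs} absolute, +{sr_gain_rel}% relative")
print(f"Navigation Error: -{ne_drop_abs} m absolute, -{ne_drop_rel}% relative")
```

On these figures the Success Rate gain is about 39% relative to the baseline, while the Navigation Error reduction is about 8%.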
