AI Summary
Agricultural robots commonly rely on manual intervention or track-based navigation, and monocular vision inherently limits 3D spatial perception. Method: This paper proposes the first end-to-end monocular depth-enhanced Vision-Language Navigation (VLN) framework for agriculture. Its core innovation is a lightweight Monocular Depth Estimation (MDE) module that implicitly encodes RGB images into depth-aware features, enabling cross-modal alignment and fusion with natural language instructions. Contribution/Results: Evaluated on the A2A agricultural VLN benchmark, the method improves navigation success rate from 0.23 to 0.32 and reduces path deviation from 4.43 m to 4.08 m, achieving state-of-the-art performance in agricultural VLN. By eliminating reliance on external infrastructure or stereo/depth sensors, the framework offers a deployable, resource-efficient autonomous navigation paradigm tailored for monocular agricultural robots operating in geometrically complex field environments.
Abstract
Agricultural robots serve as powerful assistants across a wide range of agricultural tasks, yet they still rely heavily on manual operation or rail systems for movement. The AgriVLN method and the A2A benchmark pioneered the extension of Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position by following a natural language instruction. Unlike humans, who have binocular vision, most agricultural robots are equipped with only a single camera, and the resulting monocular vision limits their spatial perception. To bridge this gap, we present Agricultural Vision-and-Language Navigation with Monocular Depth Estimation (MDE-AgriVLN), which introduces an MDE module that generates depth features from RGB images to assist the decision-maker in reasoning. When evaluated on the A2A benchmark, MDE-AgriVLN increases Success Rate from 0.23 to 0.32 and decreases Navigation Error from 4.43 m to 4.08 m, demonstrating state-of-the-art performance in the agricultural VLN domain. Code: https://github.com/AlexTraveling/MDE-AgriVLN.
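For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Success Rate and Navigation Error are conventionally computed in VLN evaluation. It is not taken from the MDE-AgriVLN code: the function names, the episode dictionary layout, and the 2.0 m success threshold are all assumptions for illustration, and the A2A benchmark may define success differently.

```python
import math

def navigation_error(final_pos, target_pos):
    """Euclidean distance (in meters) between the stopping point and the target."""
    return math.dist(final_pos, target_pos)

def evaluate(episodes, success_threshold_m=2.0):
    """Aggregate Success Rate and mean Navigation Error over episodes.

    `episodes` is a hypothetical list of dicts with "final" and "target"
    coordinates. `success_threshold_m` is an assumed value, not the
    benchmark's documented setting.
    """
    errors = [navigation_error(e["final"], e["target"]) for e in episodes]
    # An episode counts as a success if the robot stops within the threshold.
    successes = sum(err <= success_threshold_m for err in errors)
    return {
        "success_rate": successes / len(errors),
        "navigation_error_m": sum(errors) / len(errors),
    }

# Toy usage with made-up coordinates (not benchmark data).
toy_episodes = [
    {"final": (0.0, 0.0), "target": (3.0, 4.0)},  # stops 5 m from the target
    {"final": (1.0, 1.0), "target": (1.0, 2.0)},  # stops 1 m from the target
]
result = evaluate(toy_episodes)
```

Under these definitions, the reported change corresponds to an absolute gain of 0.09 in Success Rate and a reduction of 0.35 m in Navigation Error.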