🤖 AI Summary
This work addresses the challenge of monocular depth estimation in unstructured outdoor environments, where scale ambiguity, sparse textures, and the scarcity of ground-truth depth annotations hinder robust robotic perception. To overcome these limitations without relying on real depth labels, the authors propose a depth completion method trained entirely on synthetic data: structure-from-motion reconstruction yields high-fidelity textured 3D meshes, from which novel view synthesis renders photorealistic training imagery. A lightweight network trained on this data combines camera images with extremely sparse depth measurements from low-cost sensors to predict dense metric depth, generalizing across domains. Deployed on an NVIDIA Jetson AGX Orin, the full system runs end to end in 53 milliseconds per frame across diverse real-world outdoor scenes, balancing accuracy with real-time performance.
📝 Abstract
Autonomous field robots operating in unstructured environments require robust perception to ensure safe and reliable operation. Recent advances in monocular depth estimation have demonstrated the potential of low-cost cameras as depth sensors; however, their adoption in field robotics remains limited by the absence of reliable scale cues, ambiguous or low-texture conditions, and the scarcity of large-scale datasets. To address these challenges, we propose a depth completion model that trains on synthetic data and uses extremely sparse measurements from depth sensors to predict dense metric depth in unseen field robotics environments. A synthetic dataset generation pipeline tailored to field robotics enables the creation of multiple realistic training datasets: it uses textured 3D meshes from Structure-from-Motion and photorealistic rendering with novel viewpoint synthesis to simulate diverse field robotics scenarios. Our approach achieves an end-to-end latency of 53 ms per frame on an NVIDIA Jetson AGX Orin, enabling real-time deployment on embedded platforms. Extensive evaluation demonstrates competitive performance across diverse real-world field robotics scenarios.