🤖 AI Summary
This study investigates the zero-shot generalization of deep reinforcement learning (DRL) policies across visually and topographically distinct domains: specifically, whether navigation policies trained in terrestrial agricultural environments can be deployed directly in lunar-simulated terrain without adaptation. Using Proximal Policy Optimization (PPO), we train end-to-end vision-based navigation policies in a high-fidelity 3D simulator to perform goal-directed navigation and dynamic obstacle avoidance, with no fine-tuning or domain adaptation at deployment. Experimental evaluation shows that the policy achieves a task success rate of nearly 50% in the lunar simulation environment, indicating a meaningful degree of cross-planetary transferability for terrestrial-trained models. To our knowledge, this is the first empirical study of a low-retraining-cost, generalizable autonomous navigation paradigm for planetary exploration. The results point toward a scalable DRL framework for deploying resilient, vision-guided robotic systems in deep-space missions.
📝 Abstract
Autonomous navigation in unstructured environments is essential for field and planetary robotics, where robots must efficiently reach goals while avoiding obstacles under uncertain conditions. Conventional algorithmic approaches often require extensive environment-specific tuning, limiting scalability to new domains. Deep Reinforcement Learning (DRL) provides a data-driven alternative, allowing robots to acquire navigation strategies through direct interaction with their environment. This work investigates the feasibility of DRL policy generalization across visually and topographically distinct simulated domains, where policies are trained in terrestrial settings and validated in a zero-shot manner in extraterrestrial environments. A 3D simulation of an agricultural rover is developed and trained using Proximal Policy Optimization (PPO) to achieve goal-directed navigation and obstacle avoidance in farmland settings. The learned policy is then evaluated in a lunar-like simulated environment to assess transfer performance. The results indicate that policies trained under terrestrial conditions retain a substantial share of their effectiveness, achieving close to 50% success in lunar simulations without additional training or fine-tuning. This underscores the potential of cross-domain DRL-based policy transfer as a promising approach to developing adaptable and efficient autonomous navigation for future planetary exploration missions, with the added benefit of minimizing retraining costs.
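The abstract names PPO but does not specify the training framework. As a minimal sketch of the update rule PPO optimizes, the clipped surrogate objective can be written in plain NumPy; the function and variable names below are illustrative, not taken from the paper:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (Schulman et al., 2017).

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - eps, 1 + eps], so a single update cannot move the policy too
    far from the one that collected the data.
    """
    ratios = np.exp(logp_new - logp_old)          # importance ratios
    unclipped = ratios * advantages               # standard surrogate term
    clipped = np.clip(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (element-wise minimum) of the two terms.
    return np.minimum(unclipped, clipped).mean()

# Toy example: two sampled actions, one with positive advantage.
logp_old = np.log(np.array([0.25, 0.40]))
logp_new = np.log(np.array([0.50, 0.20]))
advantages = np.array([1.0, -1.0])
objective = ppo_clip_objective(logp_new, logp_old, advantages)
```

In a full training loop, the negative of this objective is combined with value-function and entropy terms and minimized by stochastic gradient descent over minibatches of rollout data.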