🤖 AI Summary
To address low data efficiency, slow convergence, and poorly scheduled easy-to-hard sampling in reinforcement learning for Unmanned Aerial Vehicle Vision-and-Language Navigation (UAV VLN), this paper proposes Semantic-Aware Gaussian Curriculum Scheduling (SA-GCS). The method introduces a Semantic-Aware Difficulty Estimator (SA-DE) that quantifies sample difficulty via visual-language modality alignment, and a Gaussian Curriculum Scheduler (GCS) that smoothly and adaptively shifts the sampling distribution from easy to hard samples. By integrating vision-language models, reinforcement learning, and curriculum learning, the framework improves results on the CityNav benchmark: 32% faster convergence, a 4.7% absolute gain in final navigation success rate, and better generalization and training stability. It works across models of different scales and is applicable to real-world scenarios such as intelligent inspection and urban surveillance.
📝 Abstract
Unmanned Aerial Vehicle (UAV) Vision-Language Navigation (VLN) aims to enable agents to accurately localize targets and plan flight paths in complex environments based on natural language instructions, with broad applications in intelligent inspection, disaster rescue, and urban monitoring. Recent progress in Vision-Language Models (VLMs) has provided strong semantic understanding for this task, while reinforcement learning (RL) has emerged as a promising post-training strategy to further improve generalization. However, existing RL methods often suffer from inefficient use of training data, slow convergence, and insufficient consideration of the difficulty variation among training samples, which limits further performance improvement. To address these challenges, we propose Semantic-Aware Gaussian Curriculum Scheduling (SA-GCS), a novel training framework that systematically integrates Curriculum Learning (CL) into RL. SA-GCS employs a Semantic-Aware Difficulty Estimator (SA-DE) to quantify the complexity of training samples and a Gaussian Curriculum Scheduler (GCS) to dynamically adjust the sampling distribution, enabling a smooth progression from easy to challenging tasks. This design significantly improves training efficiency, accelerates convergence, and enhances overall model performance. Extensive experiments on the CityNav benchmark demonstrate that SA-GCS consistently outperforms strong baselines across all metrics, achieves faster and more stable convergence, and generalizes well across models of different scales, highlighting its robustness and scalability. The implementation of our approach is publicly available.
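The abstract does not spell out how the Gaussian Curriculum Scheduler reshapes the sampling distribution, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes each training sample already has a difficulty score in [0, 1] (standing in for the SA-DE output), that the Gaussian center moves linearly from easy to hard over training, and that the width `sigma` is fixed. All function names and parameters here are hypothetical.

```python
import math
import random


def gaussian_weights(difficulties, mu, sigma):
    """Normalized Gaussian sampling weights centered at difficulty mu."""
    raw = [math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2)) for d in difficulties]
    total = sum(raw)
    return [w / total for w in raw]


def curriculum_mean(step, total_steps, mu_start=0.0, mu_end=1.0):
    """Assumed schedule: move the Gaussian center linearly from easy to hard."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return mu_start + progress * (mu_end - mu_start)


def sample_batch(difficulties, step, total_steps, batch_size, sigma=0.15, rng=None):
    """Draw sample indices with probability given by the Gaussian weights."""
    rng = rng or random.Random()
    mu = curriculum_mean(step, total_steps)
    weights = gaussian_weights(difficulties, mu, sigma)
    return rng.choices(range(len(difficulties)), weights=weights, k=batch_size)


if __name__ == "__main__":
    # Hypothetical difficulty scores for five training samples.
    scores = [0.0, 0.25, 0.5, 0.75, 1.0]
    for step in (0, 50, 100):
        mu = curriculum_mean(step, 100)
        print(step, [round(w, 3) for w in gaussian_weights(scores, mu, 0.15)])
```

Early in training most of the probability mass sits on low-difficulty samples; as `step` grows, the mass slides toward harder ones without ever cutting off the rest of the data abruptly, which is the "smooth progression" the abstract describes. A smaller `sigma` would make the curriculum stricter, a larger one closer to uniform sampling.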