Towards Precise Intent-Aligned VLA Aerial Navigation via Expert-Guided GRPO

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses key limitations of current vision-language-action (VLA) models in autonomous drone navigation: scarce supervised fine-tuning (SFT) data, poor generalization, and difficulty aligning with complex human instructions. To overcome these challenges, the authors propose an efficient reinforcement learning framework built around Expert-Guided Group Relative Policy Optimization (EG-GRPO), which integrates sparse expert demonstrations into online policy updates. The approach also introduces a heterogeneous pipeline that runs simulation and inference in parallel to speed up rollouts, and incorporates a human-feedback-based reward mechanism to improve fine-grained instruction alignment. Across diverse complex navigation tasks, the method raises the task success rate to 2.13× that of the SFT baseline, improves intent alignment by 60.9%, and reduces rollout time by 43.5%.

📝 Abstract

Vision-Language-Action (VLA) models offer a promising end-to-end paradigm for unmanned aerial vehicles (UAVs) to accomplish complex tasks specified by fine-grained instructions. However, standard supervised fine-tuning (SFT) suffers from data scarcity, limited generalization, and weak supervision for nuanced and complicated human intents. Reinforcement fine-tuning offers a natural way to mitigate these challenges and align policy behaviors with human intents through designable feedback, but applying it to aerial navigation remains challenging due to inefficient exploration in expansive continuous spaces. To address these challenges, we introduce an efficient reinforcement learning (RL) framework for VLA-based aerial navigation. At its core, we propose EG-GRPO (Expert-Guided Group Relative Policy Optimization) to augment online rollouts with few-shot expert data. Additionally, we design a heterogeneous pipeline enabling parallel simulation and inference, which reduces rollout time by 43.5%. Across multiple tasks specified by complex human intents, EG-GRPO improves the success rate to 2.13x that of the SFT baseline, while improving intent alignment performance by 60.9%. These results demonstrate that our framework can move aerial navigation toward precise intent-aligned flight.
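
The abstract does not spell out how EG-GRPO combines expert data with online rollouts, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It uses the standard GRPO group-relative advantage (each reward normalized by the group's mean and standard deviation) and, as an assumption, appends a single expert demonstration's reward to the group. The function names `group_relative_advantages` and `expert_guided_group` and the reward values are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def expert_guided_group(rollout_rewards, expert_reward):
    """Hypothetical mixing rule (an assumption, not taken from the paper):
    append one expert demonstration's reward to the group of online
    rollouts before normalizing, then split the advantages back out."""
    rewards = list(rollout_rewards) + [expert_reward]
    advantages = group_relative_advantages(rewards)
    return advantages[:-1], advantages[-1]


# A group in which every online rollout failed (illustrative rewards).
failed_rollouts = [0.0, 0.0, 0.0, 0.0]

# Without expert data, all advantages are zero: no learning signal.
plain_adv = group_relative_advantages(failed_rollouts)

# With one successful expert sample, the group has a nonzero spread.
rollout_adv, expert_adv = expert_guided_group(failed_rollouts, expert_reward=1.0)
```

Under these assumptions, a group whose rollouts all fail yields zero advantages on its own, whereas adding a successful expert sample gives the expert trajectory a positive advantage and the failed rollouts a negative one. This illustrates why expert guidance could help when exploration in a large continuous space rarely reaches success.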

Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action

aerial navigation

intent alignment

reinforcement learning

human intent

Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert-Guided GRPO

Vision-Language-Action

Reinforcement Fine-tuning