🤖 AI Summary
To address poor policy generalization and the inability to adjust task parameters online in UAV aerobatic flight, this paper proposes the target-and-command-oriented reinforcement learning (TACO) framework. The policy is jointly conditioned on high-level targets and low-level commands, via a goal-conditioned policy network and a command embedding mechanism, so that distinct maneuvers are handled in a unified way. To improve the temporal and spatial smoothness, independence, and symmetry of the learned policy, the method combines spectral normalization with input-output rescaling, substantially narrowing the sim-to-real gap. Built on the PPO algorithm, the approach is validated in high-fidelity aerodynamic simulation and on a real quadrotor platform. Experiments demonstrate sustained circular flight at speeds exceeding 8 m/s, five or more consecutive multi-axis flips, task-switching latency under 100 ms, and a cross-task generalization success rate above 92%.
📝 Abstract
Although acrobatic flight control has been studied extensively, a key limitation of existing methods is that they are usually restricted to specific maneuver tasks and cannot change flight-pattern parameters online. In this work, we propose a target-and-command-oriented reinforcement learning (TACO) framework, which handles different maneuver tasks in a unified way and allows online parameter changes. Additionally, we propose a spectral normalization method with input-output rescaling to enhance the policy's temporal and spatial smoothness, independence, and symmetry, thereby overcoming the sim-to-real gap. We validate the TACO approach through extensive simulation and real-world experiments, demonstrating its capability to achieve high-speed circular flights and continuous multi-flips.
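To make the spectral-normalization-with-rescaling idea concrete, here is a minimal numpy sketch, not the authors' implementation: each weight matrix is divided by its dominant singular value (estimated by power iteration) so the network's Lipschitz constant stays bounded, which promotes temporally and spatially smooth actions, while inputs and outputs are rescaled to fixed ranges. The class and parameter names (`SmoothPolicy`, `obs_scale`, `act_scale`) are illustrative assumptions, not from the paper.

```python
import numpy as np

def spectral_normalize(W, n_iters=50):
    """Rescale W so its spectral norm (largest singular value) is ~1.

    Uses power iteration to estimate the dominant singular value; dividing
    by it bounds the layer's Lipschitz constant, encouraging smooth outputs.
    """
    rng = np.random.default_rng(0)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimate of the largest singular value
    return W / sigma

class SmoothPolicy:
    """Tiny two-layer tanh policy with spectrally normalized weights and
    input/output rescaling (hypothetical bounds, for illustration only)."""

    def __init__(self, obs_dim, act_dim, hidden=32,
                 obs_scale=1.0, act_scale=1.0):
        rng = np.random.default_rng(1)
        self.W1 = spectral_normalize(rng.normal(size=(hidden, obs_dim)))
        self.W2 = spectral_normalize(rng.normal(size=(act_dim, hidden)))
        self.obs_scale = obs_scale  # input rescaling: obs -> obs / obs_scale
        self.act_scale = act_scale  # output rescaling into [-act_scale, act_scale]

    def act(self, obs):
        x = np.asarray(obs, dtype=float) / self.obs_scale
        h = np.tanh(self.W1 @ x)
        return self.act_scale * np.tanh(self.W2 @ h)
```

Because every layer has spectral norm near 1 and the output passes through a bounded `tanh` scaled by `act_scale`, small perturbations in the observation can only produce proportionally small changes in the commanded action, which is the smoothness property the paper targets for sim-to-real transfer.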