TACO: General Acrobatic Flight Control via Target-and-Command-Oriented Reinforcement Learning

📅 2025-03-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address poor policy generalization and inflexible online adjustment of task parameters in UAV aerobatic flight, this paper proposes a goal-and-instruction-coordinated reinforcement learning (RL) framework. We introduce a novel dual-driven RL paradigm—jointly conditioned on high-level goals and low-level instructions—featuring a goal-conditioned policy network and an instruction embedding mechanism. To enhance spatiotemporal smoothness, parameter independence, and symmetry of learned policies, we incorporate spectral normalization together with dynamic input/output rescaling, significantly narrowing the sim-to-real gap. Built upon the PPO algorithm, our method is validated in high-fidelity aerodynamic simulation and on a real quadrotor platform. Experiments demonstrate sustained circular flight at speeds exceeding 8 m/s, ≥5 consecutive multi-axis flips, task-switching latency under 100 ms, and cross-task generalization success rate above 92%.

Technology Category

Application Category

📝 Abstract
Although acrobatic flight control has been studied extensively, one key limitation of the existing methods is that they are usually restricted to specific maneuver tasks and cannot change flight pattern parameters online. In this work, we propose a target-and-command-oriented reinforcement learning (TACO) framework, which can handle different maneuver tasks in a unified way and allows online parameter changes. Additionally, we propose a spectral normalization method with input-output rescaling to enhance the policy's temporal and spatial smoothness, independence, and symmetry, thereby overcoming the sim-to-real gap. We validate the TACO approach through extensive simulation and real-world experiments, demonstrating its capability to achieve high-speed circular flights and continuous multi-flips.
Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations in acrobatic flight control adaptability.
Enabling online parameter changes for diverse maneuver tasks.
Reducing sim-to-real gap with enhanced policy smoothness.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Target-and-command-oriented reinforcement learning framework
Spectral normalization with input-output rescaling
Online parameter changes for diverse maneuver tasks
🔎 Similar Papers
No similar papers found.
Z
Zikang Yin
WINDY Lab, Department of Artificial Intelligence, Westlake University, Hangzhou, China
C
Canlun Zheng
WINDY Lab, Department of Artificial Intelligence, Westlake University, Hangzhou, China
S
Shiliang Guo
WINDY Lab, Department of Artificial Intelligence, Westlake University, Hangzhou, China
Zhikun Wang
Zhikun Wang
Google
Machine LearningArtificial IntelligenceRobotics
S
Shiyu Zhao
WINDY Lab, Department of Artificial Intelligence, Westlake University, Hangzhou, China