A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment

📅 2025-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the sim-to-real gap, strict onboard real-time requirements, and heavy reliance on simulator-generated training data when deploying deep reinforcement learning (DRL) policies on quadrotors in unstructured outdoor environments, this paper introduces AirGym, a platform that covers the full path from training to real-world flight. AirGym integrates the training environment, flight dynamics control, DRL algorithms, the MAVROS middleware stack, and hardware into a single workflow, so that a policy can be trained from scratch, validated in simulation, and deployed on a real quadrotor within several minutes. The platform provides hovering, dynamic obstacle avoidance, trajectory tracking, balloon hitting, and planning in unknown environments as a physical experiment benchmark, and the authors report robust outdoor flight under real-world perturbations. Details are available on the project website, https://emnavi.tech/AirGym/.

📝 Abstract
Deploying robot learning methods to a quadrotor in unstructured outdoor environments is an exciting task. Quadrotors that rely on learning-based methods in real-world environments face several challenges: the large amount of simulator-generated data required for training, strict demands for real-time onboard processing, and the sim-to-real gap caused by dynamic and noisy conditions. Recent work has made great progress in applying learning-based methods to end-to-end quadrotor control, but rarely describes the infrastructure needed to train from scratch and deploy to reality, which makes methods and applications difficult to reproduce. To bridge this gap, we propose a platform that enables the seamless transfer of end-to-end deep reinforcement learning (DRL) policies. We integrate the training environment, flight dynamics control, DRL algorithms, the MAVROS middleware stack, and hardware into a comprehensive workflow and architecture that takes quadrotor policies from training from scratch to real-world deployment in several minutes. Our platform provides a rich set of environments, including hovering, dynamic obstacle avoidance, trajectory tracking, balloon hitting, and planning in unknown environments, as a physical experiment benchmark. Through extensive empirical validation, we demonstrate the efficiency of the proposed sim-to-real platform and robust outdoor flight performance under real-world perturbations. Details can be found on our website: https://emnavi.tech/AirGym/.
Problem

Research questions and friction points this paper is trying to address.

Bridging the sim-to-real gap for quadrotor DRL deployment
Enabling real-time onboard processing for learned policies
Providing a comprehensive training-to-deployment workflow and benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Seamless transfer of end-to-end DRL policies
Integrated training-to-deployment workflow
Rich set of environments serving as a physical experiment benchmark
🔎 Similar Papers
2024-10-10 · IEEE/RSJ International Conference on Intelligent Robots and Systems · Citations: 1
Authors

Kangyao Huang
Tsinghua University
Robot Learning, Aerial Robotics

Hao Wang
School of Computer Science and Technology, Dalian University of Technology; Department of Automation, Tsinghua University

Yu Luo
Department of Computer Science and Technology, Tsinghua University

Jingyu Chen
Huazhong University of Science and Technology
Computer Vision, Deep Learning, 3D Vision

Jintao Chen
Department of Automation, Tsinghua University

Xiangkui Zhang
School of Computer Science and Technology, Dalian University of Technology

Xiangyang Ji
Department of Automation, Tsinghua University

Huaping Liu
Professor of Electrical Engineering, Oregon State University
Communication theory, wireless communications, signal processing, sensor networks, information security