🤖 AI Summary
To address the low sample efficiency and poor generalization of data-driven quadrotor control, this paper proposes an equivariant reinforcement learning framework grounded in symmetry modeling. It systematically incorporates the rotational and reflectional symmetries of quadrotor dynamics for the first time, using group representation theory and equivariant neural networks so that a policy learned in one configuration generalizes automatically across the state space via group actions. Both monolithic and modular RL architectures are used to optimize continuous attitude control policies. Extensive evaluations in simulation and on real-world hardware demonstrate that the proposed method reduces training sample requirements by over 40%, decreases attitude tracking error by 32%, and accelerates convergence by a factor of 2.1. These improvements significantly enhance flight stability and environmental robustness.
📝 Abstract
Improving sampling efficiency and generalization capability is critical for the successful data-driven control of quadrotor unmanned aerial vehicles (UAVs), which are inherently unstable. While various reinforcement learning (RL) approaches have been applied to autonomous quadrotor flight, they often require extensive training data, posing multiple challenges and safety risks in practice. To address these issues, we propose data-efficient, equivariant monolithic and modular RL frameworks for quadrotor low-level control. Specifically, by identifying the rotational and reflectional symmetries in quadrotor dynamics and encoding these symmetries into equivariant network models, we remove redundant learning in the state-action space. This approach enables the optimal control action learned in one configuration to automatically generalize to other configurations via symmetry, thereby enhancing data efficiency. Experimental results demonstrate that our equivariant approaches significantly outperform their non-equivariant counterparts in terms of learning efficiency and flight performance.
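The equivariance property the abstract relies on, that an action learned in one configuration carries over to symmetric configurations, can be illustrated with a toy sketch. This is not the paper's network architecture. It assumes a planar state and action, uses the dihedral group D4 (four rotations and four reflections) as a stand-in for the rotational and reflectional symmetries, and enforces equivariance by group averaging over an ordinary MLP. All names (`dihedral_group_d4`, `base_policy`, `equivariant_policy`) are hypothetical.

```python
import numpy as np


def dihedral_group_d4():
    """Return the 8 orthogonal 2x2 matrices of D4 (4 rotations, 4 reflections)."""
    rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation by 90 degrees
    flip = np.array([[1.0, 0.0], [0.0, -1.0]])   # reflection across the x-axis
    elements = []
    g = np.eye(2)
    for _ in range(4):
        elements.append(g)          # pure rotation
        elements.append(g @ flip)   # rotation composed with the reflection
        g = rot90 @ g
    return elements


def base_policy(x, W1, W2):
    """Unconstrained two-layer MLP; no symmetry is built in."""
    return W2 @ np.tanh(W1 @ x)


def equivariant_policy(x, W1, W2, group):
    """Group-averaged policy: pi(x) = mean over g of g^{-1} f(g x).

    For orthogonal g, g^{-1} equals g.T. By construction pi(h x) = h pi(x)
    for every h in the group, so an action computed for one state
    determines the action at all of its symmetric copies.
    """
    return sum(g.T @ base_policy(g @ x, W1, W2) for g in group) / len(group)


# Illustrative check with random weights (assumed sizes: 2 -> 16 -> 2).
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 2))
W2 = rng.standard_normal((2, 16))
x = rng.standard_normal(2)
group = dihedral_group_d4()

for h in group:
    # Transforming the state first, or the action afterwards, must agree.
    assert np.allclose(
        equivariant_policy(h @ x, W1, W2, group),
        h @ equivariant_policy(x, W1, W2, group),
    )
```

Group averaging is chosen here only because it is short and easy to verify. Equivariant network layers, as described in the abstract, build the constraint into the weights instead of averaging at evaluation time.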