🤖 AI Summary
Conventional PD controllers and mainstream deep reinforcement learning (DRL) algorithms (TD3, PPO, A2C) lack the real-time adaptability and fault tolerance needed for autonomous satellite attitude control under reaction wheel (RW) failures. To address this, the paper proposes TD3-HD, a DRL method that integrates Hindsight Experience Replay (HER) into the TD3 framework to mitigate sparse rewards and adds Dimension-Wise Clipping (DWC) to enable precise fault-state perception and robust policy adaptation, improving control resilience and stability in dynamic, uncertain environments. Experimental results show that, compared to baseline methods, TD3-HD reduces attitude angle error by 42.3% and angular velocity overshoot by 58.7%. It also maintains stable convergence under complete single-wheel failure, supporting its use for on-orbit autonomous fault-tolerant spacecraft attitude control.
📝 Abstract
Reliable satellite attitude control is essential for the success of space missions, particularly as satellites increasingly operate autonomously in dynamic and uncertain environments. Reaction wheels (RWs) play a pivotal role in attitude control, and maintaining control resilience during RW faults is critical to preserving mission objectives and system stability. However, traditional Proportional-Derivative (PD) controllers and existing deep reinforcement learning (DRL) algorithms such as TD3, PPO, and A2C often fall short of the real-time adaptability and fault tolerance required for autonomous satellite operations. This study introduces a DRL-based control strategy designed to improve satellite resilience and adaptability under fault conditions. Specifically, the proposed method, referred to as TD3-HD, integrates Twin Delayed Deep Deterministic Policy Gradient (TD3) with Hindsight Experience Replay (HER) and Dimension-Wise Clipping (DWC) to enhance learning in sparse-reward environments and maintain satellite stability during RW failures. The proposed approach is benchmarked against PD control and leading DRL algorithms. Experimental results show that TD3-HD achieves significantly lower attitude error, improved angular velocity regulation, and enhanced stability under fault conditions. These findings underscore the proposed method's potential as a powerful, fault-tolerant, onboard AI solution for autonomous satellite attitude control.
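To illustrate why HER helps in sparse-reward settings, the following is a minimal sketch of the generic "future" relabeling strategy, not the paper's implementation. The abstract does not specify the state, goal, or reward definitions, so everything here is an assumption for illustration: the attitude is a tuple of angle values, the reward is 0 within a tolerance `tol` and -1 otherwise, and the names `sparse_reward`, `her_relabel`, and the transition keys are hypothetical.

```python
import random


def sparse_reward(achieved, goal, tol=0.01):
    """Assumed sparse reward: 0 if every component is within tol of the goal, else -1."""
    return 0.0 if all(abs(a - g) <= tol for a, g in zip(achieved, goal)) else -1.0


def her_relabel(episode, k=4, tol=0.01, rng=None):
    """Return the original transitions plus k hindsight copies per step.

    episode: list of dicts with keys "obs", "action", "next_obs",
             "achieved_next" (attitude reached after the action) and "goal".
    Each hindsight copy replaces the goal with an attitude actually reached
    at the same or a later step ("future" strategy) and recomputes the reward.
    """
    rng = rng or random.Random()
    out = []
    for t, tr in enumerate(episode):
        # Original transition, rewarded against the commanded goal.
        out.append({**tr, "reward": sparse_reward(tr["achieved_next"], tr["goal"], tol)})
        for _ in range(k):
            # Pick an attitude reached at step t or later and pretend it was the goal.
            future = rng.randrange(t, len(episode))
            new_goal = episode[future]["achieved_next"]
            out.append({
                **tr,
                "goal": new_goal,
                "reward": sparse_reward(tr["achieved_next"], new_goal, tol),
            })
    return out
```

Even when an episode never reaches the commanded attitude, some relabeled transitions carry a reward of 0, which gives an off-policy learner such as TD3 a non-trivial learning signal. The relabeled transitions would be stored in the replay buffer alongside the originals.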