🤖 AI Summary
This paper addresses the robust optimal control problem for uncertain nonlinear systems. We propose a novel control architecture integrating an event-triggered mechanism (ETM), an extended state observer (ESO), and value-iteration-based adaptive dynamic programming (VI-ADP). To our knowledge, this is the first work to embed an ETM directly into the ADP framework. The ESO estimates composite disturbances online, enabling disturbance compensation and closed-loop stability without requiring an exact system model. A Lyapunov-based stability analysis rigorously guarantees both learning convergence and closed-loop robustness. Compared with conventional time-triggered ADP, the proposed method reduces sampling and computational load by over 60%, improving computational efficiency and learning sparsity while maintaining strong disturbance rejection and control accuracy. Numerical experiments validate the theoretical stability claims and demonstrate superior robustness, efficiency, and precision.
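The core saving comes from the triggering rule: the learner is updated only at instants when the state has drifted far enough from its last sampled value. The toy sketch below illustrates that idea on a scalar plant; the threshold, gain, and plant are illustrative choices, not the paper's actual triggering law or ADP update.

```python
import numpy as np

def run_event_triggered(steps=200, threshold=0.05, seed=0):
    """Toy event-triggered update loop (illustrative, not the paper's law):
    sample/update only when the deviation from the last sampled state
    exceeds a fixed threshold."""
    rng = np.random.default_rng(seed)
    x = 1.0                # scalar plant state
    x_sampled = x          # state held since the last triggering instant
    k_gain = 0.5           # hypothetical feedback gain standing in for the learned policy
    updates = 0
    for _ in range(steps):
        # Trigger condition: deviation from the last sample exceeds the bound.
        if abs(x - x_sampled) > threshold:
            x_sampled = x  # sample the state and (conceptually) update the learner
            updates += 1
        u = -k_gain * x_sampled           # control acts on the last sampled state
        d = 0.01 * rng.standard_normal()  # small lumped disturbance
        x = 0.9 * x + u + d               # toy stable linear plant
    return updates

print(run_event_triggered())  # far fewer update events than the 200 time steps
```

Counting `updates` against `steps` is the event-triggered analogue of the paper's reported reduction in sampling and computational load.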
📝 Abstract
This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejecting Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) that limits unnecessary computation. The ESO estimates the system states and the lumped disturbance in real time, providing the basis for effective disturbance compensation. To obtain near-optimal behavior without an accurate system description, a value-iteration-based Adaptive Dynamic Programming (ADP) method is adopted for policy approximation. The ETM ensures that parameter updates of the learning module are executed only when the state deviation surpasses a predefined bound, preventing excessive learning activity and substantially reducing computational load. A Lyapunov-based analysis characterizes the stability properties of the resulting closed-loop system. Numerical experiments confirm that the developed approach maintains strong control performance and disturbance tolerance while significantly reducing sampling and processing effort compared with standard time-triggered ADP schemes.
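The ESO's role of estimating the lumped disturbance in real time can be sketched with a minimal second-order observer for a first-order plant. Everything below (plant, observer gains, constant disturbance) is a hypothetical illustration of the general ESO idea, not the observer designed in the paper.

```python
def eso_demo(steps=2000, dt=0.01, b=1.0, l1=40.0, l2=400.0):
    """Minimal second-order ESO sketch for a first-order plant x' = d + b*u,
    where d is the lumped disturbance. The gains l1, l2 place both observer
    poles at -20; the constant disturbance d = 0.5 is an illustrative choice."""
    x, d_true, u = 0.0, 0.5, 0.0  # plant state, constant disturbance, zero input
    z1, z2 = 0.0, 0.0             # observer states: state estimate, disturbance estimate
    for _ in range(steps):
        e = x - z1                        # output estimation error
        z1 += dt * (z2 + b * u + l1 * e)  # state-estimate dynamics (Euler step)
        z2 += dt * (l2 * e)               # extended state: disturbance estimate
        x += dt * (d_true + b * u)        # plant dynamics
    return z2

print(round(eso_demo(), 3))  # ≈ 0.5, the true lumped disturbance
```

Once `z2` tracks the disturbance, a controller can cancel it by subtracting `z2 / b` from the nominal control, which is the compensation role the abstract assigns to the ESO.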