🤖 AI Summary
This paper addresses the infinite-horizon optimal closed-loop control problem for nonlinear systems with unknown dynamics: minimizing a given cost function from any initial state without relying on an explicit system model. We propose a data-driven policy optimization method that integrates the Koopman operator with an actor-critic framework. The Koopman operator provides a linear, data-driven representation of the dynamics, which is used to estimate the gradient of the cost with respect to the policy parameters, and the parameterized policy is then updated by gradient descent. A convergence analysis of the framework is provided. The method is compared with a model-free reinforcement learning approach, and its control performance is evaluated in simulation against model-based optimal control methods that use the exact system dynamics.
📝 Abstract
This paper presents a data-driven method for finding a closed-loop optimal controller that minimizes a specified infinite-horizon cost function from any initial state, for systems with unknown dynamics. We assume that the closed-loop optimal controller can be parameterized by a given class of functions, hereafter referred to as the policy. The proposed method introduces a novel gradient estimation framework that approximates the gradient of the cost function with respect to the policy parameters by integrating the Koopman operator with the classical actor-critic concept. By leveraging the linearity of the Koopman operator, the framework allows the policy parameters to be tuned iteratively by gradient descent toward an optimal controller. A convergence analysis of the proposed framework is provided. The effectiveness of the method is demonstrated through comparisons with a model-free reinforcement learning approach, and its control performance is further evaluated through simulations against model-based optimal control methods that solve the same optimal control problem using the exact system dynamics.
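To make the idea concrete, the following is a minimal toy sketch of a Koopman-based critic driving gradient descent on a policy parameter. It is not the paper's estimator, and everything in it is an assumption made for illustration: a scalar linear plant (used only to generate data), a linear policy u = -theta * x, a quadratic stage cost with discount factor `GAMMA`, the single observable phi(x) = x**2, and a central finite difference in place of the paper's gradient estimation framework. All function names are hypothetical.

```python
import random

# Hypothetical plant, used ONLY to generate data; the learner never reads A or B.
A, B = 0.9, 0.5
R, GAMMA = 0.1, 0.95  # assumed control weight and discount factor


def collect_pairs(theta, n=200, seed=0):
    """Sample (x, x_next) snapshot pairs under the policy u = -theta * x."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        u = -theta * x
        pairs.append((x, A * x + B * u))
    return pairs


def fit_koopman(pairs):
    """Least-squares fit of a scalar K with phi(x_next) ~= K * phi(x), phi(x) = x**2."""
    num = sum((x * x) * (xn * xn) for x, xn in pairs)
    den = sum((x * x) ** 2 for x, _ in pairs)
    return num / den


def critic(theta):
    """Cost coefficient p with V(x) = p * x**2 for stage cost x**2 + R * u**2.

    Linearity in the lifted space gives phi(x_t) = K**t * phi(x_0), so the
    discounted cost is a geometric series (valid while GAMMA * K < 1).
    """
    k = fit_koopman(collect_pairs(theta))
    return (1.0 + R * theta**2) / (1.0 - GAMMA * k)


def tune(theta=0.0, lr=0.05, steps=200, eps=1e-4):
    """Gradient descent on theta using a central finite difference of the critic."""
    for _ in range(steps):
        grad = (critic(theta + eps) - critic(theta - eps)) / (2.0 * eps)
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    theta = tune()
    print("tuned gain:", theta, "cost coefficient:", critic(theta))
```

The point of the sketch is the role of linearity: once the closed-loop dynamics are represented linearly in the lifted coordinates, the infinite-horizon cost has a closed form in the fitted operator, so the cost can be evaluated and differentiated with respect to the policy parameter from data alone. For a nonlinear system, a richer dictionary of observables and a matrix-valued Koopman approximation would be needed.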