Fully Adaptive Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems

📅 2024-06-11

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 1

🤖 AI Summary

This paper addresses the adaptive linear quadratic regulation (LQR) control problem for unknown linear systems. We propose the first fully adaptive algorithm that requires no prior knowledge of system parameters, no warm-up phase, and no norm-based assumptions on the solution to the discrete algebraic Riccati equation (DARE). The algorithm dynamically balances exploration and exploitation while adaptively adjusting policy update frequency. Our method is built upon a semidefinite programming (SDP) framework, integrating self-tuning regularization and adaptive input perturbation. We establish, for the first time, high-probability system-theoretic bounds on state trajectories and achieve the optimal $\mathcal{O}(\sqrt{T})$ regret bound, with explicit dependence on system dimension and the DARE solution. The algorithm is computationally efficient, significantly reducing both initialization complexity and empirical regret.

📝 Abstract

The first algorithm for the Linear Quadratic (LQ) control problem with an unknown system model, featuring a regret of $\mathcal{O}(\sqrt{T})$, was introduced by Abbasi-Yadkori and Szepesvári (2011). Recognizing the computational complexity of this algorithm, subsequent efforts (see Cohen et al. (2019), Mania et al. (2019), Faradonbeh et al. (2020a), and Kargin et al. (2022)) have been dedicated to proposing algorithms that are computationally tractable while preserving this order of regret. Although successful, the existing works in the literature lack a fully adaptive exploration-exploitation trade-off adjustment and require a user-defined value, which can inflate the overall regret bound by additional factors. In this work, noticing this gap, we propose the first fully adaptive algorithm that controls the number of policy updates (i.e., tunes the exploration-exploitation trade-off) and optimizes the upper bound of regret adaptively. Our proposed algorithm builds on the SDP-based approach of Cohen et al. (2019) and relaxes its need for a horizon-dependent warm-up phase by appropriately tuning the regularization parameter and adding an adaptive input perturbation. We further show that through careful exploration-exploitation trade-off adjustment there is no need to commit to the widely used notion of strong sequential stability, which is restrictive and can introduce complexities in initialization.
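
For readers new to this literature, the regret quantity mentioned above is commonly defined along the following lines. This is a sketch of the usual convention, not a formula taken from this page: the dynamics notation, the cost matrices $Q$ and $R$, and the optimal average cost $J_*$ are assumed symbols.

$$
x_{t+1} = A x_t + B u_t + w_t, \qquad
R_T = \sum_{t=1}^{T} \left( x_t^\top Q x_t + u_t^\top R u_t - J_* \right)
$$

Here $J_*$ denotes the optimal infinite-horizon average cost attainable when $A$ and $B$ are known, so an $\mathcal{O}(\sqrt{T})$ bound on $R_T$ means the average excess cost per step vanishes at rate $1/\sqrt{T}$.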

Problem

Research questions and friction points this paper is trying to address.

Develops an anytime, regret-guaranteed algorithm for linear quadratic control.

Addresses stability and optimal regret without prior DARE solution bounds.

Provides explicit high-probability state bounds in system-theoretic terms.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses regularization and confidence ellipsoids for control design.

Incorporates input perturbation for an anytime performance guarantee.

Eliminates the need for an a priori bound on the DARE solution.
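
Two of the ingredients listed above, regularization and input perturbation, can be illustrated with a toy example. The sketch below is not the paper's SDP-based algorithm: it assumes a scalar system, a fixed (hypothetical) stabilizing gain `k`, and a constant perturbation variance, whereas the paper tunes these quantities adaptively. It only shows how a ridge-regularized least-squares estimate of the unknown parameters can be formed from a single closed-loop trajectory whose inputs carry exploratory noise.

```python
import random


def simulate_and_estimate(a=0.9, b=0.5, k=-0.6, lam=1.0, sigma_w=0.1,
                          sigma_eta=0.5, horizon=2000, seed=0):
    """Ridge-regress (a, b) of x_next = a*x + b*u + w from one trajectory.

    All parameter names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Regularized Gram matrix V = lam*I + sum z z^T with z = (x, u),
    # and cross term S = sum z * x_next.
    v11, v12, v22 = lam, 0.0, lam
    s1, s2 = 0.0, 0.0
    x = 0.0
    for _ in range(horizon):
        # Linear feedback plus an exploratory input perturbation
        # (fixed variance here; the paper adapts it).
        u = k * x + rng.gauss(0.0, sigma_eta)
        x_next = a * x + b * u + rng.gauss(0.0, sigma_w)
        v11 += x * x
        v12 += x * u
        v22 += u * u
        s1 += x * x_next
        s2 += u * x_next
        x = x_next
    # Solve the 2x2 system V * theta = S in closed form.
    det = v11 * v22 - v12 * v12  # positive because V >= lam * I
    a_hat = (v22 * s1 - v12 * s2) / det
    b_hat = (v11 * s2 - v12 * s1) / det
    return a_hat, b_hat


if __name__ == "__main__":
    a_hat, b_hat = simulate_and_estimate()
    print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}")
```

The regularizer `lam` keeps the Gram matrix invertible from the very first step, which is one reason a regularized estimate can be used without a separate warm-up phase. The perturbation keeps `x` and `u` from being perfectly collinear under pure feedback `u = k * x`, so both parameters remain identifiable.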
