Model-Free Output Feedback Stabilization via Policy Gradient Methods

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a model-free output feedback stabilization algorithm for discrete-time linear systems whose dynamics are unknown and whose state is observed only through output measurements. The approach brings zeroth-order policy gradient updates into the output feedback setting, directly optimizing the controller policy from system trajectories and thereby avoiding the full-state feedback assumed by most existing policy gradient methods. Theoretical analysis establishes that the algorithm converges to a stationary point corresponding to a stabilizing policy and provides an explicit bound on its sample complexity. Numerical experiments further corroborate the effectiveness of the proposed method.
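The summary describes the core computational step only at a high level. As a rough illustration of the kind of update it refers to (not the paper's exact algorithm), the sketch below implements a standard two-point zeroth-order policy gradient estimator: the closed-loop cost is probed at randomly perturbed gains, and the cost difference yields a gradient estimate without any model of the dynamics. The names and defaults here (`rollout_cost`, the smoothing radius `r`, the sample count) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def zo_policy_gradient(K, rollout_cost, r=0.1, n_samples=20, rng=None):
    """Two-point zeroth-order estimate of the gradient of the closed-loop
    cost with respect to a static output feedback gain K.

    rollout_cost(K) returns the cost of one finite-horizon trajectory
    collected under the policy u_t = -K @ y_t; r is the smoothing radius.
    This is a generic estimator, not the paper's exact construction.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = K.size
    grad = np.zeros_like(K)
    for _ in range(n_samples):
        # Perturbation direction drawn uniformly from the unit sphere.
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        # Two rollouts per direction, one at each perturbed gain.
        c_plus = rollout_cost(K + r * U)
        c_minus = rollout_cost(K - r * U)
        grad += (d / (2.0 * r)) * (c_plus - c_minus) * U
    return grad / n_samples
```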

📝 Abstract
Stabilizing a dynamical system is a fundamental problem that serves as a cornerstone for many complex tasks in the field of control systems. The problem becomes challenging when the system model is unknown. Among the Reinforcement Learning (RL) algorithms that have been successfully applied to problems involving unknown linear dynamical systems, the policy gradient (PG) method stands out for its ease of implementation and its ability to solve the problem in a model-free manner. However, most existing works on PG methods for unknown linear dynamical systems assume full-state feedback. In this paper, we take a step towards model-free learning for partially observable linear dynamical systems with output feedback and focus on the fundamental problem of stabilizing the system. We propose an algorithmic framework that stretches the boundary of PG methods to this problem, for which global convergence guarantees are not available. We show that by leveraging zeroth-order PG updates based on system trajectories and their convergence to stationary points, the proposed algorithms return a stabilizing output feedback policy for discrete-time linear dynamical systems. We also explicitly characterize the sample complexity of our algorithm and verify its effectiveness using numerical examples.
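To make the estimator sketched above concrete, the snippet below builds the trajectory-cost oracle it consumes and runs plain gradient descent on the gain. The system matrices, horizon, step size, and iteration count are all placeholder values chosen for illustration; in the paper's model-free setting the oracle is realized by running the unknown system, and the paper derives the parameter choices and sample complexity that actually guarantee a stabilizing policy.

```python
import numpy as np

def make_rollout_cost(A, B, C, x0, T=20):
    """Finite-horizon quadratic cost of the output feedback policy
    u_t = -K @ y_t on x_{t+1} = A x_t + B u_t, y_t = C x_t.

    A, B, C are written out only to make the sketch self-contained;
    in the model-free setting the rollout runs the unknown system.
    """
    def rollout_cost(K):
        x, cost = x0.copy(), 0.0
        for _ in range(T):
            y = C @ x
            u = -K @ y
            cost += float(x @ x + u @ u)
            x = A @ x + B @ u
        return cost
    return rollout_cost

# Hypothetical unstable system that static output feedback can stabilize.
A = np.array([[0.5, 0.4], [0.3, 1.2]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.0, 1.0]])
cost = make_rollout_cost(A, B, C, x0=np.ones(2))

K = np.zeros((1, 1))
for _ in range(500):
    # Placeholder step size; the paper prescribes choices with guarantees.
    K -= 1e-6 * zo_policy_gradient(K, cost)
```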
Problem

Research questions and friction points this paper is trying to address.

output feedback
model-free stabilization
partially observable linear systems
policy gradient
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

model-free control
output feedback
policy gradient
zeroth-order optimization
sample complexity
Ankang Zhang
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
Ming Chi
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
Xiaoling Wang
College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Lintao Ye
Huazhong University of Science and Technology
Optimization · Control Systems · Reinforcement Learning · Submodularity