Off Policy Lyapunov Stability in Reinforcement Learning

๐Ÿ“… 2025-09-11
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Conventional reinforcement learning lacks theoretical stability guarantees, and existing on-policy methods for learning Lyapunov functions suffer from poor sample efficiency. Method: We propose the first off-policy Lyapunov function learning framework, decoupling stability constraints from policy optimization to significantly improve data efficiency in Lyapunov-based verification. Our approach is plug-and-playโ€”integrating seamlessly into Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) without altering their original policy update mechanisms. Results: Evaluated on inverted pendulum and quadrotor simulation tasks, our method ensures Lyapunov stability of the closed-loop system while accelerating convergence by 37%โ€“52% and increasing policy success rate by a factor of 2.1ร—. These results demonstrate its dual advantages in both sample efficiency and rigorous stability certification.

Technology Category

Application Category

๐Ÿ“ Abstract
Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.
Problem

Research questions and friction points this paper is trying to address.

Lack of stability guarantees in reinforcement learning
Sample inefficiency in on-policy Lyapunov function learning
Need for data-efficient stability certificates in RL algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Off-policy Lyapunov function learning
Integration with SAC and PPO
Data efficient stability certificates
๐Ÿ”Ž Similar Papers
No similar papers found.