Off Policy Lyapunov Stability in Reinforcement Learning

📅 2025-09-11

📈 Citations: 0

✨ Influential: 0

career value

216K/year

🤖 AI Summary

Conventional reinforcement learning lacks theoretical stability guarantees, and existing on-policy methods for learning Lyapunov functions suffer from poor sample efficiency. Method: We propose the first off-policy Lyapunov function learning framework, decoupling stability constraints from policy optimization to significantly improve data efficiency in Lyapunov-based verification. Our approach is plug-and-play—integrating seamlessly into Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) without altering their original policy update mechanisms. Results: Evaluated on inverted pendulum and quadrotor simulation tasks, our method ensures Lyapunov stability of the closed-loop system while accelerating convergence by 37%–52% and increasing policy success rate by a factor of 2.1×. These results demonstrate its dual advantages in both sample efficiency and rigorous stability certification.

Technology Category

Application Category

📝 Abstract

Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.

Problem

Research questions and friction points this paper is trying to address.

Lack of stability guarantees in reinforcement learning

Sample inefficiency in on-policy Lyapunov function learning

Need for data-efficient stability certificates in RL algorithms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Off-policy Lyapunov function learning

Integration with SAC and PPO

Data efficient stability certificates

🔎 Similar Papers

No similar papers found.