🤖 AI Summary
Although Proximal Policy Optimization (PPO) is widely adopted in practice, it lacks theoretical guarantees for monotonic policy improvement and convergence. Method: This paper proposes FR-PPO, a geometric reformulation of policy updates grounded in the Riemannian geometry of the Fisher-Rao manifold. Under this intrinsic metric, we derive a tight surrogate objective and rigorously prove that it satisfies the monotonic improvement property. Combining differential geometry, policy gradient theory, and trust-region principles, with KL-divergence constraints ensuring update stability, we establish a dimension-independent sublinear convergence rate for FR-PPO in the tabular setting. Contributions/Results: FR-PPO retains PPO's empirical effectiveness while improving training stability and making the convergence behavior more interpretable. It provides the first theoretically grounded PPO framework that unifies geometric insight with provable convergence guarantees.
📝 Abstract
Proximal Policy Optimization (PPO) has become a widely adopted reinforcement learning algorithm, offering a practical policy gradient method with strong empirical performance. Despite its popularity, PPO lacks formal theoretical guarantees for policy improvement and convergence. PPO is motivated by Trust Region Policy Optimization (TRPO), which uses a surrogate loss with a KL-divergence penalty that arises from linearizing the value function within a flat geometric space. In this paper, we derive a tighter surrogate in the Fisher-Rao (FR) geometry, yielding a novel variant, Fisher-Rao PPO (FR-PPO). Our proposed scheme enjoys strong theoretical guarantees, including monotonic policy improvement. Furthermore, in the tabular setting, we show that FR-PPO achieves sublinear convergence without any dependence on the dimensionality of the action or state spaces, marking a significant step toward establishing formal convergence results for PPO-based algorithms.
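To make the surrogate construction concrete, here is a minimal numerical sketch of the flat-geometry object the abstract refers to: a TRPO-style, KL-penalized surrogate evaluated for a single tabular state. This is illustrative only, not the paper's FR-PPO objective; the three-action distributions, advantage values, and penalty weight `beta` are all hypothetical.

```python
import numpy as np

def kl(p, q):
    """KL divergence KL(p || q) between discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def surrogate(pi_new, pi_old, advantages, beta=1.0):
    """TRPO-style KL-penalized surrogate (flat-geometry version):
    E_{a ~ pi_old}[ (pi_new(a)/pi_old(a)) * A(a) ] - beta * KL(pi_old || pi_new)
    """
    ratio = pi_new / pi_old  # importance-sampling ratio
    return float(np.sum(pi_old * ratio * advantages)) - beta * kl(pi_old, pi_new)

# Hypothetical single-state example with three actions.
pi_old = np.array([0.5, 0.3, 0.2])
adv = np.array([1.0, -0.5, 0.2])

# At pi_new = pi_old the KL penalty vanishes, leaving the expected advantage.
same = surrogate(pi_old, pi_old, adv)
# Shifting mass toward the high-advantage action trades off gain against KL cost.
shifted = surrogate(np.array([0.6, 0.25, 0.15]), pi_old, adv, beta=0.1)
```

A larger `beta` keeps the maximizer close to `pi_old` (a tighter trust region); the paper's contribution is a tighter surrogate of this general form derived under the Fisher-Rao metric instead of this flat one.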