Functional Acceleration for Policy Mirror Descent

📅 2024-07-23
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the slow convergence and the reliance on specific policy parameterizations inherent in Policy Mirror Descent (PMD) for large-scale reinforcement learning. To overcome these limitations, the authors introduce a momentum acceleration mechanism formulated, for the first time, directly in function space. The approach constructs a parameterization-agnostic momentum update rule driven by dual variables, thereby unifying existing parameter-space momentum methods and revealing the intrinsic optimization dynamics over the value polytope. Theoretical guarantees establish improved convergence rates, and numerical experiments demonstrate accelerated convergence, robustness to hyperparameter choices, and strong generalization across domains. The work further characterizes precise conditions under which functional acceleration is effective and quantifies how approximation errors impact learning performance, establishing a new paradigm for nonparametric, large-scale policy optimization that bridges rigorous theoretical foundations with practical efficacy.

πŸ“ Abstract
We apply functional acceleration to the general Policy Mirror Descent (PMD) family of algorithms, which covers a wide range of novel and fundamental methods in Reinforcement Learning (RL). Leveraging duality, we propose a momentum-based PMD update. By taking the functional route, our approach is independent of the policy parametrization and applicable to large-scale optimization, covering previous applications of momentum at the level of policy parameters as a special case. We theoretically analyze several properties of this approach and complement them with a numerical ablation study, which serves to illustrate the policy optimization dynamics on the value polytope relative to different algorithmic design choices in this space. We further characterize numerically several features of the problem setting relevant for functional acceleration, and lastly, we investigate the impact of approximation on the learning mechanics of these methods.
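To make the idea concrete, here is a minimal, illustrative sketch (not the paper's exact algorithm) of tabular PMD with a KL-divergence mirror map, where the update is additive in the dual variables (the policy logits): `z_{t+1} = z_t + eta * Q^{pi_t}`. A momentum variant then extrapolates along the previous dual step, `z_{t+1} = z_t + eta * Q^{pi_t} + beta * (z_t - z_{t-1})`, which is parametrization-independent because it lives in the dual space rather than in any parameter vector. All names, the random MDP, and the step sizes below are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def q_pi(P, r, pi, gamma):
    """Exact policy evaluation: Q^pi = r + gamma * P V^pi (tabular)."""
    S = P.shape[0]
    M = np.einsum('sa,saz->sz', pi, P)       # state chain induced by pi
    V = np.linalg.solve(np.eye(S) - gamma * M, (pi * r).sum(axis=1))
    return r + gamma * P @ V                 # shape (S, A)

def pmd(P, r, gamma, eta, beta, iters):
    """KL-regularized PMD in the dual (logit) space, with optional momentum.

    beta = 0 recovers vanilla PMD: z_{t+1} = z_t + eta * Q^{pi_t}.
    beta > 0 extrapolates along the previous dual step (heavy-ball style,
    an illustrative stand-in for the paper's functional momentum)."""
    z = np.zeros_like(r); z_prev = z.copy()
    for _ in range(iters):
        pi = softmax(z)
        Q = q_pi(P, r, pi, gamma)
        z, z_prev = z + eta * Q + beta * (z - z_prev), z
    pi = softmax(z)
    return (pi * q_pi(P, r, pi, gamma)).sum(axis=1)   # V^pi per state

# A small random MDP (hypothetical instance, fixed seed for reproducibility).
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, :] transition probs
r = rng.uniform(size=(S, A))                 # rewards r(s, a)

# Optimal values via value iteration, for reference.
V_star = np.zeros(S)
for _ in range(2000):
    V_star = (r + gamma * P @ V_star).max(axis=1)

gap_vanilla = np.max(V_star - pmd(P, r, gamma, eta=1.0, beta=0.0, iters=100))
gap_momentum = np.max(V_star - pmd(P, r, gamma, eta=1.0, beta=0.5, iters=100))
print(gap_vanilla, gap_momentum)
```

Because the momentum term acts on the dual variables rather than on network weights, the same update applies unchanged whether the policy is a lookup table or the output of a large function approximator, which is the sense in which the approach is parametrization-agnostic.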
Problem

Research questions and friction points this paper is trying to address.

Applying functional acceleration to Policy Mirror Descent algorithms
Analyzing momentum-based updates for large-scale policy optimization
Investigating approximation impact on learning mechanics in RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Functional acceleration for Policy Mirror Descent
Momentum-based PMD update via duality
Policy parametrization-independent large-scale optimization
Veronica Chelu
McGill University, Mila Quebec AI Institute, Google DeepMind, CIFAR AI Chair
D. Precup
McGill University, Mila Quebec AI Institute, Google DeepMind, CIFAR AI Chair