Functional Acceleration for Policy Mirror Descent

📅 2024-07-23
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the slow convergence and the reliance on specific policy parameterizations inherent in Policy Mirror Descent (PMD) for large-scale reinforcement learning. To overcome these limitations, the authors introduce a momentum acceleration mechanism formulated, for the first time, directly in function space. The approach constructs a parameterization-agnostic momentum update rule driven by dual variables, thereby unifying existing parameter-space momentum methods and revealing the intrinsic optimization dynamics over the value polytope. Theoretical guarantees establish improved convergence rates, and numerical experiments demonstrate accelerated convergence, robustness to hyperparameter choices, and strong generalization across domains. The work further characterizes precise conditions under which functional acceleration is effective and quantifies how approximation errors impact learning performance, establishing a new paradigm for nonparametric, large-scale policy optimization that bridges rigorous theoretical foundations with practical efficacy.

πŸ“ Abstract
We apply functional acceleration to the general Policy Mirror Descent (PMD) family of algorithms, which covers a wide range of novel and fundamental methods in Reinforcement Learning (RL). Leveraging duality, we propose a momentum-based PMD update. By taking the functional route, our approach is independent of the policy parametrization and applicable to large-scale optimization, covering previous applications of momentum at the level of policy parameters as a special case. We theoretically analyze several properties of this approach and complement them with a numerical ablation study, which serves to illustrate the policy optimization dynamics on the value polytope relative to different algorithmic design choices in this space. We further characterize numerically several features of the problem setting relevant for functional acceleration, and lastly, we investigate the impact of approximation on the learning mechanics of these methods.
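To make the idea concrete, here is a minimal, illustrative sketch (not the paper's exact algorithm) of tabular PMD with a KL-divergence mirror map, where the update is additive in the dual variables (the policy logits): `z_{t+1} = z_t + eta * Q^{pi_t}`. A momentum variant then extrapolates along the previous dual step, `z_{t+1} = z_t + eta * Q^{pi_t} + beta * (z_t - z_{t-1})`, which is parametrization-independent because it lives in the dual space rather than in any parameter vector. All names, the random MDP, and the step sizes below are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def q_pi(P, r, pi, gamma):
    """Exact policy evaluation: Q^pi = r + gamma * P V^pi (tabular)."""
    S = P.shape[0]
    M = np.einsum('sa,saz->sz', pi, P)       # state chain induced by pi
    V = np.linalg.solve(np.eye(S) - gamma * M, (pi * r).sum(axis=1))
    return r + gamma * P @ V                 # shape (S, A)

def pmd(P, r, gamma, eta, beta, iters):
    """KL-regularized PMD in the dual (logit) space, with optional momentum.

    beta = 0 recovers vanilla PMD: z_{t+1} = z_t + eta * Q^{pi_t}.
    beta > 0 extrapolates along the previous dual step (heavy-ball style,
    an illustrative stand-in for the paper's functional momentum)."""
    z = np.zeros_like(r); z_prev = z.copy()
    for _ in range(iters):
        pi = softmax(z)
        Q = q_pi(P, r, pi, gamma)
        z, z_prev = z + eta * Q + beta * (z - z_prev), z
    pi = softmax(z)
    return (pi * q_pi(P, r, pi, gamma)).sum(axis=1)   # V^pi per state

# A small random MDP (hypothetical instance, fixed seed for reproducibility).
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, :] transition probs
r = rng.uniform(size=(S, A))                 # rewards r(s, a)

# Optimal values via value iteration, for reference.
V_star = np.zeros(S)
for _ in range(2000):
    V_star = (r + gamma * P @ V_star).max(axis=1)

gap_vanilla = np.max(V_star - pmd(P, r, gamma, eta=1.0, beta=0.0, iters=100))
gap_momentum = np.max(V_star - pmd(P, r, gamma, eta=1.0, beta=0.5, iters=100))
print(gap_vanilla, gap_momentum)
```

Because the momentum term acts on the dual variables rather than on network weights, the same update applies unchanged whether the policy is a lookup table or the output of a large function approximator, which is the sense in which the approach is parametrization-agnostic.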
Problem

Research questions and friction points this paper is trying to address.

Applying functional acceleration to Policy Mirror Descent algorithms
Analyzing momentum-based updates for large-scale policy optimization
Investigating approximation impact on learning mechanics in RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Functional acceleration for Policy Mirror Descent
Momentum-based PMD update via duality
Policy parametrization-independent large-scale optimization
Veronica Chelu
McGill University, Mila Quebec AI Institute, Google DeepMind, CIFAR AI Chair
D. Precup
McGill University, Mila Quebec AI Institute, Google DeepMind, CIFAR AI Chair