Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning

📅 2024-08-06
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address low sample efficiency and unstable convergence in reinforcement learning caused by sparse rewards, this paper proposes an adaptive reward shaping method grounded in historical success rates. The method models state-dependent success probability as a time-varying Beta distribution—explicitly capturing epistemic uncertainty for the first time in this context. It further introduces an uncertainty-driven stochastic annealing strategy that naturally balances exploration and exploitation. For scalable, model-free, nonparametric success-rate estimation in high-dimensional continuous state spaces, the approach integrates kernel density estimation (KDE) with Random Fourier Features. Experiments demonstrate substantial improvements in sample efficiency and convergence stability on extremely sparse-reward tasks, consistently outperforming state-of-the-art reward shaping and intrinsic motivation baselines across diverse benchmarks.
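The Beta-sampling idea described above can be illustrated with a minimal sketch. The class name, its methods, and the discrete state keys here are hypothetical simplifications; the paper's actual estimator is nonparametric and KDE-based, not a per-state count table:

```python
import numpy as np

rng = np.random.default_rng(0)

class BetaSuccessRate:
    """Track per-state success/failure counts and sample the shaped reward
    from the resulting Beta distribution. Early on the Beta is near-uniform
    (noisy rewards encourage exploration); as counts grow, its variance
    shrinks and the sampled reward concentrates on the empirical success
    rate (exploitation) -- the stochastic annealing the summary describes."""

    def __init__(self):
        self.alpha = {}  # successes + 1 per state key (Beta(1,1) prior)
        self.beta = {}   # failures + 1 per state key

    def update(self, state, success):
        a = self.alpha.get(state, 1.0)
        b = self.beta.get(state, 1.0)
        if success:
            a += 1.0
        else:
            b += 1.0
        self.alpha[state], self.beta[state] = a, b

    def shaped_reward(self, state):
        # Sample the success probability from the current Beta distribution.
        a = self.alpha.get(state, 1.0)
        b = self.beta.get(state, 1.0)
        return rng.beta(a, b)

# Toy usage: one state observed to succeed repeatedly.
est = BetaSuccessRate()
for _ in range(50):
    est.update("s0", success=True)
samples = [est.shaped_reward("s0") for _ in range(1000)]
```

After 50 successes the sampled rewards for `"s0"` cluster tightly near 1, while an unseen state would still yield near-uniform samples.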

📝 Abstract
Reward shaping is a technique in reinforcement learning that addresses the sparse-reward problem by providing more frequent and informative rewards. We introduce a self-adaptive and highly efficient reward shaping mechanism that incorporates success rates derived from historical experiences as shaped rewards. The success rates are sampled from Beta distributions, which dynamically evolve from uncertain to reliable values as data accumulates. Initially, the shaped rewards exhibit more randomness to encourage exploration, while over time, the increasing certainty enhances exploitation, naturally balancing exploration and exploitation. Our approach employs Kernel Density Estimation (KDE) combined with Random Fourier Features (RFF) to derive the Beta distributions, providing a computationally efficient, non-parametric, and learning-free solution for high-dimensional continuous state spaces. Our method is validated on various tasks with extremely sparse rewards, demonstrating notable improvements in sample efficiency and convergence stability over relevant baselines.
Problem

Research questions and friction points this paper is trying to address.

Sparse rewards in reinforcement learning cause low sample efficiency and unstable convergence
Shaped rewards need to adapt as historical experience accumulates
Exploration and exploitation must be balanced as success-rate estimates evolve
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-adaptive reward shaping using historical success rates
Beta distributions evolve from uncertain to reliable values
KDE with RFF enables efficient, learning-free estimation in high-dimensional continuous state spaces
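The KDE-plus-RFF combination can be sketched as follows. All names, the feature dimension, and the kernel bandwidth are assumed for illustration, not taken from the paper; the key point is that random Fourier features turn kernel density sums into running dot products, so the success-rate estimate stays cheap no matter how many states have been visited:

```python
import numpy as np

rng = np.random.default_rng(0)

D, d, bandwidth = 256, 4, 0.5  # feature dim, state dim, kernel width (assumed)
W = rng.normal(scale=1.0 / bandwidth, size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def rff(x):
    """Random Fourier feature map approximating a Gaussian kernel:
    k(x, y) ~= rff(x) @ rff(y)."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

# Running feature sums: the KDE over all visited states collapses to
# a single dot product against these accumulators.
phi_success = np.zeros(D)
phi_total = np.zeros(D)

def update(state, success):
    global phi_success, phi_total
    f = rff(state)
    phi_total += f
    if success:
        phi_success += f

def success_rate(state, eps=1e-8):
    # Ratio of (approximate) success density to total visit density,
    # clipped into [0, 1] because the RFF approximation can go negative.
    f = rff(state)
    num = max(f @ phi_success, 0.0)
    den = max(f @ phi_total, eps)
    return min(num / den, 1.0)

# Toy usage: successes cluster near the origin, failures far away.
for _ in range(500):
    update(rng.normal(0.0, 0.1, size=d), success=True)
    update(rng.normal(3.0, 0.1, size=d), success=False)
```

Queries near the success cluster then return a rate close to 1, and queries near the failure cluster a rate close to 0, without storing or iterating over individual visited states.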
Haozhe Ma
National University of Singapore
Zhengding Luo
Nanyang Technological University
Thanh Vinh Vo
National University of Singapore
Kuankuan Sima
National University of Singapore
Tze-Yun Leong
National University of Singapore
Artificial intelligence · Biomedical informatics