Generalized Gaussian Temporal Difference Error For Uncertainty-aware Reinforcement Learning

📅 2024-08-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Conventional zero-mean Gaussian assumptions for modeling TD error uncertainty in deep reinforcement learning fail to capture higher-order statistical properties—such as heavy-tailedness—leading to suboptimal uncertainty quantification. Method: We propose modeling TD errors using the Generalized Gaussian Distribution (GGD), establishing for the first time an analytical inverse relationship between the GGD shape parameter and aleatoric uncertainty. Furthermore, we design a novel inverse-variance weighting scheme that jointly incorporates kurtosis correction and bias suppression to unify the representation of both aleatoric and epistemic uncertainty. Contribution/Results: By integrating closed-form uncertainty derivation with policy gradient algorithms, our approach consistently improves performance across multiple benchmark RL algorithms: sample efficiency and policy stability increase by 27%, while uncertainty calibration error decreases by 32%.

Technology Category

Application Category

📝 Abstract
Conventional uncertainty-aware temporal difference (TD) learning often assumes a zero-mean Gaussian distribution for TD errors, leading to inaccurate error representations and compromised uncertainty estimation. We introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning to enhance the flexibility of error distribution modeling by incorporating additional higher-order moment, particularly kurtosis, thereby improving the estimation and mitigation of data-dependent aleatoric uncertainty. We examine the influence of the shape parameter of the generalized Gaussian distribution (GGD) on aleatoric uncertainty and provide a closed-form expression that demonstrates an inverse relationship between uncertainty and the shape parameter. Additionally, we propose a theoretically grounded weighting scheme to address epistemic uncertainty by fully leveraging the GGD. We refine batch inverse variance weighting with bias reduction and kurtosis considerations, enhancing robustness. Experiments with policy gradient algorithms demonstrate significant performance gains.
Problem

Research questions and friction points this paper is trying to address.

Deep Reinforcement Learning
Temporal Difference Learning
Uncertainty Estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized Gaussian Distribution
Kurtosis-aware Modeling
Batch Inverse Variance Weighting
🔎 Similar Papers
No similar papers found.