🤖 AI Summary
Existing Conditional Value-at-Risk policy gradient (CVaR-PG) methods achieve risk-sensitive optimization by training only on the worst-performing trajectories and discarding the rest, resulting in severe sample inefficiency. This work proposes **Return Capping**, a reformulation that imposes an upper bound on the total return of trajectories used in training instead of discarding them, and shows this is equivalent to the original CVaR objective when the cap is set appropriately. Unlike prior approaches, Return Capping shifts CVaR-PG from *discarding* to *reusing* trajectories, substantially improving data utilization. Empirically, the method yields consistently improved performance over baselines across a number of benchmark environments while retaining the original CVaR optimization goal.
📝 Abstract
When optimising for conditional value at risk (CVaR) using policy gradients (PG), current methods rely on discarding a large proportion of trajectories, resulting in poor sample efficiency. We propose a reformulation of the CVaR optimisation problem by capping the total return of trajectories used in training, rather than simply discarding them, and show that this is equivalent to the original problem if the cap is set appropriately. We show, with empirical results in a number of environments, that this reformulation of the problem results in consistently improved performance compared to baselines.
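To make the contrast concrete, the following is a minimal sketch (not the paper's implementation) of how the two schemes weight trajectories in a policy-gradient update. The function names and the use of the empirical quantile as the VaR threshold are illustrative assumptions; the key difference is that standard CVaR-PG zeroes out all trajectories above the threshold, while capping clips their returns so every trajectory still contributes.

```python
import numpy as np

def cvar_pg_weights(returns, alpha):
    """Standard CVaR-PG weighting (sketch): only the worst alpha-fraction
    of trajectories get a nonzero weight; the rest are discarded."""
    var = np.quantile(returns, alpha)  # empirical value-at-risk threshold
    # Trajectories above the threshold contribute nothing to the gradient.
    return np.where(returns <= var, returns - var, 0.0)

def capped_weights(returns, cap):
    """Return-capping weighting (sketch): every trajectory is kept, but its
    return is clipped at the cap, so high-return data is reused rather
    than thrown away."""
    return np.minimum(returns, cap)
```

With a batch of returns `[1, 2, 3, 4]` and `alpha = 0.5`, the standard scheme assigns zero weight to the two best trajectories, whereas capping at the same threshold keeps all four in the update.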