Adversarial Diffusion for Robust Reinforcement Learning

📅 2025-09-28
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Robustness of reinforcement learning (RL) under model misspecification and environmental uncertainty remains a critical challenge. To address it, the paper proposes Adversarial Diffusion for Robust Reinforcement Learning (AD-RRL), a robust RL framework grounded in conditional diffusion models. AD-RRL integrates diffusion modeling with Conditional Value-at-Risk (CVaR) optimization: adversarial conditional sampling steers the diffusion process toward worst-case trajectories during training, so the policy explicitly optimizes the tail risk of the cumulative return. The approach unifies model-based RL, trajectory-level uncertainty modeling, and adversarial training, injecting risk-sensitive guidance directly into the diffusion process to learn policies robust to uncertain environment dynamics end to end. Across standard benchmarks, AD-RRL outperforms existing robust RL methods in both out-of-distribution generalization and worst-case return, offering a risk-aware approach to robust sequential decision-making.
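
For reference, the CVaR objective invoked above has a standard form; the notation below is generic and not taken from the paper. For the cumulative return R = Σ_t γ^t r_t and a level α in (0, 1), the lower-tail CVaR is:

```latex
\mathrm{CVaR}_\alpha(R)
  = \mathbb{E}\left[\, R \mid R \le \mathrm{VaR}_\alpha(R) \,\right]
  = \sup_{z \in \mathbb{R}} \left\{ z - \tfrac{1}{\alpha}\,\mathbb{E}\big[(z - R)^{+}\big] \right\}
```

where VaR_α(R) is the α-quantile of R. Maximizing CVaR_α concentrates policy optimization on the worst α-fraction of trajectories, which is the CVaR-robustness connection the summary and abstract rely on.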

πŸ“ Abstract
Robustness to modeling errors and uncertainties remains a central challenge in reinforcement learning (RL). In this work, we address this challenge by leveraging diffusion models to train robust RL policies. Diffusion models have recently gained popularity in model-based RL due to their ability to generate full trajectories "all at once", mitigating the compounding errors typical of step-by-step transition models. Moreover, they can be conditioned to sample from specific distributions, making them highly flexible. We leverage conditional sampling to learn policies that are robust to uncertainty in environment dynamics. Building on the established connection between Conditional Value at Risk (CVaR) optimization and robust RL, we introduce Adversarial Diffusion for Robust Reinforcement Learning (AD-RRL). AD-RRL guides the diffusion process to generate worst-case trajectories during training, effectively optimizing the CVaR of the cumulative return. Empirical results across standard benchmarks show that AD-RRL achieves superior robustness and performance compared to existing robust RL methods.
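
As a concrete illustration of "optimizing the CVaR of the cumulative return", here is a minimal Monte Carlo sketch, not code from the paper: it estimates the empirical lower-tail CVaR from a batch of trajectory returns (the function name and toy data are hypothetical).

```python
import numpy as np

def cvar_of_returns(returns: np.ndarray, alpha: float = 0.1) -> float:
    """Empirical lower-tail CVaR_alpha: mean of the worst alpha-fraction of returns."""
    n_tail = max(1, int(np.ceil(alpha * len(returns))))
    worst = np.sort(returns)[:n_tail]  # ascending sort puts the lowest returns first
    return float(worst.mean())

# Toy example: pretend these are returns of trajectories sampled from the model.
rng = np.random.default_rng(0)
returns = rng.normal(loc=100.0, scale=20.0, size=1000)
print(cvar_of_returns(returns, alpha=0.1))  # mean of the worst 10% of returns
```

A worst-case-aware trainer would feed such tail statistics back into the policy update in place of the plain mean return.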
Problem

Research questions and friction points this paper is trying to address.

Addressing robustness to modeling errors in reinforcement learning
Leveraging diffusion models to train robust RL policies
Optimizing CVaR of cumulative return through adversarial trajectory generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages diffusion models for robust RL training
Guides the diffusion process to generate worst-case trajectories (see the sketch after this list)
Optimizes CVaR of cumulative return for robustness
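
The "worst-case trajectories" idea can be pictured as classifier-style guidance that nudges each denoising step toward low predicted return. The sketch below is an assumption-laden illustration, not the paper's algorithm: `denoiser`, `return_model`, the trajectory shape, and the noise schedule are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; AD-RRL's actual networks are not specified here.
denoiser = nn.Linear(8, 8)       # pretend denoiser: noisy states -> clean estimate
return_model = nn.Linear(8, 1)   # pretend per-state return predictor

def adversarial_sample(T: int = 50, shape=(16, 8), scale: float = 0.1) -> torch.Tensor:
    """Reverse-diffusion sketch with gradient guidance toward LOW predicted return."""
    x = torch.randn(shape)                         # start from pure noise
    for t in reversed(range(1, T + 1)):
        x_in = x.detach().requires_grad_(True)
        x0_hat = denoiser(x_in)                    # denoising prediction
        ret = return_model(x0_hat).sum()           # predicted cumulative return
        grad, = torch.autograd.grad(ret, x_in)     # d(return)/d(noisy input)
        x = x0_hat.detach() - scale * grad         # adversarial pull: lower the return
        if t > 1:
            x = x + 0.1 * torch.randn_like(x)      # crude noise re-injection
    return x

worst_case_traj = adversarial_sample()
print(worst_case_traj.shape)  # torch.Size([16, 8])
```

During training, trajectories sampled this way would supply the worst-case rollouts on which the CVaR objective is evaluated.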