Hybrid Cross-Domain Robust Reinforcement Learning

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges of scarce real-world offline data and simulator dynamics mismatch, both of which undermine policy robustness in offline reinforcement learning, this paper proposes HYDRO, a Hybrid cross-Domain Robust RL framework. HYDRO integrates a limited real offline dataset with online simulator data, measuring and minimizing the performance gap between the simulator and the worst-case models in a predefined uncertainty set. It further introduces uncertainty-aware filtering and prioritized sampling to adaptively select and efficiently exploit high-confidence simulator samples. Evaluated on multi-task benchmarks, HYDRO significantly improves policy robustness and sample efficiency over existing offline robust RL methods, establishing a new paradigm for reliable policy learning in low-data, high-dynamics-mismatch regimes.

📝 Abstract
Robust reinforcement learning (RL) aims to learn policies that remain effective despite uncertainties in the environment, which frequently arise in real-world applications due to variations in environment dynamics. Robust RL methods learn a robust policy by maximizing value under the worst-case models within a predefined uncertainty set. Offline robust RL algorithms are particularly promising in scenarios where only a fixed dataset is available and new data cannot be collected. However, these approaches often require extensive offline data, and gathering such datasets for specific tasks in specific environments can be both costly and time-consuming. An imperfect simulator offers a faster, cheaper, and safer way to collect training data, but it can suffer from dynamics mismatch. In this paper, we introduce HYDRO, the first Hybrid Cross-Domain Robust RL framework designed to address these challenges. HYDRO uses an online simulator to complement a limited offline dataset in the non-trivial context of robust RL. By measuring and minimizing the performance gap between the simulator and the worst-case models in the uncertainty set, HYDRO employs novel uncertainty filtering and prioritized sampling to select the most relevant and reliable simulator samples. Extensive experiments demonstrate HYDRO's superior performance over existing methods across various tasks, underscoring its potential to improve sample efficiency in offline robust RL.
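
The abstract's core objective, maximizing value under the worst-case models within a predefined uncertainty set, is the standard robust-MDP formulation. The sketch below illustrates that general formulation only, not HYDRO's implementation: it approximates the uncertainty set with a finite ensemble of tabular transition models and performs pessimistic Bellman backups. All names and shapes are assumptions made for the example.

```python
import numpy as np

def robust_bellman_backup(v, models, rewards, gamma=0.99):
    """One pessimistic Bellman backup on a tabular MDP.

    v       : (S,) current state-value estimates
    models  : (K, S, A, S) ensemble of transition matrices, a finite
              stand-in for the uncertainty set of dynamics
    rewards : (S, A) reward table
    """
    # Expected next-state value under each candidate model: (K, S, A)
    next_values = np.einsum('ksat,t->ksa', models, v)
    # Worst case over the uncertainty set, per state-action pair
    q = rewards + gamma * next_values.min(axis=0)
    # Greedy improvement against the worst-case dynamics
    return q.max(axis=1)

# Toy usage: robust value iteration on random dynamics
rng = np.random.default_rng(0)
K, S, A = 5, 4, 2
models = rng.dirichlet(np.ones(S), size=(K, S, A))  # rows sum to 1
rewards = rng.random((S, A))
v = np.zeros(S)
for _ in range(200):
    v = robust_bellman_backup(v, models, rewards)
```

With gamma < 1 this backup is a contraction, so the loop converges to the robust value function for the finite model set.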
Problem

Research questions and friction points this paper is trying to address.

Addresses uncertainty in environment dynamics, the core concern of robust RL
Combines limited offline data with an online simulator to cut the cost of data collection
Minimizes the performance gap between the simulator and the worst-case models in the uncertainty set (see the sketch after this list)
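
One way to make the last point concrete: score each simulator transition by how far it strays from dynamics models fit to the real offline data, taking the worst case over an ensemble. This is a hypothetical sketch under an assumed interface (an ensemble of prediction callables), not the paper's actual gap measure.

```python
import numpy as np

def dynamics_gap(s, a, s_next_sim, ensemble):
    """Pessimistic per-transition proxy for the simulator-to-real gap.

    ensemble   : callables mapping (s, a) -> predicted next state, each
                 trained on the limited real offline dataset (assumed API)
    s_next_sim : next state actually produced by the simulator
    """
    preds = np.stack([m(s, a) for m in ensemble])         # (K, state_dim)
    errors = np.linalg.norm(preds - s_next_sim, axis=-1)  # (K,)
    return errors.max()  # worst case over the ensemble

# Toy usage with dummy linear models standing in for learned dynamics
Ws = np.random.default_rng(1).normal(size=(3, 4, 4))
ens = [lambda s, a, W=W: W @ s for W in Ws]
gap = dynamics_gap(np.ones(4), None, np.ones(4), ens)
```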
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid cross-domain robust RL framework (HYDRO)
Uncertainty filtering and prioritized sampling of simulator transitions (see the sketch after this list)
An online simulator that complements limited offline datasets
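
A minimal sketch of how the filtering and prioritization could fit together, assuming a scalar uncertainty score per simulator transition (e.g., ensemble disagreement); the quantile cutoff and softmax weighting are illustrative choices, not the paper's published rules.

```python
import numpy as np

def filter_and_prioritize(uncertainty, keep_quantile=0.75, beta=5.0):
    """Uncertainty-aware filtering plus prioritized-sampling weights.

    uncertainty : (N,) per-transition uncertainty for simulator samples
    Returns indices of retained transitions and their sampling probabilities.
    """
    # 1) Filter: discard the least trustworthy simulator transitions
    threshold = np.quantile(uncertainty, keep_quantile)
    kept = np.flatnonzero(uncertainty <= threshold)
    # 2) Prioritize: lower uncertainty -> higher sampling probability
    logits = -beta * uncertainty[kept]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return kept, probs

# Usage: draw a minibatch biased toward high-confidence simulator data
rng = np.random.default_rng(0)
u = np.abs(rng.normal(size=10_000))
kept, probs = filter_and_prioritize(u)
batch = rng.choice(kept, size=256, p=probs)
```

Mixing such a batch with the real offline data is what makes the approach hybrid; the mixing ratio would be another design choice.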
👥 Authors

Linh Le Pham Van
Deakin Applied Artificial Intelligence Initiative, Deakin University, Australia
Minh Hoang Nguyen
Deakin Applied Artificial Intelligence Initiative, Deakin University, Australia
Hung Le
Deakin Applied Artificial Intelligence Initiative, Deakin University, Australia
Hung The Tran
AI Center, VNPT Media
Sunil Gupta
Deakin Applied Artificial Intelligence Initiative, Deakin University, Australia