Multi-Fidelity Hybrid Reinforcement Learning via Information Gain Maximization

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-fidelity simulators often incur prohibitive interaction costs or are infeasible for direct use in reinforcement learning (RL), motivating efficient multi-fidelity optimization under a fixed budget. Method: the paper proposes MF-HRL-IGM, a multi-fidelity hybrid RL algorithm that dynamically selects simulator fidelity via information gain maximization (IGM), enabling adaptive fidelity sampling within a hybrid offline-online training framework. Contribution/Results: the authors establish a no-regret guarantee for MF-HRL-IGM, and empirical evaluations show that, under identical budget constraints, the method achieves superior policy performance and higher data efficiency than state-of-the-art baselines, improving both learning efficiency and resource utilization in high-cost simulation environments.

📝 Abstract
Optimizing a reinforcement learning (RL) policy typically requires extensive interactions with a high-fidelity simulator of the environment, which are often costly or impractical. Offline RL addresses this problem by allowing training from pre-collected data, but its effectiveness is strongly constrained by the size and quality of the dataset. Hybrid offline-online RL leverages both offline data and interactions with a single simulator of the environment. In many real-world scenarios, however, multiple simulators with varying levels of fidelity and computational cost are available. In this work, we study multi-fidelity hybrid RL for policy optimization under a fixed cost budget. We introduce multi-fidelity hybrid RL via information gain maximization (MF-HRL-IGM), a hybrid offline-online RL algorithm that implements fidelity selection based on information gain maximization through a bootstrapping approach. Theoretical analysis establishes the no-regret property of MF-HRL-IGM, while empirical evaluations demonstrate its superior performance compared to existing benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Optimizing RL policy with costly high-fidelity simulator interactions
Addressing limitations of offline RL through hybrid data utilization
Selecting optimal simulator fidelity under fixed computational budget
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-fidelity hybrid reinforcement learning algorithm
Information gain maximization for fidelity selection
Bootstrapping approach for no-regret performance
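The fidelity-selection idea behind the bullets above can be illustrated with a minimal sketch. All names and numbers below are hypothetical (the paper's actual estimator and acquisition rule are not specified here): epistemic uncertainty at each fidelity is approximated by bootstrap disagreement over observed returns, and the next simulator is chosen to maximize estimated information gain per unit cost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three simulators with increasing fidelity and cost.
fidelity_costs = np.array([1.0, 5.0, 20.0])  # cost per interaction (assumed)

def bootstrap_disagreement(samples, n_boot=50, rng=rng):
    """Approximate epistemic uncertainty as the variance of the sample
    mean across bootstrap resamples of the observed returns."""
    means = [rng.choice(samples, size=len(samples), replace=True).mean()
             for _ in range(n_boot)]
    return float(np.var(means))

def select_fidelity(returns_per_fidelity, costs):
    """Pick the fidelity whose estimated information gain per unit cost
    is largest (a stand-in for the paper's IGM criterion)."""
    gains = np.array([bootstrap_disagreement(r) for r in returns_per_fidelity])
    return int(np.argmax(gains / costs))

# Toy data: returns collected so far at each fidelity level.
returns = [rng.normal(0.0, 2.0, size=30),   # low fidelity: cheap, noisy, many samples
           rng.normal(0.5, 1.0, size=10),   # medium fidelity
           rng.normal(1.0, 0.5, size=3)]    # high fidelity: few, expensive samples
choice = select_fidelity(returns, fidelity_costs)
print(choice)
```

In an online loop, the selected fidelity would be queried, its return list extended, and the budget decremented by the corresponding cost until the budget is exhausted.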