AI Summary
This work addresses the high computational cost of high-fidelity simulation models, which hinders efficient training and retraining of reinforcement learning agents in dynamic environments. To overcome this limitation, the paper proposes a learnable surrogate modeling framework tailored for dynamic settings, which approximates the input-output mapping of high-fidelity simulations to substantially reduce training overhead when system dynamics, parameters, or reward structures change. By integrating discrete-event simulation, reinforcement learning, and data-driven surrogate modeling techniques, the framework enables rapid adaptation of policies to environmental shifts. Empirical evaluation in stochastic service systems demonstrates significant acceleration in both initial training and retraining processes, thereby enhancing the adaptability of reinforcement learning policies to evolving conditions.
Abstract
High-fidelity simulation models are widely used to analyze complex stochastic systems, but their high computational cost motivates the development of cheaper surrogate models that approximate the simulation model's input-output relationship. In parallel, reinforcement learning (RL) has emerged as a powerful framework for making online decisions in stochastic environments, with increasing attention being given to the use of simulation models as training environments for RL models. We investigate a class of surrogate models suitable for accelerating RL training in settings where the reward structure, model parameters, or system dynamics change over time and explore their interactions with simulation models and RL models. Through numerical experiments on a stochastic service system modeled via discrete-event simulation, we demonstrate that leveraging surrogate models can substantially accelerate RL training and re-training.
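The idea of a surrogate that approximates a simulation model's input-output relationship can be illustrated with a minimal sketch. This is not the paper's method: the single-server queue, the piecewise-linear interpolation surrogate, and every name here (`simulate_mean_wait`, `fit_surrogate`, the design points) are assumptions chosen for illustration only. The sketch runs a comparatively slow simulation at a few input values, then answers later queries from the stored results instead of re-running the simulation.

```python
import random
from bisect import bisect_left


def simulate_mean_wait(service_rate, arrival_rate=1.0, n_customers=20000, seed=0):
    """Stand-in for an expensive simulation (hypothetical example).

    Estimates the mean waiting time in a single-server FIFO queue with
    exponential interarrival and service times via the Lindley recursion.
    """
    rng = random.Random(seed)
    wait = 0.0
    total = 0.0
    for _ in range(n_customers):
        total += wait
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        # Waiting time of the next customer.
        wait = max(0.0, wait + service - interarrival)
    return total / n_customers


def fit_surrogate(design_points):
    """Run the simulation at each design point and return a cheap
    piecewise-linear interpolant over the results (an assumed surrogate form)."""
    xs = sorted(design_points)
    ys = [simulate_mean_wait(x) for x in xs]

    def surrogate(x):
        # Clamp queries outside the sampled range to the nearest endpoint.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_left(xs, x)  # xs[i-1] < x <= xs[i]
        t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return surrogate, xs, ys


if __name__ == "__main__":
    surrogate, xs, ys = fit_surrogate([1.5, 2.0, 3.0])
    # Querying the surrogate costs a lookup, not a fresh simulation run.
    print(surrogate(2.5))
```

In an RL setting, a surrogate of this kind would stand in for the simulator inside the training loop, so that a change in parameters or rewards requires refitting or re-querying the cheap model instead of re-running the full simulation at every step. How the paper constructs and updates its surrogates is not specified in the abstract.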