🤖 AI Summary
This work addresses the AMR charging decision optimization problem in large-scale block-stacking warehouses. We propose a reinforcement learning (RL) approach that learns efficient, adaptive charging policies. First, we extend the SLAPStack simulation framework to explicitly model AMR charging behavior, a capability it previously lacked. Second, we introduce a “domain-guided plus flexible exploration” RL design paradigm that integrates prior domain knowledge with adaptive exploration strategies. Third, we develop adaptive heuristic baselines and conduct reproducible ablation studies. Using the Proximal Policy Optimization (PPO) algorithm, we systematically evaluate multiple reward formulations and action-space configurations. Results show that the learned RL policy significantly reduces average service time. Moreover, we uncover a fundamental trade-off between open-ended exploration and structured domain guidance that affects convergence speed, policy stability, and cross-scenario generalization. The resulting framework offers an interpretable, deployable RL paradigm for energy management in warehouse AMR systems.
📝 Abstract
We propose a novel reinforcement learning (RL) design to optimize the charging strategy of autonomous mobile robots in large-scale block-stacking warehouses. RL design involves a wide array of choices, most of which can only be evaluated through lengthy experimentation. Our study examines how different reward and action-space configurations, ranging from flexible setups to more guided, domain-informed designs, affect agent performance. Using heuristic charging strategies as a baseline, we demonstrate the superiority of flexible, RL-based approaches in terms of service times. Our findings also highlight a trade-off: while more open-ended designs can discover well-performing strategies on their own, they may require longer convergence times and are less stable, whereas guided configurations yield a more stable learning process but more limited generalization potential. Our contributions are threefold. First, we extend SLAPStack, an open-source, RL-compatible simulation framework, to accommodate charging strategies. Second, we introduce a novel RL design for the charging strategy problem. Finally, we introduce several novel adaptive baseline heuristics and reproducibly evaluate our design using a Proximal Policy Optimization agent across varying design configurations, with a focus on reward.
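To make the notion of an adaptive charging heuristic concrete, the sketch below shows one plausible shape such a baseline could take: a battery-threshold rule whose threshold adapts to the current workload, deferring charging when many transport orders are queued. All names and parameter values (`AMRState`, `base_threshold`, `load_sensitivity`) are illustrative assumptions, not the paper's actual heuristics or the SLAPStack API.

```python
from dataclasses import dataclass


@dataclass
class AMRState:
    """Minimal, hypothetical snapshot of an AMR's decision-relevant state."""
    battery_level: float    # state of charge in [0, 1]
    queued_tasks: int       # open transport orders in the system


def adaptive_charge_decision(state: AMRState,
                             base_threshold: float = 0.3,
                             load_sensitivity: float = 0.02,
                             floor: float = 0.1) -> bool:
    """Illustrative adaptive rule: charge when the battery falls below a
    threshold that shrinks under high workload, so a busy system keeps
    robots serving orders longer before sending them to a charger."""
    threshold = max(floor, base_threshold - load_sensitivity * state.queued_tasks)
    return state.battery_level < threshold
```

Under this rule, an idle system (no queued tasks) sends a robot at 25% charge to a charger, while the same robot keeps working when ten orders are waiting, since the effective threshold has dropped to the 10% floor. An RL policy, by contrast, can learn such workload-dependent behavior directly from the reward signal instead of relying on hand-tuned sensitivities.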