🤖 AI Summary
This work addresses the AMR charging decision optimization problem in large-scale block-stacking warehouses. We propose a reinforcement learning (RL) approach that learns efficient, adaptive charging policies. First, we extend the SLAPStack simulation framework to explicitly model AMR charging behavior, a capability it previously lacked. Second, we introduce a “domain-guided plus flexible exploration” RL design paradigm that integrates prior domain knowledge with adaptive exploration strategies. Third, we develop adaptive heuristic baselines and conduct reproducible ablation studies. Using the Proximal Policy Optimization (PPO) algorithm, we systematically evaluate multiple reward formulations and action-space configurations. Results show that the learned RL policy significantly reduces average service time. Moreover, we uncover a fundamental trade-off between open-ended exploration and structured domain guidance that affects convergence speed, policy stability, and cross-scenario generalization. The resulting framework offers an interpretable, deployable RL paradigm for energy management in warehouse AMR systems.
📝 Abstract
We propose a novel reinforcement learning (RL) design to optimize the charging strategy of autonomous mobile robots in large-scale block-stacking warehouses. RL design involves a wide array of choices, most of which can only be evaluated through lengthy experimentation. Our study examines how different reward and action-space configurations, ranging from flexible setups to more guided, domain-informed designs, affect agent performance. Using heuristic charging strategies as a baseline, we demonstrate the superiority of flexible, RL-based approaches in terms of service times. Our findings also highlight a trade-off: while more open-ended designs can discover well-performing strategies on their own, they may require longer convergence times and are less stable, whereas guided configurations yield a more stable learning process but more limited generalization potential. Our contributions are threefold. First, we extend SLAPStack, an open-source, RL-compatible simulation framework, to accommodate charging strategies. Second, we introduce a novel RL design for the charging strategy problem. Finally, we introduce several novel adaptive baseline heuristics and reproducibly evaluate our design using a Proximal Policy Optimization agent across varying design configurations, with a focus on reward.
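To make the notion of an adaptive charging heuristic concrete, the sketch below shows one plausible shape such a baseline could take: a battery-threshold rule whose threshold adapts to the current workload, deferring charging when many transport orders are queued. All names and parameter values (`AMRState`, `base_threshold`, `load_sensitivity`) are illustrative assumptions, not the paper's actual heuristics or the SLAPStack API.

```python
from dataclasses import dataclass


@dataclass
class AMRState:
    """Minimal, hypothetical snapshot of an AMR's decision-relevant state."""
    battery_level: float    # state of charge in [0, 1]
    queued_tasks: int       # open transport orders in the system


def adaptive_charge_decision(state: AMRState,
                             base_threshold: float = 0.3,
                             load_sensitivity: float = 0.02,
                             floor: float = 0.1) -> bool:
    """Illustrative adaptive rule: charge when the battery falls below a
    threshold that shrinks under high workload, so a busy system keeps
    robots serving orders longer before sending them to a charger."""
    threshold = max(floor, base_threshold - load_sensitivity * state.queued_tasks)
    return state.battery_level < threshold
```

Under this rule, an idle system (no queued tasks) sends a robot at 25% charge to a charger, while the same robot keeps working when ten orders are waiting, since the effective threshold has dropped to the 10% floor. An RL policy, by contrast, can learn such workload-dependent behavior directly from the reward signal instead of relying on hand-tuned sensitivities.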