gym-invmgmt: An Open Benchmarking Framework for Inventory Management Methods

📅 2026-05-11

🤖 AI Summary

This work addresses the lack of a unified, reproducible benchmark for inventory management policies. Heterogeneous environment assumptions make published results hard to compare. To this end, it introduces gym-invmgmt, an open-source, Gymnasium-compatible extension of the OR-Gym inventory-management environments. The framework fixes a shared contract for state transitions, action bounds, reward functions, and key performance indicators (KPIs) across a 22-scenario core grid plus four supplemental multi-agent (MARL-mode) rows. This enables auditable comparisons of optimization-based, heuristic, and learning-based policies under one evaluation protocol. The experiments cover stochastic programming, PPO-Transformer, Residual RL, a graph neural network policy (PPO-GNN), imitation learning, and a bounded large language model (LLM) baseline. They show that policy performance depends jointly on information access, demand dynamics, network topology, and policy representation. Informed stochastic programming is the strongest non-oracle reference but has high online computational cost. PPO-Transformer gives the best learned-policy quality with fast inference, and Residual RL is a competitive hybrid. The GNN policy is strong on the default divergent topology but less robust on the serial one.

📝 Abstract

Inventory-policy comparisons are often difficult to interpret because performance depends on the evaluation contract as much as on the policy itself. Differences in topology, demand regime, information access, feasibility constraints, shortage treatment, and Key Performance Indicator (KPI) definitions can change method rankings. We present gym-invmgmt, a Gymnasium-compatible extension of the OR-Gym inventory-management lineage for auditable cross-paradigm evaluation. The benchmark evaluates optimization, heuristic, and learned controllers under a shared CoreEnv transition, reward, action-bound, and KPI contract, while varying stress conditions through a 22-scenario core grid plus four supplemental MARL-mode rows. Within these released scenarios, informed stochastic programming provides the strongest non-oracle reference, reflecting the value of scenario hedging under forecast access, but at substantially higher online computational cost. Among learned controllers, the Proximal Policy Optimization Transformer variant (PPO-Transformer) achieves the strongest learned-policy quality at fast inference, while Residual Reinforcement Learning (Residual RL) provides competitive hybrid performance. The graph neural network variant (PPO-GNN) is highly competitive on the default divergent topology but less robust on the serial topology. Imitation learning performs well in stationary regimes but degrades under demand shift, and the bounded Large Language Model (LLM) policy-parameter baseline is best interpreted as a diagnostic controller rather than an autonomous inventory optimizer. Overall, the benchmark identifies scenario-conditioned leaders while showing that performance depends jointly on information access, demand shift, topology, and policy representation.
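The idea of a shared evaluation contract can be illustrated with a small, self-contained Python sketch. It does not use the gym-invmgmt or OR-Gym code. `ToyInventoryEnv`, `base_stock`, `residual`, and `evaluate` are hypothetical names, and the single-stage, zero-lead-time, lost-sales dynamics are simplifying assumptions. Every controller is scored through the same `reset`/`step` interface, action bound, reward, and demand seeds. Here the controllers are a do-nothing policy, a base-stock heuristic, and a residual hybrid that adds a clipped correction to the heuristic. Because everything else is held fixed, KPI differences come from the policy rather than from the environment. The `step` return tuple follows the Gymnasium convention `(obs, reward, terminated, truncated, info)`.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyInventoryEnv:
    """Hypothetical single-stage, lost-sales inventory environment.

    Not the gym-invmgmt CoreEnv; it only mimics the Gymnasium
    reset()/step() return convention so all policies share one contract.
    """
    max_order: int = 20         # action bound: orders clipped to [0, max_order]
    horizon: int = 30           # periods per episode
    price: float = 2.0          # revenue per unit sold
    unit_cost: float = 1.0      # cost per unit ordered
    holding_cost: float = 0.1   # cost per unit left in stock at period end
    shortage_cost: float = 1.0  # penalty per unit of unmet (lost) demand
    mean_demand: int = 8        # demand ~ uniform integer on [0, 2 * mean_demand]

    def reset(self, seed=None):
        self.rng = random.Random(seed)  # same seed -> same demand path for every policy
        self.t = 0
        self.inventory = 10
        self.demand_total = 0
        self.sold_total = 0
        return self.inventory, {}

    def step(self, action):
        order = max(0, min(int(action), self.max_order))  # enforce the action bound
        self.inventory += order  # zero lead time (simplifying assumption)
        demand = self.rng.randint(0, 2 * self.mean_demand)
        sold = min(demand, self.inventory)
        lost = demand - sold
        self.inventory -= sold
        reward = (self.price * sold - self.unit_cost * order
                  - self.holding_cost * self.inventory - self.shortage_cost * lost)
        self.demand_total += demand
        self.sold_total += sold
        self.t += 1
        truncated = self.t >= self.horizon
        return self.inventory, reward, False, truncated, {}


def base_stock(obs, level=16):
    """Heuristic: order up to a fixed base-stock level."""
    return max(0, level - obs)


def residual(obs, correction=lambda o: 0.0, bound=3):
    """Hybrid: heuristic action plus a clipped correction (stand-in for a learned residual)."""
    delta = max(-bound, min(bound, correction(obs)))
    return base_stock(obs) + delta


def evaluate(policy, seeds):
    """Run one policy on a fixed seed set and report the shared KPIs."""
    env = ToyInventoryEnv()
    rewards, fill_rates = [], []
    for seed in seeds:
        obs, _ = env.reset(seed=seed)
        total, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        rewards.append(total)
        fill_rates.append(env.sold_total / env.demand_total if env.demand_total else 1.0)
    return {"mean_reward": sum(rewards) / len(rewards),
            "fill_rate": sum(fill_rates) / len(fill_rates)}


if __name__ == "__main__":
    seeds = range(10)
    policies = {"no_order": lambda obs: 0, "base_stock": base_stock, "residual": residual}
    for name, policy in policies.items():
        print(name, evaluate(policy, seeds))
```

In a real study the residual `correction` would be a trained network and the KPI set would be richer. Swapping either one while keeping the environment, action bound, and seeds fixed is what keeps such a comparison auditable.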

Problem

Research questions and friction points this paper is trying to address.

inventory management

benchmarking

policy evaluation

performance comparison

evaluation contract

Innovation

Methods, ideas, or system contributions that make the work stand out.

inventory management

benchmarking framework

reinforcement learning