Deep Policy Iteration with Integer Programming for Inventory Management

📅 2021-12-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large-scale warehouse inventory management faces challenges including long-horizon planning, high-dimensional combinatorial action spaces, and strong state-dependent constraints such as capacity limits, all compounded by demand uncertainty. Method: This paper proposes Programmable Actor Reinforcement Learning (PARL), a novel framework that integrates deep policy iteration with Mixed-Integer Linear Programming (MILP) and Sample Average Approximation (SAA) to solve the per-step action optimally under combinatorial, state-dependent constraints. Contribution/Results: Experiments across diverse, complex supply chain scenarios demonstrate that PARL outperforms state-of-the-art reinforcement learning algorithms and classical replenishment heuristics by as much as 14.7% on average. The improvement is attributed to better inventory cost management, with particularly pronounced gains in inventory-constrained settings. To our knowledge, this is the first work to unify deep policy optimization with exact combinatorial optimization and stochastic approximation for sequential inventory control.
📝 Abstract
We present a Reinforcement Learning (RL) based framework for optimizing long-term discounted reward problems with large combinatorial action spaces and state-dependent constraints. These characteristics are common to many operations management problems, e.g., network inventory replenishment, where managers have to deal with uncertain demand, lost sales, and capacity constraints that result in more complex feasible action spaces. Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep policy iteration method that leverages neural networks (NNs) to approximate the value function and combines it with mathematical programming (MP) and sample average approximation (SAA) to solve the per-step action optimally while accounting for combinatorial action spaces and state-dependent constraint sets. We show how the proposed methodology can be applied to complex inventory replenishment problems where analytical solutions are intractable. We also benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment heuristics and find it considerably outperforms existing methods, by as much as 14.7% on average, in various complex supply chain settings. We find that this improvement of PARL over benchmark algorithms can be directly attributed to better inventory cost management, especially in inventory-constrained settings. Furthermore, in simpler settings where the optimal replenishment policy is tractable or near-optimal heuristics are known, we find that the RL approaches can learn near-optimal policies. Finally, to make RL algorithms more accessible to inventory management researchers, we also discuss the development of a modular Python library that can be used to test the performance of RL algorithms with various supply chain structures and spur future research in developing practical and near-optimal algorithms for inventory management problems.
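To make the per-step action solve concrete, here is a minimal sketch of the SAA idea on a toy single-product, lost-sales problem. This is not the paper's implementation: the function names, cost parameters, and the `value_fn` stand-in for the critic NN are all illustrative, and because the toy action space is one-dimensional, exhaustive enumeration of feasible orders stands in for the MILP solve the paper uses on combinatorial action spaces.

```python
import random

def saa_action(inventory, capacity, demand_sampler, value_fn,
               holding_cost=1.0, stockout_cost=5.0, gamma=0.99, n_samples=200):
    """Pick the best feasible order quantity by Sample Average Approximation.

    Scores each feasible order (a state-dependent set: orders must fit the
    remaining capacity) by its average one-step reward plus the discounted
    value estimate of the next state, over a fixed set of demand samples.
    """
    demands = [demand_sampler() for _ in range(n_samples)]
    best_action, best_value = None, float("-inf")
    for order in range(capacity - inventory + 1):  # state-dependent feasible set
        total = 0.0
        for d in demands:
            on_hand = inventory + order
            sold = min(on_hand, d)
            next_inv = on_hand - sold  # lost sales: unmet demand vanishes
            reward = -(holding_cost * next_inv
                       + stockout_cost * max(d - on_hand, 0))
            total += reward + gamma * value_fn(next_inv)
        avg = total / n_samples
        if avg > best_value:
            best_action, best_value = order, avg
    return best_action, best_value

random.seed(0)
action, est = saa_action(inventory=2, capacity=20,
                         demand_sampler=lambda: random.randint(0, 10),
                         value_fn=lambda s: -0.5 * s)  # placeholder for the critic NN
print(action)
```

In the full method, the inner loop becomes a single MILP whose objective embeds the (piecewise-linearized) NN value function, so the optimal action is found exactly even when the feasible set is combinatorial.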
Problem

Research questions and friction points this paper is trying to address.

Inventory Management
Optimization Strategies
Sales Forecasting
Innovation

Methods, ideas, or system contributions that make the work stand out.

PARL
Reinforcement Learning
Inventory Management
🔎 Similar Papers
No similar papers found.