Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in the pharmaceutical supply chains

📅 2026-06-04
🤖 AI Summary
This study addresses demand uncertainty, variable replenishment lead times, and limited shelf life in pharmaceutical supply chains by formulating dynamic inventory management as a Markov decision process. The authors propose a hybrid deep reinforcement learning algorithm that combines asynchronous advantage actor-critic (A3C) with distributed proximal policy optimization (DPPO) to optimize replenishment policies over a continuous action space. The stated objective is to maximize supply chain profitability while maintaining a high patient service level. In numerical experiments, the algorithm adapts its replenishment strategy under dynamic scenarios and yields lower inventory costs than various benchmarks; a further validation on real-world pharmaceutical inventory data supports its practical feasibility.
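For orientation, the clipped surrogate objective of standard proximal policy optimization is reproduced here. This is the textbook PPO form, not necessarily the exact loss used by the hybrid A3C DPPO algorithm, which the summary does not spell out. The symbols are the usual ones and are assumptions with respect to this paper: $r_t(\theta)$ is the probability ratio between the new and old policies, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clip range.

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
                  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

In a replenishment setting, $a_t$ would be the (continuous) order quantity and $s_t$ the inventory state at period $t$.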
📝 Abstract
Pharmaceutical supply chains (PSCs) struggle with inventory management (IM) due to unpredictable demand patterns and variable lead times associated with restocking. This complexity is further compounded by the finite shelf lives of pharmaceutical products, which necessitate a delicate balance between adequate stock and minimal waste. These intertwined factors create a complex optimization problem that requires sophisticated inventory strategies to ensure both product availability and PSC efficiency. This study aims to develop an optimal inventory replenishment policy for pharmaceutical products that can handle the stochasticity arising from uncertain demand and variable PSC conditions. The objective is to maximize the profitability of the PSC while maintaining a high patient service level. We formulate the problem as a Markov decision process and propose a deep reinforcement learning (DRL) approach, specifically, a hybrid asynchronous advantage actor-critic distributed proximal policy optimization (A3C DPPO) algorithm. The A3C DPPO algorithm is tailored to handle the continuous action space inherent in IM. The numerical results demonstrate that the proposed algorithm adaptively updates the inventory replenishment strategy under dynamic scenarios, resulting in lower inventory costs than various benchmarks. We also conduct numerical validation using real-world pharmaceutical inventory data to confirm the practical feasibility of the proposed algorithm.
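To make the Markov decision process framing concrete, the following is a minimal sketch of a single-product perishable-inventory environment with stochastic demand, a random lead time, and a finite shelf life. It is not the paper's model: the class name `PerishableInventoryEnv`, the Gaussian demand, the uniform lead time, the oldest-first issuing rule, the cost coefficients, and the `order_up_to` heuristic are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class PerishableInventoryEnv:
    """Toy perishable-inventory MDP (illustrative assumptions throughout)."""
    shelf_life: int = 3         # periods a unit remains usable after arrival
    max_lead_time: int = 2      # lead time drawn uniformly from 1..max_lead_time
    mean_demand: float = 5.0    # mean of the assumed Gaussian demand
    holding_cost: float = 1.0   # cost per unit held at end of period
    shortage_cost: float = 10.0 # cost per unit of unmet demand
    waste_cost: float = 5.0     # cost per expired unit
    seed: int = 0

    def __post_init__(self):
        self.rng = random.Random(self.seed)
        # stock[i] holds units with i + 1 periods of shelf life remaining.
        self.stock = [0.0] * self.shelf_life
        # Outstanding orders as (periods_until_arrival, quantity) pairs.
        self.pipeline = []

    def state(self):
        return tuple(self.stock), tuple(self.pipeline)

    def step(self, order_qty):
        """Apply a continuous order quantity; return (state, reward, info)."""
        order_qty = max(0.0, float(order_qty))

        # 1. Advance outstanding orders; arrivals enter as the freshest stock.
        still_in_transit = []
        for periods_left, qty in self.pipeline:
            if periods_left <= 1:
                self.stock[-1] += qty
            else:
                still_in_transit.append((periods_left - 1, qty))
        self.pipeline = still_in_transit

        # 2. Place the new order with a random lead time.
        if order_qty > 0:
            lead = self.rng.randint(1, self.max_lead_time)
            self.pipeline.append((lead, order_qty))

        # 3. Sample demand and serve it from the oldest stock first.
        demand = max(0.0, self.rng.gauss(self.mean_demand, 1.0))
        unmet = demand
        for i in range(self.shelf_life):
            used = min(self.stock[i], unmet)
            self.stock[i] -= used
            unmet -= used

        # 4. Units in the oldest bucket expire; everything else ages one period.
        wasted = self.stock[0]
        self.stock = self.stock[1:] + [0.0]

        # 5. Reward is the negative of the period cost.
        cost = (self.holding_cost * sum(self.stock)
                + self.shortage_cost * unmet
                + self.waste_cost * wasted)
        info = {"demand": demand, "unmet": unmet, "wasted": wasted}
        return self.state(), -cost, info


def order_up_to(env, target=12.0):
    """Hypothetical baseline: raise the inventory position to `target`."""
    position = sum(env.stock) + sum(qty for _, qty in env.pipeline)
    return max(0.0, target - position)


def rollout(env, steps=50):
    """Run the baseline policy and return the total reward."""
    total = 0.0
    for _ in range(steps):
        _, reward, _ = env.step(order_up_to(env))
        total += reward
    return total
```

A learned policy such as the one the abstract describes would replace `order_up_to` by mapping the state returned from `state()` to a continuous order quantity; the fixed-target rule is included only as a simple reference point.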
Problem

Research questions and friction points this paper is trying to address.

inventory management
pharmaceutical supply chains
stochastic demand
perishable products
replenishment policy
Innovation

Methods, ideas, or system contributions that make the work stand out.

deep reinforcement learning
inventory management
pharmaceutical supply chain
A3C DPPO
Markov decision process