Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling

📅 2024-09-16
🏛️ arXiv.org
🤖 AI Summary
This paper addresses the NP-hard job-shop scheduling problem (JSSP) by pioneering the application of offline reinforcement learning (RL) to overcome the sample inefficiency and cold-start limitations inherent in online RL. The proposed method introduces three key innovations: (1) a modified Conservative Q-Learning (CQL) algorithm tailored for maskable action spaces; (2) an entropy-based reward mechanism within a discrete Soft Actor-Critic (SAC) framework to enhance policy exploration; and (3) expert dataset augmentation via controlled noise injection and reward normalization for robust offline training. An offline Q-learning framework is constructed by integrating masked actions and entropy regularization into both quantile-based discrete Q-networks (mQRDQN) and discrete maximum-entropy SAC (mSAC). Experiments demonstrate that the approach significantly outperforms online RL on both generated and standard benchmark instances. Moreover, noisy expert data yields performance comparable to, or even exceeding, that of pristine expert data, empirically validating the utility of counterfactual information in offline policy learning.
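The first innovation, CQL adapted to maskable action spaces, amounts to computing the conservative penalty only over actions that are currently schedulable. The sketch below illustrates the idea for a single state; the function name and the numpy framing are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def masked_cql_penalty(q_values, action_mask, dataset_action):
    """Conservative penalty restricted to valid (unmasked) actions.

    q_values:       (n_actions,) Q estimates for one state
    action_mask:    (n_actions,) boolean, True = operation is schedulable now
    dataset_action: index of the action taken in the offline dataset

    Standard CQL pushes down logsumexp_a Q(s, a) and pushes up Q(s, a_data);
    here the logsumexp ranges only over valid actions, so infeasible
    dispatching decisions never contribute to the penalty.
    """
    masked_q = np.where(action_mask, q_values, -np.inf)
    # Numerically stable logsumexp; exp(-inf) = 0, so masked actions drop out.
    m = masked_q.max()
    logsumexp = m + np.log(np.sum(np.exp(masked_q - m)))
    return logsumexp - q_values[dataset_action]
```

Restricting the logsumexp to the mask matters in JSSP because at most time steps only a few operations are dispatchable, and penalizing Q-values of impossible actions would distort the conservative regularizer.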

๐Ÿ“ Abstract
The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch leading to sample inefficiency, cannot leverage existing high-quality solutions, and often yields suboptimal results compared to traditional methods like Constraint Programming (CP). We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD), which addresses these limitations by learning from previously generated solutions. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available, although our current evaluation focuses on benchmark problems. Offline-LD adapts two CQL-based Q-learning methods (mQRDQN and discrete mSAC) for maskable action spaces, introduces a novel entropy bonus modification for discrete SAC, and exploits reward normalization through preprocessing. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances. Notably, by introducing noise into the expert dataset, we achieve similar or better results than those obtained from the expert dataset, suggesting that a more diverse training set is preferable because it contains counterfactual information.
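The entropy bonus modification for discrete SAC mentioned in the abstract implies computing policy entropy over the valid-action distribution rather than the full action space. A minimal sketch of that idea, under the assumption that the policy is a masked softmax over logits (names chosen here for illustration):

```python
import numpy as np

def masked_policy_entropy(logits, action_mask):
    """Entropy of a softmax policy renormalized over valid actions.

    logits:      (n_actions,) unnormalized policy scores
    action_mask: (n_actions,) boolean, True = action is currently valid

    Invalid actions receive probability zero, so the entropy reflects
    only the spread over dispatchable operations.
    """
    masked = np.where(action_mask, logits, -np.inf)
    z = masked - masked.max()          # stable softmax
    p = np.exp(z) / np.exp(z).sum()    # exp(-inf) = 0 for masked actions
    valid = p > 0.0
    return -np.sum(p[valid] * np.log(p[valid]))
```

Because the number of valid actions shrinks as a schedule fills in, the maximum attainable entropy changes per state; keeping the entropy term consistent with the mask avoids rewarding spread over actions the agent can never take.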
Problem

Research questions and friction points this paper is trying to address.

Offline Reinforcement Learning
Job Shop Scheduling Problem (JSSP)
Efficiency and Quality Improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline Reinforcement Learning
Job Shop Scheduling Problem (JSSP)
mQRDQN and Discrete mSAC