AI Summary
This paper addresses the NP-hard job-shop scheduling problem (JSSP) by pioneering the application of offline reinforcement learning (RL) to overcome the sample inefficiency and cold-start limitations inherent in online RL. The proposed method introduces three key innovations: (1) a modified Conservative Q-Learning (CQL) algorithm tailored for maskable action spaces; (2) an entropy-based reward mechanism within a discrete Soft Actor-Critic (SAC) framework to enhance policy exploration; and (3) expert dataset augmentation via controlled noise injection and reward normalization for robust offline training. An offline Q-learning framework is constructed by integrating masked actions and entropy regularization into both quantile-based discrete Q-networks (mQRDQN) and discrete maximum-entropy SAC (mSAC). Experiments demonstrate that the approach significantly outperforms online RL on both generated and standard benchmark instances. Moreover, noisy expert data yields performance comparable to, or even exceeding, that of pristine expert data, empirically validating the utility of counterfactual information in offline policy learning.
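To make the masked CQL penalty and the entropy bonus concrete, the fragment below is a minimal PyTorch sketch of how both terms can be restricted to feasible actions; it is an illustration under our own assumptions, not the authors' implementation. The function names, the `action_mask` convention (True marks a schedulable operation), and the fixed `temperature` coefficient are all hypothetical.

```python
import torch
import torch.nn.functional as F

def masked_cql_loss(q_values, actions, action_mask, cql_alpha=1.0):
    """Conservative Q-Learning penalty restricted to feasible (unmasked) actions.

    q_values:    (batch, num_actions) Q-estimates from the critic
    actions:     (batch,) actions taken in the offline dataset
    action_mask: (batch, num_actions) boolean, True = schedulable operation
    """
    # Push down the log-sum-exp over *feasible* actions only, so infeasible
    # operations never contribute to the conservative penalty.
    masked_q = q_values.masked_fill(~action_mask, float("-inf"))
    logsumexp_q = torch.logsumexp(masked_q, dim=1)
    dataset_q = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    return cql_alpha * (logsumexp_q - dataset_q).mean()

def masked_entropy_bonus(logits, action_mask, temperature=0.2):
    """Entropy bonus of a discrete SAC policy, computed over feasible actions only."""
    masked_logits = logits.masked_fill(~action_mask, float("-inf"))
    log_probs = F.log_softmax(masked_logits, dim=1)
    probs = log_probs.exp()
    # Zero out masked entries before the product to avoid 0 * (-inf) = NaN.
    entropy = -(probs * log_probs.masked_fill(~action_mask, 0.0)).sum(dim=1)
    return temperature * entropy.mean()
```

Restricting both the log-sum-exp and the policy entropy to feasible actions is what distinguishes these maskable variants from standard discrete CQL and SAC, which would otherwise penalize or reward operations that can never be dispatched.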
Abstract
The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch, leading to sample inefficiency; it cannot leverage existing high-quality solutions; and it often yields suboptimal results compared to traditional methods like Constraint Programming (CP). We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD), which addresses these limitations by learning from previously generated solutions. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available, although our current evaluation focuses on benchmark problems. Offline-LD adapts two CQL-based Q-learning methods (mQRDQN and discrete mSAC) for maskable action spaces, introduces a novel entropy bonus modification for discrete SAC, and exploits reward normalization through preprocessing. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances. Notably, by introducing noise into the expert dataset, we achieve similar or better results than those obtained from the expert dataset, suggesting that a more diverse training set is preferable because it contains counterfactual information.
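The dataset-construction idea can be pictured with a short sketch: perturb an expert dispatching trajectory with occasional random feasible actions, then normalize rewards before offline training. The snippet below is a hedged illustration only; `feasible_actions_fn`, the 10% noise probability, and the min-max scaling are hypothetical stand-ins rather than details confirmed by the paper.

```python
import random

def perturb_expert_schedule(expert_actions, feasible_actions_fn, noise_prob=0.1, seed=0):
    """Replay an expert dispatching trajectory and, with a small probability,
    substitute a random feasible operation to obtain a more diverse offline dataset.

    expert_actions:      list of operations chosen by the expert (e.g. a CP solution)
    feasible_actions_fn: callable(step) -> operations schedulable at that step
    """
    rng = random.Random(seed)
    noisy_actions = []
    for step, expert_action in enumerate(expert_actions):
        alternatives = [a for a in feasible_actions_fn(step) if a != expert_action]
        if alternatives and rng.random() < noise_prob:
            noisy_actions.append(rng.choice(alternatives))  # counterfactual choice
        else:
            noisy_actions.append(expert_action)
    return noisy_actions

def normalize_rewards(transitions):
    """Min-max scale rewards across the dataset before offline training."""
    rewards = [t["reward"] for t in transitions]
    lo, hi = min(rewards), max(rewards)
    span = (hi - lo) or 1.0
    return [{**t, "reward": (t["reward"] - lo) / span} for t in transitions]
```

In practice the noisy trajectory would be rolled out in the scheduling environment so that later feasible sets reflect the perturbed decisions; the sketch elides that interaction for brevity.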