Offline reinforcement learning for job-shop scheduling problems

📅 2024-10-21

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: In combinatorial optimization problems with complex constraints, such as the Job-Shop Scheduling Problem (JSP) and the Flexible JSP (FJSP), existing deep reinforcement learning (DRL) methods converge slowly, while behavior cloning approaches generalize poorly and often neglect the explicit optimization objective.

Method: We propose an offline reinforcement learning framework tailored to scheduling tasks. States are modeled as heterogeneous graphs, actions are encoded as edge attributes, and a joint graph attention mechanism is combined with constraint-aware modeling; a heterogeneous graph neural network handles both representation and decision-making. Crucially, the method unifies offline RL reward maximization with imitation of expert trajectories, balancing objective-driven optimization against fidelity to expert behavior during policy learning.

Contribution/Results: On standard JSP and FJSP benchmarks, the approach outperforms state-of-the-art methods in solution quality and achieves a several-fold improvement in training efficiency.
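The heterogeneous-graph state described above can be illustrated with a small data structure. This is a minimal sketch under stated assumptions, not the paper's encoding: it assumes two node types (operations and machines), precedence edges inside each job, and operation-machine edges whose attributes (here only a processing time) describe the corresponding assignment. The names `HeteroGraphState`, `build_state`, and `candidate_actions`, and the toy instance, are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical encoding: jobs[j][k] maps machine id -> processing time for
# operation k of job j (flexible job shop; a classic JSP has one machine per op).


@dataclass
class HeteroGraphState:
    op_nodes: list = field(default_factory=list)          # remaining ops as (job, index)
    machine_nodes: list = field(default_factory=list)     # machine ids
    precedence_edges: list = field(default_factory=list)  # (op, next op of the same job)
    assign_edges: dict = field(default_factory=dict)      # (op, machine) -> edge attributes


def build_state(jobs, scheduled):
    """Build the graph for the unscheduled part of the instance.

    scheduled[j] is how many operations of job j have already been dispatched
    (operations of a job are dispatched in order, so they form a prefix).
    """
    state = HeteroGraphState()
    state.machine_nodes = sorted({m for job in jobs for op in job for m in op})
    for j, job in enumerate(jobs):
        for k in range(scheduled[j], len(job)):
            op = (j, k)
            state.op_nodes.append(op)
            if k + 1 < len(job):
                state.precedence_edges.append((op, (j, k + 1)))
            for m, proc_time in job[k].items():
                # Assignment-specific information lives on the op-machine edge.
                state.assign_edges[(op, m)] = {"proc_time": proc_time}
    return state


def candidate_actions(state):
    """Each job's next remaining operation, paired with every machine that can run it."""
    next_op = {}
    for j, k in state.op_nodes:
        next_op[j] = min(k, next_op.get(j, k))
    return [(op, m) for (op, m) in state.assign_edges if op[1] == next_op[op[0]]]


# Toy 2-job, 2-machine flexible instance (made-up numbers).
jobs = [
    [{0: 3, 1: 5}, {1: 2}],
    [{0: 4}, {0: 1, 1: 1}],
]
actions_t0 = candidate_actions(build_state(jobs, [0, 0]))
actions_t1 = candidate_actions(build_state(jobs, [1, 0]))  # after dispatching op (0, 0)
```

In this toy instance the number of candidate actions drops from three to two after a single dispatch, which is why the policy has to handle a variable action space; a graph neural network would compute embeddings over this structure rather than read the raw dictionaries.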

📝 Abstract

Recent advances in deep learning have shown significant potential for solving combinatorial optimization problems in real time. Unlike traditional methods, deep learning can generate high-quality solutions efficiently, which is crucial for applications like routing and scheduling. However, existing approaches like deep reinforcement learning (RL) and behavioral cloning have notable limitations, with deep RL suffering from slow learning and behavioral cloning relying solely on expert actions, which can lead to generalization issues and neglect of the optimization objective. This paper introduces a novel offline RL method designed for combinatorial optimization problems with complex constraints, where the state is represented as a heterogeneous graph and the action space is variable. Our approach encodes actions in edge attributes and balances expected rewards with the imitation of expert solutions. We demonstrate the effectiveness of this method on job-shop scheduling and flexible job-shop scheduling benchmarks, achieving superior performance compared to state-of-the-art techniques.
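One way to read "encodes actions in edge attributes" together with a variable action space: score every candidate edge with the same function and normalize over however many edges the current state offers. The following is a minimal sketch assuming a linear scoring head over hand-picked edge features; `edge_policy`, the feature choice (processing time, machine load), and the weights are hypothetical stand-ins for the learned heterogeneous graph neural network.

```python
import math


def edge_policy(edge_features, weights):
    """Probability of choosing each candidate action, given one feature vector per edge.

    The list can have any length, so the same parameters serve a state with
    3 feasible assignments or with 300. `weights` stands in for a learned
    scoring head (hypothetical).
    """
    logits = [sum(w * x for w, x in zip(weights, feats)) for feats in edge_features]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical edge attributes: [processing time, current machine load].
weights = [-1.0, -0.5]              # prefers short, lightly loaded assignments
state_a = [[3.0, 0.0], [5.0, 0.0], [4.0, 1.0]]
state_b = [[2.0, 1.0], [1.0, 0.0]]  # a later state with fewer feasible actions
probs_a = edge_policy(state_a, weights)
probs_b = edge_policy(state_b, weights)
greedy_a = max(range(len(probs_a)), key=probs_a.__getitem__)
```

Because the parameters are shared across edges, the output length simply follows the number of feasible assignments; no fixed-size action vector or padding is needed.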

Problem

Research questions and friction points this paper is trying to address.

Offline RL for job-shop scheduling with complex constraints

Overcoming slow learning and generalization issues in existing methods

Balancing rewards and expert imitation in combinatorial optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline RL for job-shop scheduling problems

Encodes actions as edge attributes in a heterogeneous graph

Balances expected rewards against imitation of expert solutions
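The balance between expected reward and imitation of expert solutions can be expressed as a single actor loss. The summary does not give the paper's exact objective, so this sketch swaps in a TD3+BC-style weighting adapted to discrete, variable-size action sets: one term pushes the policy toward actions with high estimated Q-values, and a cross-entropy term pulls it toward the expert's choice. `balanced_actor_loss`, `alpha`, and the Q-values it consumes (assumed to come from a separately trained critic) are all assumptions.

```python
import math


def log_softmax(logits):
    top = max(logits)
    log_total = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - log_total for z in logits]


def balanced_actor_loss(batch, alpha=2.5, eps=1e-8):
    """TD3+BC-style actor loss for discrete actions (hypothetical adaptation).

    batch: list of (policy_logits, q_values, expert_action) tuples, one per state;
    logits and Q-values have one entry per candidate action of that state.
    Returns  -lam * mean(E_pi[Q]) + mean(-log pi(expert_action)).
    """
    expected_q, imitation = [], []
    for logits, q_values, expert_action in batch:
        logp = log_softmax(logits)
        probs = [math.exp(lp) for lp in logp]
        expected_q.append(sum(p * q for p, q in zip(probs, q_values)))
        imitation.append(-logp[expert_action])  # cross-entropy with the expert action
    n = len(batch)
    # lam rescales the Q term so alpha sets the trade-off regardless of reward
    # scale; in an autodiff framework it would be treated as a constant.
    lam = alpha / (sum(abs(q) for q in expected_q) / n + eps)
    return -lam * sum(expected_q) / n + sum(imitation) / n


# Two states with different numbers of candidate actions (made-up values).
batch = [([2.0, 0.0, 0.0], [1.0, 0.5, 0.2], 0), ([0.0, 1.0], [0.3, 0.9], 1)]
loss = balanced_actor_loss(batch)
```

With `alpha = 0` the loss reduces to pure behavior cloning; larger values shift weight toward the critic's estimate of expected reward.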

🔎 Similar Papers

Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling