🤖 AI Summary
Dynamic job shop scheduling faces strong uncertainty due to stochastic job arrivals; existing offline learning policies exhibit limited generalization and thus rely on online planning—e.g., Monte Carlo Tree Search (MCTS)—yet MCTS decisions under partial observability are highly susceptible to perturbations. This paper proposes a robustness-enhanced MCTS framework that, for the first time, integrates action robustness estimation into both the selection and backpropagation phases of MCTS. By jointly leveraging offline-learned policies and online dynamic planning, the framework steers the system toward high-efficiency production states that are inherently interference-resilient and easily adjustable. Experiments demonstrate that the method significantly outperforms baseline policies—including standard MCTS—across diverse dynamic scenarios, achieving simultaneous improvement in long-horizon scheduling performance and decision robustness, with negligible additional online computational overhead.
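The summary describes folding an action robustness estimate into the selection and backpropagation phases of MCTS. The paper's exact formulation is not given here, so the sketch below is only illustrative: it assumes each node tracks a running robustness score `R(a)` alongside the usual value `Q(a)`, adds a weighted robustness bonus (`c_robust`, a hypothetical coefficient) to a UCT-style selection score, and propagates both statistics up the tree after a rollout.

```python
import math

class Node:
    """MCTS node tracking both a value and a robustness estimate per action."""
    def __init__(self):
        self.n = 0             # visit count
        self.value_sum = 0.0   # cumulative scheduling reward
        self.robust_sum = 0.0  # cumulative robustness score (hypothetical metric)
        self.children = {}     # action -> Node

    @property
    def q(self):
        return self.value_sum / self.n if self.n else 0.0

    @property
    def r(self):
        return self.robust_sum / self.n if self.n else 0.0


def select_action(node, c_explore=1.4, c_robust=0.5):
    """UCT-style selection augmented with a robustness term.

    score(a) = Q(a) + c_robust * R(a) + c_explore * sqrt(ln N / n_a)
    The actual combination used by DyRo-MCTS may differ.
    """
    total = max(node.n, 1)

    def score(item):
        _, child = item
        if child.n == 0:
            return float("inf")  # always expand unvisited actions first
        bonus = c_explore * math.sqrt(math.log(total) / child.n)
        return child.q + c_robust * child.r + bonus

    return max(node.children.items(), key=score)[0]


def backpropagate(path, value, robustness):
    """Propagate both the rollout value and its robustness estimate."""
    for node in path:
        node.n += 1
        node.value_sum += value
        node.robust_sum += robustness
```

With equal scheduling rewards, the robustness bonus breaks the tie toward actions whose resulting states are judged easier to adapt to future job arrivals, which matches the stated goal of steering toward interference-resilient production states.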
📝 Abstract
Dynamic job shop scheduling, a fundamental combinatorial optimisation problem in various industrial sectors, poses substantial challenges for effective scheduling due to frequent disruptions caused by the arrival of new jobs. State-of-the-art methods employ machine learning to learn scheduling policies offline, enabling rapid responses to dynamic events. However, these offline policies are often imperfect, necessitating the use of planning techniques such as Monte Carlo Tree Search (MCTS) to improve performance at online decision time. The unpredictability of new job arrivals complicates online planning, as decisions based on incomplete problem information are vulnerable to disturbances. To address this issue, we propose the Dynamic Robust MCTS (DyRo-MCTS) approach, which integrates action robustness estimation into MCTS. DyRo-MCTS guides the production environment toward states that not only yield good scheduling outcomes but are also easily adaptable to future job arrivals. Extensive experiments show that DyRo-MCTS significantly improves the performance of offline-learned policies with negligible additional online planning time. Moreover, DyRo-MCTS consistently outperforms vanilla MCTS across various scheduling scenarios. Further analysis reveals that its ability to make robust scheduling decisions leads to long-term, sustainable performance gains under disturbances.