When does learning pay off? A study on DRL-based dynamic algorithm configuration for carbon-aware scheduling

📅 2026-04-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether the substantial training cost of deep reinforcement learning (DRL) in carbon-aware flow shop scheduling can be justified by long-term performance gains through policy generalization. To this end, the authors propose a framework that integrates DRL with dynamic algorithm configuration, training policies on small, simple instances and transferring them to unseen, complex instances for online parameter adaptation. Experimental results show that the proposed approach significantly outperforms baselines such as static parameter tuning on out-of-distribution, complex scenarios. These findings validate the strong generalization capability of DRL policies and confirm that the initial investment in training yields sustained performance benefits in practical applications.
📝 Abstract
Deep reinforcement learning (DRL) has recently emerged as a promising tool for Dynamic Algorithm Configuration (DAC), enabling evolutionary algorithms to adapt their parameters online rather than relying on statically tuned configurations. While DRL can learn effective control policies, training is computationally expensive. This cost may be justified if learned policies generalize, allowing the training effort to transfer across instance types and problem scales. Yet, for real-world optimization problems, it remains unclear whether this promise holds in practice and under which conditions the investment in learning pays off. In this work, we investigate this question in the context of the carbon-aware permutation flow-shop scheduling problem. We develop a DRL-based DAC framework and train it exclusively on small, simple instances. We then deploy the learned policy on both similar and more complex unseen instances and compare its performance against a statically tuned baseline, which provides a fair point of comparison. Our findings show that the proposed method provides a strong dynamic algorithm control policy that can be effectively transferred to different unseen problem instances. Notably, on simple, cheap-to-compute instances similar to those observed during training and tuning, DRL performs comparably with the statically tuned baseline. However, as instance characteristics diverge and computational complexity increases, the DRL-learned policy consistently outperforms static tuning. These results confirm that DRL can acquire robust and generalizable control policies that remain effective beyond the training instance distribution. This ability to generalize across instance types makes the initial computational investment worthwhile, particularly in settings where static tuning struggles to adapt to changing problem scenarios.
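The core idea the abstract describes, a policy that adjusts an evolutionary algorithm's parameters online instead of using one statically tuned value, can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's implementation: a (1+1)-EA on the OneMax benchmark, where a hand-crafted rate schedule (`adaptive_rate`) stands in for the learned DRL policy that would be trained on small instances and then deployed.

```python
import random

def onemax(bits):
    """Toy objective: number of ones in the bitstring (maximized)."""
    return sum(bits)

def one_plus_one_ea(n=50, budget=5000, policy=None, seed=1):
    """(1+1)-EA on OneMax.

    With policy=None the mutation rate stays at the static value 1/n
    (a statically tuned configuration). Otherwise, `policy` maps the
    normalized parent fitness to a mutation rate at every generation,
    which is the essence of dynamic algorithm configuration; in the
    paper's framework a trained DRL agent plays this role.
    """
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    best = onemax(parent)
    for _ in range(budget):
        rate = policy(best / n) if policy else 1.0 / n
        # Flip each bit independently with probability `rate`.
        child = [1 - b if rng.random() < rate else b for b in parent]
        f = onemax(child)
        if f >= best:  # elitist acceptance
            parent, best = child, f
        if best == n:
            break
    return best

def adaptive_rate(norm_fitness, n=50):
    """Hypothetical stand-in for a learned policy: mutate more
    aggressively far from the optimum, fall back to 1/n near it."""
    return max(1.0 / n, (1.0 - norm_fitness) / 10)
```

The observation here (normalized fitness) is deliberately instance-size-independent; it is that kind of scale-free state representation that lets a policy trained on small instances transfer to larger, unseen ones.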
Problem

Research questions and friction points this paper is trying to address.

dynamic algorithm configuration
deep reinforcement learning
carbon-aware scheduling
generalization
computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Reinforcement Learning
Dynamic Algorithm Configuration
Generalization
Carbon-aware Scheduling
Transferability