🤖 AI Summary
To address the high energy consumption, lack of reliable simulation environments, scarce operational data, and stringent safety constraints of data center cooling systems, this paper proposes a physics-informed offline reinforcement learning framework. The method introduces a graph neural network architecture that enforces time-reversal symmetry to ensure physical consistency and enable robust policy learning in latent space; it integrates physics-based prior embedding, joint state-action representation learning, and closed-loop deployment on real data center infrastructure. Evaluated over 2000 hours of real-world operation in a large production data center, the framework achieves a 14–21% reduction in cooling energy consumption with zero safety violations throughout. The work addresses key limitations of offline RL in industrial control, namely poor generalization and insufficient safety guarantees, and demonstrates that safe, energy-efficient optimization is feasible under severe data constraints.
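The summary does not spell out how time-reversal symmetry is imposed on the learned dynamics. As a rough, hypothetical illustration (not the authors' architecture), one common recipe is to train a latent one-step dynamics model and penalize it when stepping the predicted transition backward fails to recover the original state; all class and function names below, and the soft-penalty formulation itself, are assumptions for illustration.

```python
# Hypothetical sketch, not the paper's code: a latent one-step dynamics model
# trained with a reverse-consistency penalty as a stand-in for the paper's
# architectural time-reversal symmetry constraint.
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    """Predicts the next latent state from (latent state, action)."""
    def __init__(self, z_dim: int, a_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + a_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, z_dim),
        )

    def forward(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # Residual one-step update in latent space.
        return z + self.net(torch.cat([z, a], dim=-1))


def time_reversal_loss(model: LatentDynamics,
                       z_t: torch.Tensor,
                       a_t: torch.Tensor,
                       z_next: torch.Tensor) -> torch.Tensor:
    """Forward prediction error plus a reverse-consistency penalty:
    stepping the predicted next state backward should recover z_t."""
    z_pred = model(z_t, a_t)
    forward_err = (z_pred - z_next).pow(2).mean()
    # Negating the action is only a simple surrogate for reversing the
    # dynamics; the paper's mechanism is built into the network itself.
    z_back = model(z_pred, -a_t)
    reverse_err = (z_back - z_t).pow(2).mean()
    return forward_err + reverse_err
```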
📝 Abstract
Recent advances in information technology and artificial intelligence have fueled a rapid expansion of the data center (DC) industry worldwide, accompanied by an immense appetite for electricity to power the DCs. In a typical DC, around 30–40% of the energy is spent on the cooling system rather than on the computer servers, creating a pressing need for new energy-saving optimization technologies for DC cooling systems. However, optimizing such real-world industrial systems faces numerous challenges, including but not limited to a lack of reliable simulation environments, limited historical data, and stringent safety and control robustness requirements. In this work, we present a novel physics-informed offline reinforcement learning (RL) framework for energy efficiency optimization of DC cooling systems. The proposed framework models the complex dynamical patterns and physical dependencies inside a server room using a purposely designed graph neural network architecture that is compliant with the fundamental time-reversal symmetry. Because of its well-behaved and generalizable state-action representations, the model enables sample-efficient and robust latent-space offline policy learning using limited real-world operational data. Our framework has been successfully deployed and verified in a large-scale production DC for closed-loop control of its air-cooling units (ACUs). We conducted a total of 2000 hours of short- and long-term experiments in the production DC environment. The results show that our method achieves 14–21% energy savings in the DC cooling system, without any violation of the safety or operational constraints. Our results demonstrate the significant potential of offline RL in solving a broad range of data-limited, safety-critical real-world industrial control problems.
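The abstract describes "latent space offline policy learning" on limited logged data without giving the algorithm. As a hedged sketch of that general recipe (assumed, not the authors' method), the snippet below performs a behavior-regularized actor update on top of a frozen, pretrained state encoder: the policy is pushed toward actions the critic rates highly while staying close to the logged operator actions, which is one standard way to avoid unsafe extrapolation when no online exploration is allowed. The module names, batch keys, and `bc_weight` value are all hypothetical.

```python
# Hypothetical sketch of latent-space offline policy learning with a
# behavior-cloning regularizer (in the spirit of TD3+BC); not the paper's code.
import torch
import torch.nn as nn

def offline_policy_step(encoder: nn.Module, actor: nn.Module, critic: nn.Module,
                        actor_opt: torch.optim.Optimizer,
                        batch: dict, bc_weight: float = 2.5) -> float:
    """One actor update on a fixed batch of logged transitions.

    encoder: maps raw states to the learned latent representation (kept frozen).
    actor:   maps latent states to actions.
    critic:  scores (latent state, action) pairs, e.g. an MLP over their concat.
    batch:   dict with "state" and "action" tensors from the offline dataset.
    """
    with torch.no_grad():
        z = encoder(batch["state"])  # frozen physics-informed representation
    a_pi = actor(z)
    q = critic(z, a_pi)
    # Behavior-cloning term keeps the policy near the dataset's action
    # distribution, limiting extrapolation beyond what was logged as safe.
    bc = (a_pi - batch["action"]).pow(2).mean()
    loss = -q.mean() + bc_weight * bc
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()

# Example wiring (all dimensions hypothetical):
#   encoder = nn.Linear(32, 16)
#   actor   = nn.Sequential(nn.Linear(16, 4), nn.Tanh())
#   critic  = any nn.Module whose forward(z, a) returns a scalar score per sample
```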