🤖 AI Summary
AI data centers face emerging challenges in thermal management: conventional approaches suffer from low accuracy, slow simulation speed, and heavy reliance on empirical knowledge. Method: This study proposes a full-stack optimization framework based on Physical AI (PhyAI). It introduces a five-tier digital twin architecture that integrates an industry-grade in-house thermal-fluid simulation engine, an AI engine built on NVIDIA PhysicsNemo for physics-informed machine learning (PIML) models, and the NVIDIA Omniverse platform, enabling real-time, closed-loop digital operations from modeling and simulation to control. Contribution/Results: The study develops a real-time thermal-fluid surrogate model that achieves a median absolute temperature error of 0.18 °C and accelerates simulation by over 100× compared with conventional CFD/heat transfer (HT) methods. It also integrates PIML into the data center digital twin ecosystem, shifting operation from experience-driven to physics-guided, with the aim of improving service elasticity and reducing total cost of ownership (TCO).
📝 Abstract
Data centers (DCs) are mission-critical infrastructure and are pivotal in powering the growth of artificial intelligence (AI) and the digital economy. The evolution from Internet DCs to AI DCs has introduced new challenges in operating and managing data centers for improved business resilience and reduced total cost of ownership. As a result, future data centers need new paradigms that go beyond traditional approaches based on best practices. In this research, we propose and develop a novel Physical AI (PhyAI) framework for advancing DC operations and management. Our system leverages the emerging capabilities of state-of-the-art industrial products together with our in-house research and development. Specifically, it comprises three core modules: 1) an industry-grade in-house simulation engine that simulates DC operations with high accuracy, 2) an AI engine built upon NVIDIA PhysicsNemo for training and evaluating physics-informed machine learning (PIML) models, and 3) a digital twin platform built upon NVIDIA Omniverse that hosts our proposed 5-tier digital twin framework. By enabling real-time digital twins, the system offers a scalable and adaptable way to digitalize, optimize, and automate future data center operations and management. To illustrate its effectiveness, we present a case study on building a surrogate model that predicts the thermal and airflow profiles of a large-scale DC in real time. The surrogate reproduces the results of traditional, time-consuming Computational Fluid Dynamics/Heat Transfer (CFD/HT) simulation with a median absolute temperature prediction error of 0.18 °C. This approach opens up several potential research directions for advancing Physical AI in future DC operations.
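The abstract reports accuracy as a median absolute temperature error but does not spell out how it is computed. The following is a minimal sketch of one plausible reading, assuming the surrogate's predictions are compared point by point against CFD/HT reference temperatures at matched locations. The function name and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import median


def median_absolute_error(predicted, reference):
    """Median of |predicted - reference| over matched sample points.

    Assumption: both sequences hold temperatures in °C at the same
    locations (e.g. the same grid cells or sensor positions).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one sample point is required")
    return median(abs(p - r) for p, r in zip(predicted, reference))


# Hypothetical values for illustration only (not data from the paper).
surrogate_temps = [22.0, 25.0, 27.5, 31.0]   # surrogate model output, °C
cfd_temps = [22.25, 24.5, 27.5, 32.0]        # CFD/HT reference, °C

error = median_absolute_error(surrogate_temps, cfd_temps)
print(f"median absolute error: {error} °C")
```

A median is less sensitive than a mean to a few large local deviations, such as hot spots near a rack exhaust, so it describes typical rather than worst-case agreement with the reference simulation.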