🤖 AI Summary
This work addresses dynamic power allocation in high-density datacenters that have hierarchical power domains and multi-tenant contractual constraints. The authors propose nvPAX, a three-phase hybrid quadratic programming (QP)/linear programming (LP) policy that runs at every control step. Phase I allocates power with minimum deviation from each device's request while respecting job priorities. Phase II fairly distributes surplus power among active devices. Phase III distributes any remaining capacity to idle devices. The phases are designed to permit power oversubscription while maximizing datacenter utilization. Unlike existing approaches, which assume flat power domains, the policy accounts for the hierarchical power topology and for tenant contracts. The evaluation is a trace-driven, large-scale simulation driven by GPU power telemetry from a production datacenter. In it, nvPAX runs with a mean wall-clock time of 264.69 ms per allocation interval and achieves a mean satisfaction ratio of 98.92%. It outperforms static equal-share allocation and is more robust than greedy proportional allocation when hierarchical bottlenecks are non-uniform.
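The three-phase structure can be illustrated with a small sketch. It is not the authors' method. Where nvPAX solves QP/LP problems, the sketch substitutes greedy progressive filling (water-filling): at each step, every eligible device is raised by the same amount until it reaches its bound or a capped group it belongs to is full. The sketch also assumes that the power hierarchy and tenant contracts can both be encoded as capped "groups." Each device lists every domain on its path to the root plus any tenant group. All names (`Device`, `groups`, `progressive_fill`, `three_phase_allocate`, the rack/row/tenant labels, and the wattages) are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

EPS = 1e-9  # tolerance (watts) for "limit reached" comparisons

@dataclass
class Device:
    name: str
    groups: list[str]   # hypothetical: all power domains up to the root, plus tenant groups
    request_w: float    # power requested in this control step
    max_w: float        # per-device power limit
    priority: int = 0   # higher value is served earlier in Phase I
    active: bool = True

def group_usage(devices, alloc):
    """Total allocated power per constraint group."""
    used: dict[str, float] = {}
    for d in devices:
        for g in d.groups:
            used[g] = used.get(g, 0.0) + alloc[d.name]
    return used

def progressive_fill(devices, eligible, alloc, upper, capacity):
    """Raise every eligible device by the same amount until it reaches upper[name]
    or one of its capped groups is full. Mutates and returns alloc."""
    growing = {d.name for d in devices
               if d.name in eligible and upper[d.name] - alloc[d.name] > EPS}
    while growing:
        used = group_usage(devices, alloc)
        # Largest equal increment that keeps every device and every group within its limit.
        step = min(upper[n] - alloc[n] for n in growing)
        for g, cap in capacity.items():
            k = sum(1 for d in devices if d.name in growing and g in d.groups)
            if k:
                step = min(step, (cap - used.get(g, 0.0)) / k)
        step = max(step, 0.0)
        for n in growing:
            alloc[n] += step
        # Freeze devices that hit their own bound or sit under a full group.
        used = group_usage(devices, alloc)
        full = {g for g, cap in capacity.items() if cap - used.get(g, 0.0) <= EPS}
        growing = {d.name for d in devices
                   if d.name in growing
                   and upper[d.name] - alloc[d.name] > EPS
                   and not full.intersection(d.groups)}
    return alloc

def three_phase_allocate(devices, capacity):
    alloc = {d.name: 0.0 for d in devices}
    active = [d for d in devices if d.active]
    # Phase I (heuristic stand-in for the QP): fill toward requests, highest priority first.
    toward_request = {d.name: min(d.request_w, d.max_w) for d in devices}
    for p in sorted({d.priority for d in active}, reverse=True):
        level = {d.name for d in active if d.priority == p}
        progressive_fill(devices, level, alloc, toward_request, capacity)
    # Phase II: split the surplus equally among active devices, up to their limit.
    device_max = {d.name: d.max_w for d in devices}
    progressive_fill(devices, {d.name for d in active}, alloc, device_max, capacity)
    # Phase III: hand whatever is still left to idle devices.
    idle = {d.name for d in devices if not d.active}
    progressive_fill(devices, idle, alloc, device_max, capacity)
    return alloc

capacity = {"row": 1500.0, "rackA": 1000.0, "rackB": 1000.0, "tenant1": 700.0}
devices = [
    Device("g0", ["row", "rackA", "tenant1"], 600.0, 700.0, priority=1),
    Device("g1", ["row", "rackA"], 500.0, 700.0),
    Device("g2", ["row", "rackB", "tenant1"], 400.0, 700.0),
    Device("g3", ["row", "rackB"], 0.0, 700.0, active=False),
]
alloc = three_phase_allocate(devices, capacity)
print(alloc)  # {'g0': 600.0, 'g1': 400.0, 'g2': 100.0, 'g3': 400.0}
```

In this toy example, `g1` is capped by `rackA`, and `g2` is capped by the hypothetical `tenant1` contract, even though the row still has headroom. Phase III then gives that leftover row power to the idle `g3` in `rackB`. This shows how per-group caps, rather than a single flat budget, shape the outcome. The real policy's QP/LP objectives may split power differently.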
📝 Abstract
Power oversubscription is increasingly central to datacenter operation as power density grows, making it necessary to dynamically allocate limited power budgets across devices based on real-time demand. Existing approaches typically assume flat power domains, whereas in practice power distribution is hierarchical and allocation decisions must additionally respect tenant-level contractual constraints. We present nvPAX, a constrained-optimization policy that computes feasible power allocations at every control step via a three-phase hybrid QP/LP procedure. Phase I allocates power with minimum deviation from each device's power request, while respecting job priorities. Phase II fairly distributes excess power among active devices. Phase III fairly distributes any remaining power to idle devices. The rationale behind the three phases is to allow power oversubscription while maximizing datacenter utilization. On a trace-driven large-scale simulation using GPU power telemetry from a production datacenter, nvPAX runs with a mean wall-clock time of 264.69 ms per allocation interval and achieves a mean satisfaction ratio of 98.92%, outperforming static equal-share allocation and providing robustness beyond greedy proportional allocation in the presence of non-uniform hierarchical bottlenecks.
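The abstract does not define the satisfaction ratio. The sketch below shows one plausible reading and should be treated as an assumption. In each interval, a device's ratio is min(allocated, requested) / requested, averaged over devices with a positive request. The reported figure is then the mean across intervals. The function names and the toy trace are hypothetical.

```python
from __future__ import annotations
from statistics import mean

def step_satisfaction(alloc: dict[str, float], request: dict[str, float]) -> float:
    """ASSUMED metric: mean of min(allocated, requested) / requested over devices
    with a positive request. A step with no requests counts as fully satisfied."""
    ratios = [min(alloc.get(name, 0.0), req) / req
              for name, req in request.items() if req > 0]
    return mean(ratios) if ratios else 1.0

def mean_satisfaction(steps) -> float:
    """Average the per-step ratio over a trace of (alloc, request) pairs."""
    return mean(step_satisfaction(alloc, request) for alloc, request in steps)

trace = [
    ({"a": 300.0, "b": 500.0, "c": 0.0}, {"a": 400.0, "b": 500.0, "c": 0.0}),
    ({"a": 400.0, "b": 250.0}, {"a": 400.0, "b": 500.0}),
]
print(mean_satisfaction(trace))  # 0.8125
```

Capping each term at 1 means that surplus power given in Phase II or III does not raise the score. Only shortfalls against requests lower it. Whether the authors' 98.92% uses this exact definition is not stated in the abstract.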