AI Summary
Existing scheduling theory struggles to handle multiresource job (MRJ) scenarios with continuously distributed resource demands, because it assumes finitely many job types, a simplification inconsistent with the high heterogeneity observed in real-world workloads. This work proposes the first family of throughput-optimal scheduling policies for the continuous MRJ model, with both preemptive and nonpreemptive variants. The approach discretizes resource requirements, choosing the discretization granularity based on system load and the distribution of resource requirements. The authors also introduce several more efficient policy families that remain throughput-optimal under some distributional assumptions while considerably improving computational efficiency. Experiments compare the policies against existing index-based policies on parametrized distributions and on datacenter trace data from the Google Borg scheduler, and report state-of-the-art performance.
Abstract
Modern computing systems process jobs with resource requirements such as CPU and memory, which are described by multiresource job (MRJ) queueing models. In practice, job resource requirements are spread out over so many values that it is rare to see the same value twice. This pattern is best modeled by a continuous distribution of requirement values. However, the existing theoretical work on stability or throughput-optimality focuses on queueing models with class-based resource requirements. In class-based models, the number of distinct resource requirements must be small to demonstrate strong empirical performance, making them a poor match for these practical systems.
We introduce the first throughput-optimal family of scheduling policies for the continuous MRJ model, with both preemptive and nonpreemptive variants. We further introduce several efficient policy families, which remain throughput-optimal while considerably improving computational efficiency, under some distributional assumptions. We use a discretization approach, where we choose the discretization granularity based on the system load and the distribution of resource requirements. We validate the real-world applicability of our policies by comparing them against existing index-based policies on parametrized distributions and on datacenter trace data from the Google Borg scheduler, demonstrating state-of-the-art performance.
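To make the discretization idea concrete, the following is a minimal sketch and not the paper's policy. It assumes a single resource with server capacity normalized to 1, rounds each requirement up to a grid of `k` levels, and picks the coarsest grid whose rounding overhead still leaves the load below 1. The function names, the scalar `load` parameter, and the mean-inflation proxy for load are all hypothetical choices made for illustration; the abstract does not specify the actual selection rule.

```python
import math


def round_up(requirement, k):
    """Round a requirement in (0, 1] up to the nearest multiple of 1/k.

    Rounding up (never down) means a job that fits after rounding also
    fits with its true requirement. The small epsilon guards against
    floating-point noise pushing an exact multiple up one level.
    """
    return math.ceil(requirement * k - 1e-12) / k


def choose_granularity(samples, load, max_k=1024):
    """Return the smallest k whose rounding keeps the inflated load below 1.

    ASSUMPTION (illustrative only): load is a scalar in (0, 1), and the
    extra load caused by rounding is approximated by the ratio of the
    mean rounded requirement to the mean true requirement.
    """
    true_mean = sum(samples) / len(samples)
    for k in range(1, max_k + 1):
        rounded_mean = sum(round_up(r, k) for r in samples) / len(samples)
        if load * rounded_mean / true_mean < 1.0:
            return k
    raise ValueError("no granularity up to max_k keeps the load below 1")
```

Under these assumptions, a lightly loaded system tolerates a coarse grid (few job classes), while a system near capacity needs a finer grid, which matches the abstract's statement that granularity depends on both the load and the requirement distribution. For several resources, the same rounding could be applied to each resource separately.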