🤖 AI Summary
This study addresses the lack of a unified framework for fairly evaluating continuous multi-mode scheduling (CMMS) algorithms, which must satisfy multi-dimensional service-level objectives (SLOs) across the heterogeneous tiers of an edge–cloud continuum. The authors present an open-source benchmark platform featuring a unified controller interface, a closed-loop workload driver covering multiple workload patterns, and a dual-metric SLO scoring scheme that reports raw SLO (overall compliance) and steady-state SLO (compliance during stable operation) separately. Using this platform, they evaluate six controllers across five cluster configurations and two load regimes, for a total of 424 episodes. The experiments show that controller rankings depend strongly on cluster configuration and load intensity: a deep reinforcement learning (DRL) controller that performs best under light loads trails a rule-based heuristic by nearly 29 percentage points under heavy loads, while incurring roughly 500 times the per-decision overhead. This challenges the assumption that a single scheduling algorithm generalizes across settings.
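To make the idea of a unified controller interface concrete, here is a minimal Python sketch; it is not the platform's actual API. It assumes, as the abstract describes for CMMS, that each decision maps an incoming task plus the current cluster state to a (node, processing mode) pair. The names `Task`, `NodeState`, `Controller`, and `GreedyHeuristic`, their fields, and the queue-adjusted latency estimate are all hypothetical illustrations.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical incoming task with per-task SLO targets."""
    task_id: int
    latency_slo_ms: float  # latency target
    min_quality: float     # minimum acceptable output quality in [0, 1]


@dataclass(frozen=True)
class NodeState:
    """Hypothetical snapshot of one cluster node, keyed by processing mode."""
    name: str
    queue_len: int
    est_latency_ms: dict[str, float]  # mode -> estimated service time
    quality: dict[str, float]         # mode -> expected output quality


class Controller(ABC):
    """Hypothetical unified interface: (task, cluster state) -> (node, mode)."""

    @abstractmethod
    def decide(self, task: Task, nodes: list[NodeState]) -> tuple[str, str]:
        ...


class GreedyHeuristic(Controller):
    """Purely illustrative rule-based controller, not taken from the paper."""

    def decide(self, task, nodes):
        # Enumerate every (node, mode) pair with a queue-adjusted latency:
        # each queued task is assumed to add one service time.
        options = [
            (lat * (1 + n.queue_len), n.quality[mode], n.name, mode)
            for n in nodes
            for mode, lat in n.est_latency_ms.items()
        ]
        # Keep pairs that meet both the latency and the quality target.
        feasible = [o for o in options
                    if o[0] <= task.latency_slo_ms and o[1] >= task.min_quality]
        # Prefer the fastest feasible pair (ties -> higher quality);
        # if nothing is feasible, fall back to the fastest pair overall.
        pool = feasible or options
        _, _, name, mode = min(pool, key=lambda o: (o[0], -o[1]))
        return name, mode


nodes = [
    NodeState("edge-0", 2, {"full": 40.0, "reduced": 15.0},
              {"full": 0.95, "reduced": 0.85}),
    NodeState("cloud-0", 0, {"full": 60.0, "reduced": 30.0},
              {"full": 0.95, "reduced": 0.85}),
]
print(GreedyHeuristic().decide(Task(1, 80.0, 0.9), nodes))  # ('cloud-0', 'full')
```

Because every controller, whether a heuristic or a DRL policy, would implement the same `decide` signature in this sketch, the benchmark harness can swap them without changing the workload driver or the scoring code, which is the property a unified interface is meant to provide.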
📝 Abstract
Modern Artificial Intelligence (AI) workloads deployed across the heterogeneous tiers of an edge–cloud continuum must satisfy multi-dimensional Service Level Objectives (SLOs) over latency, throughput, and output quality. For each incoming task, the scheduler picks both a target node and a processing mode (e.g., full or reduced inference precision). We call this class of problems Continuous Multi-Mode Scheduling (CMMS). Comparing CMMS algorithms fairly is difficult because prior studies typically evaluate each controller in its own stack, under a single workload, and without reporting per-decision overhead. To close these gaps, we present an open-source benchmark platform that features (i) a unified controller interface, (ii) a closed-loop workload driver covering multiple workload patterns, and (iii) dual-metric SLO scoring that reports raw SLO (overall compliance) and steady-state SLO (compliance during stable operation) separately. Running six controllers across five cluster configurations and two load regimes (424 episodes), we find that controller rankings are strongly configuration-dependent: the deep reinforcement learning controller that wins under light workloads loses to a rule-based heuristic by nearly 29 percentage points once load intensifies, at roughly 500× the per-decision operational overhead. We further show that separating raw from steady-state SLOs exposes switching costs that a single aggregate score would conflate with steady-state performance.
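The dual-metric scoring can be sketched as follows. The abstract does not say how stable operation is identified, so this minimal Python sketch assumes, hypothetically, that a task counts toward steady-state SLO only if it completes after an initial warm-up window and outside a fixed settling window following each controller switch (a change of node or mode). `TaskRecord`, `dual_slo`, `warmup_s`, and `settle_s` are illustrative names, not the platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    t: float       # completion time, seconds since episode start
    slo_met: bool  # True if the task met its SLO targets


def dual_slo(records, switch_times, warmup_s=30.0, settle_s=10.0):
    """Return (raw_slo, steady_slo) as fractions in [0, 1].

    raw_slo counts every task. steady_slo uses a hypothetical definition of
    "stable operation": it ignores tasks finishing before warmup_s or within
    settle_s seconds after any controller switch.
    """
    if not records:
        raise ValueError("no task records")
    raw = sum(r.slo_met for r in records) / len(records)

    def is_stable(t):
        if t < warmup_s:
            return False
        return not any(s <= t < s + settle_s for s in switch_times)

    stable = [r for r in records if is_stable(r.t)]
    # NaN signals that the episode never reached a stable window.
    steady = sum(r.slo_met for r in stable) / len(stable) if stable else float("nan")
    return raw, steady


# 20 tasks, one every 5 s; violations during warm-up (t=0, 5) and right
# after a switch at t=50 (t=50, 55).
records = [TaskRecord(5.0 * i, i not in (0, 1, 10, 11)) for i in range(20)]
print(dual_slo(records, switch_times=[50.0]))  # (0.8, 1.0)
```

In this toy episode the raw SLO is 80% while the steady-state SLO is 100%, because every violation falls in the warm-up or post-switch window. Reporting only the aggregate 80% would hide whether the controller is weak in steady operation or merely pays a transient cost each time it switches.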