Terastal: Layer-Variant-based Scheduling for Real-Time Multi-DNN Workloads on Heterogeneous Accelerators

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses rigid scheduling and high deadline miss rates in soft real-time multi-DNN workloads on heterogeneous accelerators. These problems arise because the same layer can have very different latencies on different accelerators. To tackle this, the authors propose Terastal, an optimization framework that integrates a novel layer-variant mechanism with cooperative scheduling. It combines offline heterogeneity-aware virtual budget assignment and layer-variant design with online real-time scheduling, jointly optimizing accelerator mapping and variant selection under timing and accuracy constraints. The key innovation is the "layer variant", a customized layer implementation that narrows the latency gap on non-preferred accelerators. Experimental results show that, compared with FCFS, EDF, and DREAM, the method reduces the deadline miss rate per model by 40.58%, 30.53%, and 36.27%, respectively, while incurring only a 2.24% average normalized accuracy loss across models with variants.
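The core idea is that a layer variant narrows the latency gap on a non-preferred accelerator, which gives the scheduler a viable second choice when the preferred accelerator is busy. A toy selection rule makes this concrete. This is a minimal sketch under stated assumptions, not Terastal's scheduler. The `Variant` class, the `choose` function, the accelerator names `A`/`B`, and all latency and accuracy-loss numbers are hypothetical. The greedy rule (least accuracy loss among placements that meet the deadline, else earliest finish) stands in for the paper's joint optimization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One implementation of a layer (hypothetical model of a layer variant)."""
    name: str
    latency_ms: dict[str, float]  # accelerator name -> latency on that accelerator
    accuracy_loss: float          # normalized accuracy loss vs. the original layer


def choose(variants, accel_free_at, ready_at, deadline):
    """Greedy stand-in for joint accelerator mapping and variant selection.

    Among placements that finish by `deadline`, pick the one with the least
    accuracy loss (ties broken by earlier finish). If none meets the deadline,
    fall back to the earliest-finishing placement.
    Returns (accelerator, variant name, finish time).
    """
    best = None      # ((accuracy_loss, finish), accel, name, finish)
    fallback = None  # (finish, accel, name)
    for v in variants:
        for accel, lat in v.latency_ms.items():
            start = max(ready_at, accel_free_at[accel])
            finish = start + lat
            if fallback is None or finish < fallback[0]:
                fallback = (finish, accel, v.name)
            if finish <= deadline:
                key = (v.accuracy_loss, finish)
                if best is None or key < best[0]:
                    best = (key, accel, v.name, finish)
    if best is not None:
        return best[1], best[2], best[3]
    return fallback[1], fallback[2], fallback[0]


# Hypothetical layer: fast on its preferred accelerator A, 4x slower on B.
original = Variant("original", {"A": 2.0, "B": 8.0}, 0.0)
# Hypothetical variant customized for B, narrowing the A/B latency gap.
variant_b = Variant("conv_v1", {"B": 3.0}, 0.02)

free_at = {"A": 5.0, "B": 0.0}  # A is busy with another model until t = 5 ms
print(choose([original], free_at, ready_at=0.0, deadline=4.0))             # ('A', 'original', 7.0)
print(choose([original, variant_b], free_at, ready_at=0.0, deadline=4.0))  # ('B', 'conv_v1', 3.0)
```

With only the original layer, the best placement finishes at 7 ms and misses the 4 ms deadline. The variant on `B` meets it at a small accuracy cost. A real scheduler would also enforce accuracy constraints across models, which this toy rule ignores.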

📝 Abstract

Heterogeneous DNN accelerators improve soft real-time multi-DNN execution by mapping each layer to its preferred accelerator to reduce latency. However, under skewed workloads, large layer-latency differences across accelerators limit scheduling flexibility and increase deadline misses. To address this challenge, we introduce layer variants, customized layer implementations that reduce latency gaps on non-preferred accelerators. We then present Terastal, a soft real-time framework for layer-variant design and scheduling on heterogeneous DNN accelerators. Terastal combines offline heterogeneity-aware virtual budget assignment and layer-variant design, and online scheduling to jointly optimize accelerator mapping and variant selection under timing and accuracy constraints. Experimental results show that Terastal reduces deadline miss rate per model by 40.58%, 30.53%, and 36.27% compared with FCFS, EDF, and DREAM, respectively, while incurring only 2.24% average normalized accuracy loss across models with variants.
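The abstract splits the work into an offline step (virtual budget assignment) and an online step (scheduling). One simple way to picture how a virtual budget could drive an online scheduler is to give every layer its own intermediate deadline and dispatch ready layers earliest-deadline-first. This is a hedged sketch, not Terastal's algorithm. The proportional-split rule, the `virtual_deadlines` and `edf_pick` helpers, and all numbers are hypothetical.

```python
from itertools import accumulate


def virtual_deadlines(release_ms, relative_deadline_ms, layer_latencies):
    """Split one inference's end-to-end deadline into per-layer virtual deadlines.

    Hypothetical heterogeneity-aware rule: each layer's budget is proportional
    to its latency on its fastest (preferred) accelerator.
    Returns absolute virtual deadlines in ms, one per layer.
    """
    preferred = [min(lat.values()) for lat in layer_latencies]
    total = sum(preferred)
    budgets = [relative_deadline_ms * p / total for p in preferred]
    # Each layer's virtual deadline is the release time plus all budgets so far.
    return [release_ms + c for c in accumulate(budgets)]


def edf_pick(ready):
    """Earliest-deadline-first: return the (name, deadline) pair with the smallest deadline."""
    return min(ready, key=lambda item: item[1])


# Hypothetical three-layer model released at t = 0 with a 30 ms deadline.
layers = [{"A": 2.0, "B": 8.0}, {"A": 6.0, "B": 3.0}, {"A": 4.0, "B": 4.0}]
vd = virtual_deadlines(0.0, 30.0, layers)
print([round(d, 2) for d in vd])  # [6.67, 16.67, 30.0]

# Two models compete; the layer with the tighter virtual deadline runs first.
ready = [("model1.layer0", vd[0]), ("model2.layer0", 5.0)]
print(edf_pick(ready)[0])  # model2.layer0
```

Per-layer deadlines let the online scheduler detect early that a model is falling behind, which is when switching a layer to a variant on a non-preferred accelerator would pay off.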

Problem

Research questions and friction points this paper is trying to address.

heterogeneous accelerators

multi-DNN workloads

layer latency

deadline misses

real-time scheduling

Innovation

Methods, ideas, or system contributions that make the work stand out.

layer variants

heterogeneous accelerators

real-time scheduling