🤖 AI Summary
To address two critical challenges in deep learning training, namely low GPU utilization and the out-of-memory crashes or cross-task resource interference caused by task co-location, this paper proposes CARMA, a server-scale, task-level collocation-aware resource management framework. Its core contributions include: (1) GPUMemNet, a machine learning-based framework for fine-grained prediction of a training task's GPU memory consumption; (2) collocation policies that cap GPU utilization to limit interference; and (3) a lightweight task-recovery strategy that robustly restarts tasks that crash. By jointly managing memory allocation and utilization caps, CARMA strengthens system robustness while improving service quality and energy efficiency. Evaluation on traces modeled after real-world training task traces shows that CARMA increases GPU utilization over time by 39.3%, reduces end-to-end execution time by ~26.7%, and lowers energy consumption by ~14.2%.
📝 Abstract
Studies conducted on enterprise-scale infrastructure have shown that GPUs -- the core computational resource for deep learning (DL) training -- are often significantly underutilized. DL task collocation on GPUs is an opportunity to address this challenge. However, it may result in (1) out-of-memory crashes for the subsequently arriving task and (2) slowdowns for all tasks sharing the GPU due to resource interference. The former challenge poses a threat to robustness, while the latter affects the quality of service and energy efficiency.
We propose CARMA, a server-scale task-level collocation-aware resource management system that handles both collocation challenges. CARMA integrates GPUMemNet, a novel ML-based GPU memory estimation framework for DL training tasks, to minimize out-of-memory errors, and introduces collocation policies that cap GPU utilization to limit interference. Furthermore, CARMA provides a recovery method to ensure robust restart of tasks that crash. Our evaluation on traces modeled after real-world DL training task traces shows that CARMA increases GPU utilization over time by 39.3%, decreases end-to-end execution time by ~26.7%, and reduces GPU energy use by ~14.2%.