🤖 AI Summary
This study addresses the underutilization of modern GPUs by systematically evaluating the performance, energy consumption, and resource isolation of NVIDIA's Multi-Process Service (MPS) and Multi-Instance GPU (MIG) technologies when several applications run concurrently. The experiments reveal a trade-off between MPS's flexibility and MIG's hardware-level isolation. In the most favorable scenarios, where its provisioning option prevents resource monopolization, MPS improves performance by up to 30% and reduces energy consumption by about 20%; under memory contention, however, it degrades performance by around 30%. MIG's full hardware isolation resolves memory contention and yields more consistent improvements, but its higher overhead tempers those gains and its rigid partitioning scheme can degrade performance in some cases. These findings provide an empirical basis for choosing GPU co-execution strategies according to job profiles.
📝 Abstract
To mitigate the increasingly common underutilization of computational resources in modern GPUs, spatial sharing methods let multiple applications use the same GPU simultaneously. This work presents a comprehensive evaluation of NVIDIA's two primary technologies for this purpose: Multi-Process Service (MPS) and Multi-Instance GPU (MIG). Our findings reveal a crucial trade-off between MPS's flexibility and MIG's isolation, and provide insights for tailoring the co-execution strategy to job profiles. In the most favorable scenarios, MPS improves performance by up to 30% and reduces energy consumption by about 20%, using its provisioning option to avoid resource monopolization. However, under memory contention, it suffers severe degradation, worsening performance by around 30%. Conversely, MIG's full hardware isolation resolves memory contention, leading to more consistent improvements, but these gains are tempered by higher overhead, and its rigid scheme can degrade performance in certain cases.
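As a rough illustration of the MPS provisioning option mentioned above, the sketch below caps the share of streaming multiprocessors each co-located process may use by setting the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable before launch. It assumes an MPS control daemon is already running (for example, started with `nvidia-cuda-mps-control -d`). The helper names `mps_env` and `launch_co_located`, the job binaries, and the 60/40 split are hypothetical; the abstract does not describe the authors' actual setup.

```python
import os
import subprocess


def mps_env(active_thread_percentage, base_env=None):
    """Return a copy of the environment with an MPS provisioning cap set.

    Hypothetical helper, not taken from the paper. The percentage limits
    the portion of SMs that an MPS client process may occupy.
    """
    if not 0 < active_thread_percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(active_thread_percentage)
    return env


def launch_co_located(jobs):
    """Start every (argv, percentage) job concurrently and wait for all.

    Returns the list of exit codes in the same order as `jobs`.
    """
    procs = [subprocess.Popen(argv, env=mps_env(pct)) for argv, pct in jobs]
    return [proc.wait() for proc in procs]
```

A call such as `launch_co_located([(["./job_a"], 60), (["./job_b"], 40)])` would run two placeholder binaries side by side with a 60/40 provisioning split. The split is an assumption that would need tuning per job profile, and this cap only limits compute. It does not partition memory bandwidth or capacity, so it does not address the memory-contention case described in the abstract.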