Leveraging Multi-Instance GPUs through moldable task scheduling

📅 2025-07-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses makespan minimization for multi-task workloads on Multi-Instance GPU (MIG) accelerators. It proposes FAR, a moldability-aware scheduling algorithm that does not assume task work is monotonic in the allocated resources. FAR has three phases: (i) an allocation step built on a classical task moldability method; (ii) Longest Processing Time First (LPT) and List Scheduling combined with a repartitioning tree heuristic tailored to MIG constraints; and (iii) local search via task moves and swaps. Tasks are scheduled offline in batches, and the batch schedules are concatenated on the fly in a way that favors resource reuse. Excluding reconfiguration costs, the approximation factor is 7/4 on the NVIDIA A30 and 2 on the A100/H100. With reconfiguration costs included, real-world experiments show a makespan no worse than 1.22× the optimum on a well-known benchmark suite and 1.10× on synthetic inputs inspired by real kernels, with large improvements over the state of the art and over approaches without GPU reconfiguration.

📝 Abstract

NVIDIA MIG (Multi-Instance GPU) allows partitioning a physical GPU into multiple logical instances with fully isolated resources, which can be dynamically reconfigured. This work highlights the untapped potential of MIG through moldable task scheduling with dynamic reconfigurations. Specifically, we propose a makespan minimization problem for multi-task execution under MIG constraints. Our profiling shows that assuming monotonicity in task work with respect to resources, as is usual in multicore scheduling, is not viable. Relying on a state-of-the-art proposal that does not require such an assumption, we present FAR, a 3-phase algorithm to solve the problem. Phase 1 of FAR builds on a classical task moldability method, phase 2 combines Longest Processing Time First and List Scheduling with a novel repartitioning tree heuristic tailored to MIG constraints, and phase 3 employs local search via task moves and swaps. FAR schedules tasks in batches offline, concatenating their schedules on the fly in an improved way that favors resource reuse. Excluding reconfiguration costs, the List Scheduling proof shows an approximation factor of 7/4 on the NVIDIA A30 model. We adapt the technique to the particular constraints of an NVIDIA A100/H100 to obtain an approximation factor of 2. Including the reconfiguration cost, our real-world experiments reveal a makespan no worse than 1.22x the optimum for a well-known suite of benchmarks, and 1.10x for synthetic inputs inspired by real kernels. We obtain good experimental results for each batch of tasks, but also in the concatenation of batches, with large improvements over the state of the art and over proposals without GPU reconfiguration. Beyond the algorithm, the paper demonstrates the research potential of the MIG technology and suggests useful metrics, workload characterizations and evaluation techniques for future work in this field.
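
For readers unfamiliar with the building blocks named in phase 2, the following is a minimal sketch of classical Longest Processing Time First combined with List Scheduling on identical machines. It is not FAR: it ignores moldability, MIG partitioning constraints, and the repartitioning tree, and the function name `lpt_list_schedule` and its parameters are hypothetical.

```python
import heapq


def lpt_list_schedule(durations, num_machines):
    """Classical LPT list scheduling on identical machines (illustrative only).

    Assumes num_machines >= 1. Returns (makespan, assignment), where
    assignment[m] lists the task indices placed on machine m.
    """
    # Min-heap of (current load, machine index): the top is the least loaded machine.
    heap = [(0.0, m) for m in range(num_machines)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_machines)]

    # LPT: visit tasks in order of decreasing duration.
    order = sorted(range(len(durations)), key=lambda i: durations[i], reverse=True)

    # List scheduling: each task goes to the machine that becomes free first.
    for task in order:
        load, machine = heapq.heappop(heap)
        assignment[machine].append(task)
        heapq.heappush(heap, (load + durations[task], machine))

    makespan = max(load for load, _ in heap)
    return makespan, assignment
```

Calling `lpt_list_schedule([7, 5, 4, 3, 3, 2], 2)` places the six tasks on two machines. In the paper's setting the "machines" would be GPU instances of different sizes whose layout can change over time, which this sketch does not model.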

Problem

Research questions and friction points this paper is trying to address.

Minimizes makespan for multi-task execution under MIG constraints

Addresses non-monotonic task work in GPU resource allocation

Optimizes dynamic GPU reconfiguration for improved scheduling efficiency
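
The non-monotonicity point can be made concrete with a small check. The sketch below assumes the usual definition of work as allocated resources multiplied by execution time, and tests whether work never decreases as the instance size grows. The function name, the profile format, and the timings in the usage note are hypothetical, not measurements from the paper.

```python
def work_is_monotonic(times_by_slices):
    """Return True if work (slices * time) never decreases as slices grow.

    times_by_slices maps an instance size (number of GPU slices) to the
    execution time of one task on an instance of that size.
    """
    sizes = sorted(times_by_slices)
    work = [s * times_by_slices[s] for s in sizes]
    # Monotonic work: each larger allocation costs at least as much total work.
    return all(a <= b for a, b in zip(work, work[1:]))
```

For a made-up profile such as `{1: 10.0, 2: 4.0, 4: 3.0}`, the work values are 10, 8 and 12, so the check fails: doubling the slices more than halves the time. A scheduler that relies on monotonic work would mis-handle such a task.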

Innovation

Methods, ideas, or system contributions that make the work stand out.

Moldable task scheduling with dynamic GPU reconfigurations

FAR, a three-phase algorithm that combines LPT and List Scheduling with a repartitioning tree heuristic

Optimizes makespan via batch scheduling and local search
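
As an illustration of local search by task moves and swaps, here is a generic sketch for identical machines. It is an assumption-laden stand-in, not the paper's phase 3: it ignores MIG instance sizes and reconfiguration, assumes positive durations, and the name `local_search` is hypothetical.

```python
def local_search(durations, assignment):
    """Improve a schedule with task moves and swaps until no step helps.

    assignment[m] lists the task indices on machine m; the input is not modified.
    Returns (makespan, improved_assignment).
    """
    assignment = [list(tasks) for tasks in assignment]  # work on a copy
    loads = [sum(durations[t] for t in tasks) for tasks in assignment]

    def try_improve():
        # Only the most loaded (critical) machine determines the makespan.
        src = max(range(len(loads)), key=loads.__getitem__)
        for dst in range(len(loads)):
            if dst == src:
                continue
            # Move: relocate one task if dst stays below the old critical load.
            for t in assignment[src]:
                if loads[dst] + durations[t] < loads[src]:
                    assignment[src].remove(t)
                    assignment[dst].append(t)
                    loads[src] -= durations[t]
                    loads[dst] += durations[t]
                    return True
            # Swap: trade a longer task on src for a shorter one on dst.
            for t in assignment[src]:
                for u in assignment[dst]:
                    delta = durations[t] - durations[u]
                    if delta > 0 and loads[dst] + delta < loads[src]:
                        assignment[src].remove(t)
                        assignment[dst].remove(u)
                        assignment[src].append(u)
                        assignment[dst].append(t)
                        loads[src] -= delta
                        loads[dst] += delta
                        return True
        return False

    while try_improve():
        pass
    return max(loads), assignment
```

Each accepted step lowers the load of the critical machine without pushing the other machine up to the old critical load, so the search stops after finitely many steps at a schedule that no single move or swap from the critical machine can improve.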
