ARCAS: Adaptive Runtime System for Chiplet-Aware Scheduling

📅 2025-03-14
🤖 AI Summary
To address cross-chiplet memory-access imbalance and inefficient task scheduling caused by the partitioned L3 caches of chiplet-based CPUs, this paper proposes a lightweight adaptive runtime system that jointly optimizes task scheduling, memory allocation, and performance monitoring. The approach introduces a chiplet-aware, fine-grained task migration mechanism and a hardware-topology-aware memory allocation strategy, overcoming the limitations of conventional NUMA optimizations on chiplet architectures. It integrates chiplet-aware heuristic scheduling, a lightweight user-space concurrency model (supporting suspension/resumption and cross-chiplet task migration), and real-time performance monitoring. Experimental evaluation across diverse memory-intensive parallel applications demonstrates an average 1.7× speedup, a 22% improvement in L3 cache hit rate, and a 35% reduction in cross-chiplet memory access latency.
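The summary above hinges on the runtime knowing where chiplet boundaries fall, which classical NUMA APIs do not expose. The paper does not publish its discovery code, but on Linux a runtime can infer chiplet groups by clustering cores that share an L3 slice, using the standard sysfs cache topology files. A minimal sketch (the function names here are illustrative, not from ARCAS):

```python
from collections import defaultdict
from pathlib import Path


def parse_cpu_list(s: str) -> frozenset[int]:
    """Parse a sysfs CPU list such as '0-7,16-23' into a set of CPU ids."""
    cpus = set()
    for part in s.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def chiplets_by_shared_l3(sysfs: str = "/sys/devices/system/cpu") -> list[list[int]]:
    """Group CPUs into chiplet-like sets: cores that share an L3 (cache index3).

    On chiplet CPUs (e.g. AMD CCDs/CCXs) each L3 slice is private to one
    chiplet, so grouping by shared_cpu_list recovers chiplet boundaries
    even inside a single NUMA domain.
    """
    groups: defaultdict[frozenset[int], set[int]] = defaultdict(set)
    for cpu_dir in Path(sysfs).glob("cpu[0-9]*"):
        l3 = cpu_dir / "cache" / "index3" / "shared_cpu_list"
        if l3.exists():
            groups[parse_cpu_list(l3.read_text())].add(int(cpu_dir.name[3:]))
    return [sorted(g) for g in groups.values()]
```

A scheduler built on this map can then pin a task's worker thread and its memory allocations to the same group, which is the essence of the topology-aware allocation strategy the summary describes.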

📝 Abstract
The growing disparity between CPU core counts and available memory bandwidth has intensified memory contention in servers. This particularly affects highly parallelizable applications, which must achieve efficient cache utilization to maintain performance as CPU core counts grow. Optimizing cache utilization, however, is complex for recent chiplet-based CPUs, whose partitioned L3 caches lead to varying latencies and bandwidths, even within a single NUMA domain. Classical NUMA optimizations and task scheduling approaches unfortunately fail to address the performance issues of chiplet-based CPUs. We describe Adaptive Runtime system for Chiplet-Aware Scheduling (ARCAS), a new runtime system designed for chiplet-based CPUs. ARCAS combines chiplet-aware task scheduling heuristics, hardware-aware memory allocation, and fine-grained performance monitoring to optimize workload execution. It implements a lightweight concurrency model that combines user-level thread features (such as individual stacks, per-task scheduling, and state management) with coroutine-like behavior, allowing tasks to suspend and resume execution at defined points while efficiently managing task migration across chiplets. Our evaluation across diverse scenarios shows ARCAS's effectiveness for optimizing the performance of memory-intensive parallel applications.
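The concurrency model in the abstract, tasks that suspend at defined points and can be moved between chiplets while suspended, can be illustrated with generators standing in for user-level threads. This is a toy sketch under that analogy, not ARCAS's implementation; `ChipletScheduler`, `worker`, and the round-robin loop are all hypothetical:

```python
from collections import deque


class ChipletScheduler:
    """Toy model: one run queue per chiplet; tasks are generators that
    suspend at each yield and can be migrated while suspended."""

    def __init__(self, n_chiplets: int):
        self.queues = [deque() for _ in range(n_chiplets)]

    def spawn(self, task, chiplet: int = 0) -> None:
        self.queues[chiplet].append(task)

    def migrate(self, src: int, dst: int) -> None:
        # A suspended user-level task carries its own stack/state, so
        # migration reduces to moving it between run queues.
        if self.queues[src]:
            self.queues[dst].append(self.queues[src].popleft())

    def run(self) -> list[tuple[int, str]]:
        trace = []  # (chiplet id, step) pairs, for observing placement
        while any(self.queues):
            for cid, q in enumerate(self.queues):
                if not q:
                    continue
                task = q.popleft()
                try:
                    step = next(task)   # run until the next suspension point
                    trace.append((cid, step))
                    q.append(task)      # suspended: requeue on this chiplet
                except StopIteration:
                    pass                # task finished; drop it
        return trace


def worker(name: str, steps: int):
    for i in range(steps):
        yield f"{name}:{i}"  # defined suspension point
```

For example, spawning two workers on different chiplets and then calling `migrate(1, 0)` makes every subsequent step of both tasks execute on chiplet 0, which is the kind of placement decision a chiplet-aware heuristic would take to co-locate tasks sharing an L3 slice.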
Problem

Research questions and friction points this paper is trying to address.

Address memory contention in chiplet-based CPUs
Optimize cache utilization for parallel applications
Improve task scheduling across NUMA domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chiplet-aware task scheduling heuristics
Hardware-aware memory allocation
Fine-grained performance monitoring
Alessandro Fogli
Imperial College London, London, United Kingdom
Bo Zhao
Aalto University, Espoo, Finland
Peter R. Pietzuch
Imperial College London, London, United Kingdom
Jana Giceva
TU Munich