Affinity Tailor: Dynamic Locality-Aware Scheduling at Scale

📅 2026-04-30

🤖 AI Summary

This work addresses the performance degradation that occurs when load balancing spreads each workload across many cores of a modern multicore system, eroding microarchitectural locality. The problem is most acute on chiplet-based processors, where execution that crosses last-level cache (LLC) domains intensifies interference between workloads. To mitigate it, the authors propose a user-space-guided kernel scheduling mechanism that prioritizes spatial locality. Rather than using rigid partitioning or a fully shared policy, a userspace controller estimates each workload's CPU demand online and, taking the LLC topology into account, assigns it a compact CPU set that the kernel treats as a soft affinity hint. This preserves locality while keeping utilization high. Deployed at Google, the approach improves geometric-mean per-CPU throughput over Linux CFS by 12% on chiplet systems and 3% on non-chiplet systems. Because faster execution shortens memory residency, it also raises per-GB throughput by 3%–7%.
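The userspace allocation step can be sketched in Python. This is a minimal, hypothetical model, not the authors' implementation: `DemandEstimator` assumes demand is an exponentially weighted moving average (EWMA) of observed CPU usage scaled by a headroom factor, and `assign_cpu_sets` assumes a greedy policy that places each workload inside a single LLC domain when one has enough free CPUs, otherwise draws from the fewest, emptiest domains, and overlaps with other workloads only when the machine is oversubscribed. The smoothing weight, headroom factor, and largest-first ordering are illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class DemandEstimator:
    """Hypothetical online estimator: EWMA of observed CPU usage (in CPUs)."""
    alpha: float = 0.3     # weight of the newest sample (assumed)
    headroom: float = 1.2  # over-provisioning factor on top of the average (assumed)
    ewma: float = 0.0

    def update(self, cpus_used: float) -> int:
        """Fold in one usage sample and return the CPU-set size to request."""
        self.ewma = self.alpha * cpus_used + (1 - self.alpha) * self.ewma
        return max(1, math.ceil(self.ewma * self.headroom))


def assign_cpu_sets(llc_domains: List[List[int]],
                    demands: Dict[str, int]) -> Dict[str, Set[int]]:
    """Greedy, LLC-aware assignment of preferred CPU sets (hints, not partitions).

    llc_domains: one list of CPU ids per LLC domain.
    demands: workload name -> number of CPUs requested.
    """
    hints_per_cpu = {cpu: 0 for dom in llc_domains for cpu in dom}
    result: Dict[str, Set[int]] = {}
    # Place the largest workloads first, while whole domains are still free.
    for name, want in sorted(demands.items(), key=lambda kv: -kv[1]):
        want = min(want, len(hints_per_cpu))
        free = [[c for c in dom if hints_per_cpu[c] == 0] for dom in llc_domains]
        fits = [f for f in free if len(f) >= want]
        if fits:
            # Best fit: the domain with the fewest free CPUs that still covers demand.
            chosen = min(fits, key=len)[:want]
        else:
            # Take free CPUs from the emptiest domains first to span few LLCs.
            chosen = []
            for f in sorted(free, key=len, reverse=True):
                chosen.extend(f[: want - len(chosen)])
                if len(chosen) == want:
                    break
            # Oversubscribed: share the least-hinted CPUs. Overlap is acceptable
            # because the kernel treats these sets as soft hints.
            if len(chosen) < want:
                rest = sorted((c for c in hints_per_cpu if c not in chosen),
                              key=lambda c: hints_per_cpu[c])
                chosen.extend(rest[: want - len(chosen)])
        for c in chosen:
            hints_per_cpu[c] += 1
        result[name] = set(chosen)
    return result


if __name__ == "__main__":
    est = DemandEstimator()
    for usage in (2.0, 2.5, 3.0):
        wanted = est.update(usage)
    print("requested CPUs:", wanted)
    llc = [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(assign_cpu_sets(llc, {"a": 3, "b": 2, "c": 3}))
```

With two 4-CPU LLC domains and demands of 3, 2, and 3 CPUs, workloads `a` and `c` each land inside one domain ({0, 1, 2} and {4, 5, 6}), while `b`, placed last, receives the leftover CPUs 3 and 7, which straddle both domains. Greedy placement is simple but order-dependent: the last workload absorbs the fragmentation.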

📝 Abstract

Modern large multicore systems often run multiple workloads that share CPUs under schedulers such as Linux CFS. To keep CPUs busy, these schedulers load-balance runnable work, causing each workload to execute on many cores. This weakens locality at the microarchitectural level: workloads lose reuse in caches, branch predictors, and prefetchers, and interfere more with one another, especially on chiplet-based systems, where spreading execution across cores also spreads it across LLC boundaries. A natural alternative is strict CPU partitioning, but hard partitions leave capacity idle when workloads do not fully use their reserved CPUs. We present Affinity Tailor, a userspace-guided kernel scheduling system built on a key insight: the kernel can preserve locality for workloads that share CPUs by treating demand-sized, topologically compact CPU sets as affinity hints rather than hard partitions. A userspace controller estimates each workload's CPU demand online and assigns a preferred CPU set sized to that demand, chosen to be as disjoint as possible from other workloads while spanning as few LLC domains as possible. The kernel then uses this set as an affinity hint, steering threads toward those CPUs while still allowing execution elsewhere when needed to preserve utilization. Deployed at Google, Affinity Tailor delivers geometric-mean per-CPU throughput gains of 12% on chiplet-based systems and 3% on non-chiplet systems over Linux CFS. Furthermore, faster execution reduces memory residency, yielding per-GB throughput gains of 3-7%. Our findings suggest that future schedulers should treat spatial locality as a first-class objective, even at the expense of work-conservation.
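The kernel side, which steers threads toward their hinted CPUs but lets them run elsewhere when needed, can be modeled as a wakeup placement policy. The sketch below is a hypothetical simplification, not the actual modification to CFS: `select_cpu`, `nr_running`, and `spill_threshold` are names introduced for illustration, and a real scheduler weighs many more signals than per-CPU runnable counts.

```python
from typing import Dict, Set


def select_cpu(hint: Set[int], nr_running: Dict[int, int], prev_cpu: int,
               spill_threshold: int = 2) -> int:
    """Hypothetical wakeup placement that treats `hint` as a soft affinity.

    nr_running maps every CPU id to its count of runnable threads (0 = idle).
    spill_threshold: how many threads the least-loaded hinted CPU may already
    hold before the thread is allowed outside its hint (assumed tuning knob).
    """
    # 1. Reuse the previous CPU when it is hinted and idle: warmest caches.
    if prev_cpu in hint and nr_running[prev_cpu] == 0:
        return prev_cpu
    # Least-loaded hinted CPU; sorting first makes ties go to the lowest id.
    best_in_hint = min(sorted(hint), key=lambda c: nr_running[c])
    # 2. Stay inside the hint while it is not too crowded, even if some CPU
    #    elsewhere is idle: locality is favored over strict work conservation.
    if nr_running[best_in_hint] < spill_threshold:
        return best_in_hint
    # 3. Hint set is crowded: spill to an idle CPU outside it to use spare capacity.
    for c in sorted(nr_running):
        if c not in hint and nr_running[c] == 0:
            return c
    # 4. Nothing idle anywhere: queue on the least-loaded hinted CPU.
    return best_in_hint


if __name__ == "__main__":
    load = {0: 1, 1: 3, 2: 0, 3: 0}
    print(select_cpu({0, 1}, load, prev_cpu=1))                      # 0: stays in hint
    print(select_cpu({0, 1}, {0: 2, 1: 2, 2: 0, 3: 1}, prev_cpu=0))  # 2: spills to idle CPU
```

With `spill_threshold=1`, the policy spills as soon as every hinted CPU is busy and another CPU is idle, which is work-conserving at wakeup; larger values tolerate short queues inside the hint set in exchange for locality, the trade-off the abstract's conclusion argues for.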

Problem

Research questions and friction points this paper is trying to address.

CPU scheduling

spatial locality

chiplet architecture

workload interference

cache locality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Affinity Tailor

locality-aware scheduling

chiplet architecture