Pushing the Memory Bandwidth Wall with CXL-enabled Idle I/O Bandwidth Harvesting

📅 2025-11-15

🤖 AI Summary

Rising server CPU core counts have intensified memory bandwidth demand, yet per-core memory bandwidth continues to decline because off-chip pin counts and signaling rates scale poorly. At the same time, the fixed, static allocation of pins between memory and I/O fragments off-chip bandwidth, so capacity on one side can sit idle while the other is saturated. To address these challenges, the paper proposes SURGE, a novel architecture that, for the first time, enables software-controllable, dynamic sharing of memory and I/O bandwidth. Building on the CXL interconnect protocol, SURGE combines hardware interface resource sharing, dynamic bandwidth multiplexing, and fine-grained traffic scheduling to eliminate traditional bandwidth silos. Evaluated on bandwidth-constrained servers, SURGE delivers up to a 1.3× speedup for memory-intensive workloads and improves system-wide bandwidth utilization.

📝 Abstract

The continual increase of cores on server-grade CPUs raises demands on memory systems, which are constrained by limited off-chip pin and data transfer rate scalability. As a result, high-end processors typically feature lower memory bandwidth per core, to the detriment of memory-intensive workloads. We propose alleviating this challenge by improving the utility of the CPU's limited pins. In a typical CPU design process, the available pins are apportioned between memory and I/O traffic, each accounting for about half of the total off-chip bandwidth. Consequently, unless memory and I/O are both highly utilized at the same time, this fragmentation leaves valuable off-chip bandwidth underutilized. An ideal architecture would offer I/O and memory bandwidth fungibility, allowing use of the aggregate off-chip bandwidth in the form required by each workload. In this work, we introduce SURGE, a software-supported architectural technique that boosts memory bandwidth availability by salvaging idle I/O bandwidth resources. SURGE leverages the capability of versatile interconnect technologies like CXL to dynamically multiplex memory and I/O traffic over the same processor interface. We demonstrate that SURGE-enhanced architectures can accelerate memory-intensive workloads on bandwidth-constrained servers by up to 1.3x.
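
The fragmentation argument in the abstract can be illustrated with a small back-of-the-envelope model. The sketch below is not SURGE's mechanism or evaluation: the function names, the `Demand` type, and all bandwidth figures are hypothetical, and the model ignores CXL latency and protocol overhead. It only contrasts a static memory/I/O split with an idealized fungible pool in which I/O demand is served first and leftover I/O capacity is lent to memory traffic.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    """Offered load in GB/s (hypothetical numbers, not from the paper)."""
    memory_gbps: float
    io_gbps: float


def served_static(d: Demand, mem_cap: float, io_cap: float) -> tuple[float, float]:
    """Static split: each traffic class is capped by its own share of the pins."""
    return min(d.memory_gbps, mem_cap), min(d.io_gbps, io_cap)


def served_fungible(d: Demand, mem_cap: float, io_cap: float) -> tuple[float, float]:
    """Idealized sharing: I/O is served first, idle I/O capacity goes to memory."""
    io_served = min(d.io_gbps, io_cap)
    idle_io = io_cap - io_served
    mem_served = min(d.memory_gbps, mem_cap + idle_io)
    return mem_served, io_served


# Assumed platform: off-chip bandwidth split evenly, 300 GB/s per side.
demand = Demand(memory_gbps=450.0, io_gbps=100.0)
print(served_static(demand, 300.0, 300.0))    # (300.0, 100.0)
print(served_fungible(demand, 300.0, 300.0))  # (450.0, 100.0)
```

In this toy case the static split leaves 200 GB/s of I/O capacity idle while memory demand goes unmet; the fungible pool serves the full memory demand. When I/O is itself saturated, both functions return the same result, which matches the abstract's point that the benefit depends on I/O bandwidth being idle.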

Problem

Research questions and friction points this paper is trying to address.

Addressing memory bandwidth constraints in multi-core servers

Harvesting idle I/O bandwidth to improve memory performance

Enabling dynamic bandwidth sharing between memory and I/O

Innovation

Methods, ideas, or system contributions that make the work stand out.

Harvesting idle I/O bandwidth via CXL technology

Dynamically multiplexing memory and I/O traffic

Software-supported architecture boosting memory bandwidth availability
