CXLAimPod: CXL Memory is all you need in AI era

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing software stacks fail to harness the full-duplex hardware capabilities of CXL memory. To close this gap, the paper proposes the first adaptive memory scheduling framework explicitly designed for CXL's full-duplex channels. Implemented in the Linux kernel via eBPF, the framework integrates cgroup-based hierarchical hints, dynamic workload awareness, and multi-policy scheduling to enable application-aware, coordinated read/write request scheduling, thereby bridging the semantic gap between hardware capability and software stack. Evaluated on representative data-intensive workloads, including Redis, LLM inference, and vector databases, the framework achieves average bandwidth efficiency improvements of 7.4%–71.6%, with peak gains up to 150% in specific Redis scenarios. This work establishes a scalable, low-overhead hardware-software co-design paradigm for CXL memory systems under the AI era's high-concurrency, mixed read-write workloads.
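The core architectural argument can be illustrated with a toy model: a half-duplex bus (DDR-style) pays a turnaround penalty every time the direction flips between reads and writes, while full-duplex channels (CXL-style) let the two directions proceed independently. The per-request cost and turnaround penalty below are illustrative assumptions, not measurements from the paper.

```python
# Toy model contrasting half-duplex (DDR-style) and full-duplex (CXL-style)
# channel behavior under a mixed read/write request stream.
# Costs are illustrative assumptions, not figures from the paper.

def half_duplex_cycles(requests, turnaround=4):
    """Serialize all requests on one bus; pay a turnaround penalty
    every time the direction flips between 'R' and 'W'."""
    cycles, prev = 0, None
    for op in requests:
        if prev is not None and op != prev:
            cycles += turnaround
        cycles += 1  # one cycle per request
        prev = op
    return cycles

def full_duplex_cycles(requests):
    """Reads and writes travel on independent channels, so completion
    time is bounded by the busier direction."""
    reads = sum(1 for op in requests if op == "R")
    writes = len(requests) - reads
    return max(reads, writes)

# A balanced 50/50 mix with frequent direction switches is the worst
# case for half-duplex and the best case for full-duplex.
mixed = ["R", "W"] * 8
print(half_duplex_cycles(mixed))  # 16 requests + 15 turnarounds * 4 = 76
print(full_duplex_cycles(mixed))  # max(8 reads, 8 writes) = 8
```

This matches the paper's characterization: the gain from full-duplex hardware is largest at balanced read-write ratios, which is exactly where half-duplex turnaround overhead peaks, and a duplex-oblivious scheduler leaves that gain on the table.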

📝 Abstract
The proliferation of data-intensive applications, ranging from large language models to key-value stores, increasingly stresses memory systems with mixed read-write access patterns. Traditional half-duplex architectures such as DDR5 are ill-suited for such workloads, suffering bus turnaround penalties that reduce their effective bandwidth under mixed read-write patterns. Compute Express Link (CXL) offers a breakthrough with its full-duplex channels, yet this architectural potential remains untapped as existing software stacks are oblivious to this capability. This paper introduces CXLAimPod, an adaptive scheduling framework designed to bridge this software-hardware gap through system support, including cgroup-based hints for application-aware optimization. Our characterization quantifies the opportunity, revealing that CXL systems achieve 55-61% bandwidth improvement at balanced read-write ratios compared to flat DDR5 performance, demonstrating the benefits of full-duplex architecture. To realize this potential, the CXLAimPod framework integrates multiple scheduling strategies with a cgroup-based hint mechanism to navigate the trade-offs between throughput, latency, and overhead. Implemented efficiently within the Linux kernel via eBPF, CXLAimPod delivers significant performance improvements over default CXL configurations. Evaluation on diverse workloads shows 7.4% average improvement for Redis (with up to 150% for specific sequential patterns), 71.6% improvement for LLM text generation, and 9.1% for vector databases, demonstrating that duplex-aware scheduling can effectively exploit CXL's architectural advantages.
Problem

Research questions and friction points this paper is trying to address.

Traditional memory architectures suffer bandwidth penalties under mixed read-write workloads
Existing software stacks fail to utilize CXL's full-duplex memory channel capabilities
No adaptive scheduling framework exists to optimize CXL memory for diverse AI workloads
Innovation

Methods, ideas, or system contributions that make the work stand out.

CXL full-duplex channels for mixed read-write workloads
cgroup-based hint mechanism for application-aware optimization
eBPF kernel implementation for adaptive scheduling framework
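The cgroup-based hint idea above can be sketched as a hierarchical policy lookup: each cgroup may carry a hint, children inherit from parents, and the scheduler maps the resolved hint to read/write dispatch weights. This is a minimal user-space sketch under stated assumptions; the hint names, policy weights, and path-based lookup are hypothetical, and the paper's actual kernel-side eBPF interface is not reproduced here.

```python
# Hypothetical sketch of cgroup-based hierarchical hints driving a
# duplex-aware dispatch policy. Hint names and weights are illustrative
# assumptions, not the paper's actual interface.

POLICIES = {
    "latency":    {"read_share": 0.7, "write_share": 0.3},  # favor reads
    "throughput": {"read_share": 0.5, "write_share": 0.5},  # keep both channels busy
    "background": {"read_share": 0.3, "write_share": 0.7},  # drain writes
}

def pick_policy(cgroup_hints, cgroup):
    """Walk up the cgroup path until a hint is found, mirroring the
    hierarchical inheritance of cgroup attributes; default to 'throughput'."""
    while cgroup:
        hint = cgroup_hints.get(cgroup)
        if hint in POLICIES:
            return POLICIES[hint]
        # Fall back to the parent cgroup (strip the last path component).
        cgroup = cgroup.rsplit("/", 1)[0] if "/" in cgroup else ""
    return POLICIES["throughput"]

hints = {
    "/sys/fs/cgroup/ai": "throughput",
    "/sys/fs/cgroup/ai/redis": "latency",
}
print(pick_policy(hints, "/sys/fs/cgroup/ai/redis"))  # child's own hint wins
print(pick_policy(hints, "/sys/fs/cgroup/ai/infer"))  # inherits parent's hint
```

In the paper's design the equivalent decision runs inside the kernel via eBPF, so per-request policy selection avoids user-kernel round trips; this sketch only shows the inheritance semantics.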