A Comprehensive Simulation Framework for CXL Disaggregated Memory

📅 2024-11-04
🏛️ arXiv.org
🤖 AI Summary
To address the lack of high-fidelity, verifiable system-level simulation tools for CXL-based heterogeneous memory systems, this paper introduces CXL-DMSim, an open-source, full-system simulator for CXL memory disaggregation that runs at gem5-comparable speed. Its key contributions are: (1) a CXL disaggregation simulation framework that supports NUMA-compatible kernel memory management; (2) an integrated CXL.io/CXL.mem protocol stack, a device driver, and a flexible memory expander model; and (3) dual-mode operation (app-managed and kernel-managed) with runtime observability. Validated against FPGA- and ASIC-based hardware, CXL-DMSim achieves an average simulation error of 3.4%. Experimental results show that CXL-ASIC memory latency is about 2.18× that of local DDR, while its bandwidth reaches 82–83% of local DDR bandwidth. Memory-intensive applications also benefit: Viper improves by up to 23× when local memory is limited, and MERCI improves by approximately 60% in bandwidth-sensitive scenarios.

📝 Abstract
Compute Express Link (CXL) has emerged as a key enabler of memory disaggregation for future heterogeneous computing systems, allowing memory to be expanded on demand and resource utilization to be improved. However, CXL is still in its infancy and lacks commodity products on the market, which makes a reliable system-level simulation tool necessary for research and development. In this paper, we propose CXL-DMSim, an open-source full-system simulator that simulates CXL disaggregated memory systems with high fidelity at a gem5-comparable simulation speed. CXL-DMSim incorporates a flexible CXL memory expander model along with its associated device driver, and supports the CXL.io and CXL.mem protocols. It can operate in both app-managed mode and kernel-managed mode, with the latter using a dedicated NUMA-compatible mechanism. The simulator has been rigorously verified against a real hardware testbed with both FPGA- and ASIC-based CXL memory devices, which demonstrates that CXL-DMSim can simulate the characteristics of various CXL memory devices with an average simulation error of 3.4%. The experimental results using the LMbench and STREAM benchmarks suggest that CXL-FPGA memory exhibits ~2.88x the latency of local DDR, while CXL-ASIC latency is ~2.18x; CXL-FPGA achieves 45-69% of local DDR memory bandwidth, whereas CXL-ASIC achieves 82-83%. The study also reveals that CXL memory can significantly enhance the performance of memory-intensive applications: by up to 23x for the Viper key-value database when local memory is limited, and by approximately 60% in memory-bandwidth-sensitive scenarios such as MERCI. Moreover, the simulator's observability and expandability are showcased with detailed case studies, highlighting its potential for research on future CXL-interconnected hybrid memory pools.
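To make the reported ratios concrete, here is a minimal Python sketch. It applies the latency and bandwidth figures from the abstract to a caller-supplied local DDR baseline, and shows one plausible way an "average simulation error" could be computed: as the mean absolute relative error between simulated and measured values. The abstract does not state the exact error metric, so that formula, every function name, and the baseline and sample numbers are assumptions for illustration only, not the authors' method or data.

```python
from statistics import mean

# Ratios taken from the abstract (relative to local DDR).
LATENCY_RATIO = {"CXL-FPGA": 2.88, "CXL-ASIC": 2.18}
BANDWIDTH_FRACTION = {"CXL-FPGA": (0.45, 0.69), "CXL-ASIC": (0.82, 0.83)}


def projected_latency_ns(local_ddr_ns: float, device: str) -> float:
    """Scale a local DDR latency by the device's reported latency ratio."""
    return local_ddr_ns * LATENCY_RATIO[device]


def projected_bandwidth_gbs(local_ddr_gbs: float, device: str) -> tuple[float, float]:
    """Return the (low, high) bandwidth range implied by the reported fractions."""
    low, high = BANDWIDTH_FRACTION[device]
    return local_ddr_gbs * low, local_ddr_gbs * high


def mean_relative_error(pairs: list[tuple[float, float]]) -> float:
    """Mean of |simulated - measured| / measured over (simulated, measured) pairs.

    Assumed metric: the abstract only reports the resulting 3.4% average.
    """
    return mean(abs(sim - hw) / hw for sim, hw in pairs)


# Hypothetical baseline and sample pairs, NOT measurements from the paper.
ddr_latency_ns = 100.0
ddr_bandwidth_gbs = 20.0
samples = [(103.0, 100.0), (97.0, 100.0)]

for device in LATENCY_RATIO:
    low, high = projected_bandwidth_gbs(ddr_bandwidth_gbs, device)
    print(device, projected_latency_ns(ddr_latency_ns, device), low, high)
print(mean_relative_error(samples))
```

Swapping in a measured DDR latency and bandwidth for the placeholder baseline gives a quick estimate of what each device class would deliver on that system, under the assumption that the reported ratios carry over.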
Problem

Research questions and friction points this paper is trying to address.

Develops CXL-DMSim for simulating CXL disaggregated memory systems.
Addresses lack of reliable simulation tools for CXL memory research.
Evaluates performance of CXL memory in heterogeneous computing systems.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source full-system simulator for CXL memory
Supports the CXL.io and CXL.mem protocols
Verified with FPGA and ASIC CXL devices
Yanjing Wang
National University of Defense Technology
Lizhou Wu
National University of Defense Technology, China
Spintronic Design and Test, Memory Systems, Emerging Computing Paradigms
Wentao Hong
National University of Defense Technology
Yang Ou
National University of Defense Technology
Zicong Wang
National University of Defense Technology
Sunfeng Gao
National University of Defense Technology
Jie Zhang
Peking University
Sheng Ma
National University of Defense Technology
Dezun Dong
Professor, School of Computer Science, National University of Defense Technology
computer architecture, high performance computing, interconnection networks, machine learning systems
Xingyun Qi
National University of Defense Technology
Ming-Shiang Lai
National University of Defense Technology
Nong Xiao
National University of Defense Technology