The Hitchhiker's Guide to Programming and Optimizing CXL-Based Heterogeneous Systems

📅 2024-11-05

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 1


🤖 AI Summary

The performance characteristics and architectural behavior of cache-coherent interconnects, particularly Compute Express Link (CXL), remain poorly understood in multi-vendor heterogeneous systems (e.g., CPUs paired with CXL memory devices).

Method: We build a cross-vendor heterogeneous server cluster and propose Heimdall, the first fine-grained memory performance analysis framework tailored to CXL systems, together with a lightweight microbenchmark suite. By empirically measuring how the CXL 3.0 protocol stack and the hardware behave together, we systematically characterize memory latency, bandwidth, and coherence semantics across mainstream CXL devices.

Contribution/Results: We uncover three previously unknown architectural blind spots and implicit protocol-stack constraints. Building on these insights, we devise practical, workload-aware memory scheduling strategies for database and AI inference workloads. The work provides both theoretical foundations and actionable guidelines for designing and optimizing cache-coherent heterogeneous systems.

📝 Abstract

We present a thorough analysis of the use of CXL-based heterogeneous systems. We built a cluster of server systems that combines CPUs from different vendors with various types of CXL devices. We further developed a heterogeneous memory benchmark suite, Heimdall, to profile the performance of such heterogeneous systems. By leveraging Heimdall, we unveiled the detailed architecture design of these systems, drew observations on optimizing workload performance, and pointed out directions for the future development of CXL-based heterogeneous systems.

Problem

Research questions and friction points this paper is trying to address.

Analyze performance of cache-coherent heterogeneous systems

Compare CXL, NVLink-C2C, and Infinity Fabric interconnects

Optimize workloads for future heterogeneous system designs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzed cache-coherent links: CXL, NVLink-C2C, Infinity Fabric

Developed Heimdall benchmark for heterogeneous memory profiling

Unveiled architecture designs for performance optimization