🤖 AI Summary
Problem: The performance characteristics and architectural behaviors of cache-coherent interconnects, particularly Compute Express Link (CXL), remain poorly understood in multi-vendor heterogeneous systems (e.g., CPU + CXL memory devices).
Method: We construct a cross-vendor heterogeneous server cluster and propose Heimdall, the first fine-grained memory performance analysis framework tailored for CXL systems, accompanied by a lightweight microbenchmark suite. By empirically measuring how the CXL 3.0 protocol stack and the underlying hardware behave together, we systematically characterize memory latency, bandwidth, and coherence semantics across mainstream CXL devices.
Contribution/Results: We uncover three previously unknown architectural blind spots and implicit protocol stack constraints. Leveraging these insights, we devise practical, workload-aware memory scheduling strategies for database and AI inference workloads. Our work provides both theoretical foundations and actionable guidelines for designing and optimizing cache-coherent heterogeneous systems.
📝 Abstract
We present a thorough analysis of the use of CXL-based heterogeneous systems. We built a cluster of servers that combines CPUs from different vendors with various types of CXL devices. We further developed a heterogeneous memory benchmark suite, Heimdall, to profile the performance of such systems. Using Heimdall, we unveiled the detailed architectural design of these systems, drew observations on optimizing workload performance, and pointed out directions for the future development of CXL-based heterogeneous systems.