🤖 AI Summary
To address performance bottlenecks in CXL-interconnected SSDs—namely high latency and excessive CPU overhead from software prefetching—this paper proposes an LLC-offloaded heterogeneous prefetching architecture. Our approach introduces, for the first time, an expander-driven edge-side prefetching engine that jointly leverages CXL multi-level switching topology awareness and a back-invalidation cache coherence protocol to enable low-overhead, high-accuracy localized prefetching. We further establish an end-to-end latency model and a quantitative methodology for evaluating prefetching timeliness. Experimental results demonstrate 9.0× and 14.7× speedups for graph analytics and SPEC CPU benchmarks, respectively—substantially outperforming state-of-the-art CXL-SSD pooling-based prefetching schemes. The architecture reduces dependency on SSD accesses and significantly increases host cache direct-hit rates.
📝 Abstract
Integrating Compute Express Link (CXL) with SSDs allows scalable access to large memory, but at speeds slower than DRAM. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from the host CPU to CXL-SSDs. ExPAND uses a heterogeneous prediction algorithm for prefetching and ensures data consistency with CXL.mem's back-invalidation. We examine prefetch timeliness for accurate latency estimation. Being aware of CXL's multi-tiered switching topology, ExPAND provides an end-to-end latency estimate for each CXL-SSD and precise prefetch timeliness estimations. Our method reduces reliance on CXL-SSD accesses and enables direct host cache access for most data. ExPAND improves the performance of graph applications and SPEC CPU by 9.0$\times$ and 14.7$\times$, respectively, surpassing CXL-SSD pools with diverse prefetching strategies.
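As a rough illustration of the timeliness criterion the abstract describes, a prefetch is useful only if it completes before the demand access, where completion time depends on the end-to-end latency through the multi-tiered switch topology to the target CXL-SSD. The sketch below is a minimal model with hypothetical function names and latency values; it is not the paper's actual estimation methodology.

```python
# Hedged sketch: a prefetch issued lead_time_ns before the demand access is
# "timely" if the data can traverse the switch tiers and the CXL-SSD's internal
# access path in that window. All names and numbers are illustrative
# assumptions, not values or interfaces from the paper.

def end_to_end_latency_ns(switch_hop_latencies_ns, ssd_access_latency_ns):
    """Latency to one CXL-SSD: sum of per-hop switch latencies plus the
    device's internal access latency."""
    return sum(switch_hop_latencies_ns) + ssd_access_latency_ns

def is_timely(lead_time_ns, switch_hop_latencies_ns, ssd_access_latency_ns):
    """True if the prefetched line can arrive in the LLC before it is needed."""
    return lead_time_ns >= end_to_end_latency_ns(
        switch_hop_latencies_ns, ssd_access_latency_ns)

# Example: two switch tiers (70 ns each) and a 1500 ns SSD read.
print(is_timely(2000, [70, 70], 1500))  # timely: 2000 >= 1640
print(is_timely(1000, [70, 70], 1500))  # late:   1000 <  1640
```

A topology-aware prefetcher would maintain one such latency figure per CXL-SSD and use it to decide how far ahead of the access stream predictions must run.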