CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance

📅 2025-05-24
🤖 AI Summary
To address the performance bottlenecks of CXL-attached SSDs, namely access latency that is higher than DRAM and the host CPU cost of prefetching, this paper proposes ExPAND, an architecture that offloads last-level cache (LLC) prefetching from the host CPU to the CXL-SSDs. Its expander-driven prefetcher uses a heterogeneous prediction algorithm, is aware of the CXL multi-tiered switching topology, and relies on CXL.mem back-invalidation to keep prefetched data consistent. The paper also examines prefetch timeliness, using the end-to-end latency of each CXL-SSD to estimate it precisely. Reported results show 9.0× and 14.7× speedups for graph applications and SPEC CPU workloads, respectively, surpassing CXL-SSD pools that use a variety of other prefetching strategies. The design reduces reliance on CXL-SSD accesses and lets most data be served directly from the host cache.

📝 Abstract
Integrating compute express link (CXL) with SSDs allows scalable access to large memory but has slower speeds than DRAMs. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from host CPU to CXL-SSDs. ExPAND uses a heterogeneous prediction algorithm for prefetching and ensures data consistency with CXL.mem's back-invalidation. We examine prefetch timeliness for accurate latency estimation. ExPAND, being aware of CXL multi-tiered switching, provides end-to-end latency for each CXL-SSD and precise prefetch timeliness estimations. Our method reduces CXL-SSD reliance and enables direct host cache access for most data. ExPAND enhances graph application performance and SPEC CPU's performance by 9.0$\times$ and 14.7$\times$, respectively, surpassing CXL-SSD pools with diverse prefetching strategies.
Problem

Research questions and friction points this paper is trying to address.

Improving SSD performance with CXL prefetching
Reducing host CPU load via CXL-SSD prefetch offloading
Keeping prefetched data consistent in CXL memory systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expander-driven CXL prefetcher offloads LLC prefetching
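Heterogeneous prediction algorithm for prefetching, with data consistency ensured by CXL.mem back-invalidation
Awareness of CXL multi-tiered switching yields per-SSD end-to-end latency and precise prefetch timeliness estimates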
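
To make the link between topology awareness and prefetch timeliness concrete, here is a minimal Python sketch. It is not ExPAND's actual model, which this summary does not describe. It assumes that end-to-end latency is the device service time plus a round trip through every switch level, and that a prefetch is timely when its data arrives no later than the demand access. The names `CxlPath`, `end_to_end_latency_ns`, `is_timely`, and `latest_issue_time_ns` are hypothetical, and all latency values are made-up illustrative numbers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CxlPath:
    """Hypothetical route from the host to one CXL-SSD."""
    switch_hop_ns: Tuple[float, ...]  # assumed one-way latency of each switch level
    device_ns: float                  # assumed time for the CXL-SSD to serve a line


def end_to_end_latency_ns(path: CxlPath) -> float:
    # Assumption: request and response each cross every switch level once.
    return 2 * sum(path.switch_hop_ns) + path.device_ns


def is_timely(issue_ns: float, demand_ns: float, path: CxlPath) -> bool:
    # Timely means the prefetched data arrives no later than the demand access.
    return issue_ns + end_to_end_latency_ns(path) <= demand_ns


def latest_issue_time_ns(demand_ns: float, path: CxlPath) -> float:
    # Latest moment a prefetch can be issued and still be timely.
    return demand_ns - end_to_end_latency_ns(path)


# Illustrative values only: two switch levels in front of one CXL-SSD.
path = CxlPath(switch_hop_ns=(100.0, 150.0), device_ns=3000.0)
print(end_to_end_latency_ns(path))          # 2 * (100 + 150) + 3000
print(is_timely(0.0, 3500.0, path))         # issued just early enough
print(latest_issue_time_ns(10000.0, path))  # deadline for issuing the prefetch
```

Under this assumed model, a CXL-SSD that sits behind more switch levels has a longer end-to-end latency, so a prefetch targeting it must be issued earlier to stay timely. This is why per-device latency matters for the timeliness estimate.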
Authors
Dongsuk Oh
Kyungpook National University, Assistant Professor
Natural Language Processing, Semantics
Miryeong Kwon
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Jiseon Kim
KAIST
Natural Language Processing, Computational Social Science
Eunjee Na
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Junseok Moon
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Hyunkyu Choi
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Seonghyeon Jang
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Hanjin Choi
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Hongjoo Jung
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Sangwon Lee
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Myoungsoo Jung
The KAIST Endowed Chair Professor | Full Professor, Department of Electrical Engineering, KAIST
Computer Architecture, Solid State Drive, Non-Volatile Memory, CXL, Operating Systems