Employ SmartNICs' Data Path Accelerators for Ordered Key-Value Stores

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the first ordered key-value store that uses the Data Path Accelerators (DPAs) of BlueField-3 SmartNICs to combine high throughput, low latency, architectural simplicity, stateless clients, and native support for ordered operations, a combination that existing remote in-memory key-value systems struggle to achieve. The DPAs process network requests directly and traverse a lock-free learned index held in DPA-local memory, deferring host access until the leaf level is reached to minimize PCIe overhead. The design pairs the on-NIC learned index with a host-resident tree replica, uses OS-bypass networking, keeps a NIC-resident read cache, and migrates staged writes to the host in batches. The resulting system delivers 33 million point queries and 13 million range queries per second, matching or surpassing state-of-the-art solutions.

📝 Abstract
Remote in-memory key-value (KV) stores serve as a cornerstone for diverse modern workloads, and high-speed range scans are frequently a requirement. However, current architectures rarely achieve a simultaneous balance of peak efficiency, architectural simplicity, and native support for ordered operations. Conventional host-centric frameworks are restricted by kernel-space network stacks and internal bus latencies. While hash-based alternatives that utilize OS-bypass or run natively on SmartNICs offer high throughput, they lack the data structures necessary for range queries. Distributed RDMA-based systems provide performance and range functionality but often depend on stateful clients, which introduces complexity in scaling and error handling. Alternatively, SmartNIC implementations that traverse trees located in host memory are hampered by high DMA round-trip latencies. This paper introduces a KV store that leverages the on-path Data Path Accelerators (DPAs) of the BlueField-3 SmartNIC to eliminate operating system overhead while facilitating stateless clients and range operations. These DPAs ingest network requests directly from NIC buffers to navigate a lock-free learned index residing in the accelerator's local memory. By deferring value retrieval from the host-side tree replica until the leaf level is reached, the design minimizes PCIe crossings. Write operations are staged in DPA memory and migrated in batches to the host, where structural maintenance is performed before being transactionally stitched back to the SmartNIC. Coupled with a NIC-resident read cache, the system achieves 33 million operations per second (MOPS) for point lookups and 13 MOPS for range queries. Our analysis demonstrates that this architecture matches or exceeds the performance of contemporary state-of-the-art solutions, while we identify hardware refinements that could further accelerate performance.
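
The lookup and write paths described in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: it assumes a single linear model over integer keys, uses a Python dict as a stand-in for the DPA-side write staging area, and ignores concurrency, caching, and PCIe transfers entirely. All names (`LearnedIndexSketch`, `scan`, `migrate`, `batch_size`) are hypothetical.

```python
import bisect


class LearnedIndexSketch:
    """Toy ordered KV store: a linear model predicts a key's position in a
    sorted array and a bounded local search corrects the prediction.
    Illustrative only; not the design from the paper."""

    def __init__(self, items, batch_size=4):
        pairs = sorted(dict(items).items())
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]
        self.batch_size = batch_size  # hypothetical migration threshold
        self.staged = {}              # stand-in for writes staged near the NIC
        self._train()

    def _train(self):
        n = len(self.keys)
        if n < 2 or self.keys[-1] == self.keys[0]:
            self.slope, self.intercept = 0.0, 0.0
        else:
            # Fit a line through the first and last key.
            self.slope = (n - 1) / (self.keys[-1] - self.keys[0])
            self.intercept = -self.slope * self.keys[0]
        # Largest prediction error over the stored keys bounds the search.
        self.max_err = max(
            (abs(self._predict(k) - i) for i, k in enumerate(self.keys)),
            default=0,
        )

    def _predict(self, key):
        pos = int(self.slope * key + self.intercept)
        return min(max(pos, 0), max(len(self.keys) - 1, 0))

    def _lower_bound(self, key):
        # Index of the first stored key >= key, searched only inside the
        # error window around the model's prediction.
        p = self._predict(key)
        lo = max(0, p - self.max_err)
        hi = min(len(self.keys), p + self.max_err + 1)
        return bisect.bisect_left(self.keys, key, lo, hi)

    def get(self, key):
        if key in self.staged:  # the newest write wins
            return self.staged[key]
        i = self._lower_bound(key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, start_key, count):
        # Range query: up to `count` pairs with key >= start_key, in key order.
        i = self._lower_bound(start_key)
        out = dict(zip(self.keys[i:i + count], self.values[i:i + count]))
        out.update({k: v for k, v in self.staged.items() if k >= start_key})
        return sorted(out.items())[:count]

    def put(self, key, value):
        self.staged[key] = value
        if len(self.staged) >= self.batch_size:
            self.migrate()

    def migrate(self):
        # Batched migration: merge staged writes, rebuild, and retrain.
        merged = dict(zip(self.keys, self.values))
        merged.update(self.staged)
        pairs = sorted(merged.items())
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]
        self.staged = {}
        self._train()
```

In this sketch a point lookup costs one model evaluation plus a binary search over a window of at most `2 * max_err + 1` entries, and a range scan is a lower-bound search followed by a sequential read. Staged writes are consulted first so readers see recent updates before the next batch is migrated.
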
Problem

Research questions and friction points this paper is trying to address.

ordered key-value store
range queries
SmartNIC
data path accelerators
remote in-memory storage
Innovation

Methods, ideas, or system contributions that make the work stand out.

SmartNIC
Data Path Accelerator
Learned Index
Ordered Key-Value Store
OS-bypass
Frederic Schimmelpfennig
Johannes Gutenberg University Mainz
Jan Sass
Johannes Gutenberg University Mainz
Reza Salkhordeh
Johannes Gutenberg University Mainz
Martin Kröning
RWTH Aachen University
Stefan Lankes
RWTH Aachen University
Operating Systems, Cloud Computing, High Performance Computing
André Brinkmann
Professor of Computer Science, Johannes Gutenberg University Mainz
Storage Systems, Operating Systems, HPC, Cloud Computing