RoCE BALBOA: Service-enhanced Data Center RDMA for SmartNICs

📅 2025-07-27
🤖 AI Summary
Data-intensive applications such as machine learning have made the datacenter network a performance bottleneck, pushing network functions onto accelerators, SmartNICs, and in-network computing. This paper presents RoCE BALBOA, an open-source, customizable, RoCE v2-compatible RDMA stack that scales to hundreds of queue pairs, runs at 100G, and is intended as a basis for building SmartNICs and accelerators. Deployed on an FPGA cluster, BALBOA shows latency and performance comparable to commercial NICs. Two classes of use cases illustrate the design: protocol enhancements for infrastructure purposes (encryption and ML-based deep packet inspection), and line-rate compute offloads with deep pipelines, demonstrated by a data preprocessing pipeline for recommender systems that processes data as it arrives from the network and transfers it directly to the GPU.

📝 Abstract
Data-intensive applications in data centers, especially machine learning (ML), have made the network a bottleneck, which in turn has motivated the development of more efficient network protocols and infrastructure. For instance, remote direct memory access (RDMA) has become the standard protocol for data transport in the cloud because it minimizes data copies and reduces CPU utilization by bypassing the host. Similarly, a growing number of network functions and infrastructure components have moved to accelerators, SmartNICs, and in-network computing to bypass the CPU. In this paper we explore the implementation and deployment of RoCE BALBOA, an open-source, RoCE v2-compatible, 100G-capable RDMA stack that scales to hundreds of queue pairs and can be used as the basis for building accelerators and SmartNICs. RoCE BALBOA is customizable, opening up a design space and offering a degree of adaptability not available in commercial products. We have deployed BALBOA in a cluster using FPGAs and show that it has latency and performance characteristics comparable to commercial NICs. We demonstrate its potential by exploring two classes of use cases. The first involves enhancements to the protocol for infrastructure purposes (encryption, deep packet inspection using ML). The second showcases the ability to perform line-rate compute offloads with deep pipelines by implementing commercial data preprocessing pipelines for recommender systems that process the data as it arrives from the network before transferring it directly to the GPU. These examples demonstrate how BALBOA enables the exploration and development of SmartNICs and accelerators operating on network data streams.
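To give a sense of what "line-rate" processing at 100G implies for a pipelined offload, the following minimal sketch estimates the time one RoCE v2 packet occupies the wire and the resulting goodput. It is not taken from the paper: the header sizes are the standard ones for RoCE v2 over IPv4 with no VLAN tag and no extended transport headers, and the 4096-byte payload and 250 MHz clock in the usage lines are illustrative assumptions, not BALBOA parameters.

```python
# Back-of-the-envelope line-rate budget for RoCE v2 over IPv4.
# Assumption: standard header sizes, no VLAN tag, BTH only (no RETH/AETH).
OVERHEAD_BYTES = {
    "preamble_sfd": 8,      # Ethernet preamble + start-of-frame delimiter
    "ethernet_header": 14,
    "ipv4_header": 20,
    "udp_header": 8,
    "bth": 12,              # InfiniBand Base Transport Header
    "icrc": 4,              # invariant CRC carried by RoCE
    "ethernet_fcs": 4,
    "inter_frame_gap": 12,
}


def wire_bytes(payload_bytes: int) -> int:
    """Bytes of link capacity consumed by one packet, framing included."""
    return payload_bytes + sum(OVERHEAD_BYTES.values())


def packet_time_ns(payload_bytes: int, link_gbps: float = 100.0) -> float:
    """Time one packet occupies the link; bits / (Gbit/s) yields nanoseconds."""
    return wire_bytes(payload_bytes) * 8 / link_gbps


def goodput_gbps(payload_bytes: int, link_gbps: float = 100.0) -> float:
    """Payload throughput when packets are sent back to back."""
    return link_gbps * payload_bytes / wire_bytes(payload_bytes)


def cycles_per_packet(payload_bytes: int, clock_mhz: float,
                      link_gbps: float = 100.0) -> float:
    """Clock cycles available per packet if the pipeline must keep up."""
    return packet_time_ns(payload_bytes, link_gbps) * clock_mhz / 1000.0


if __name__ == "__main__":
    payload = 4096    # hypothetical payload size in bytes
    clock = 250.0     # hypothetical pipeline clock in MHz
    print(f"wire bytes per packet: {wire_bytes(payload)}")
    print(f"packet time: {packet_time_ns(payload):.2f} ns")
    print(f"goodput: {goodput_gbps(payload):.2f} Gbit/s")
    print(f"cycle budget: {cycles_per_packet(payload, clock):.2f} cycles")
```

A pipeline that accepts a new packet within this per-packet budget can sustain line rate regardless of how deep it is, which is why depth adds latency but need not reduce throughput.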
Problem

Research questions and friction points this paper is trying to address.

Addressing the network bottleneck created by data-intensive applications such as ML
Developing a scalable RDMA stack for SmartNICs and accelerators
Enabling customizable network functions and compute offloads
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source, RoCE v2-compatible RDMA stack
Customizable basis for SmartNICs that scales to hundreds of queue pairs at 100G
Line-rate compute offloads with deep pipelines
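As background on what RoCE v2 compatibility involves at the packet level, the sketch below parses the 12-byte InfiniBand Base Transport Header (BTH) that RoCE v2 carries inside UDP (destination port 4791). The field layout follows the public InfiniBand/RoCE v2 specification; the class and function names are hypothetical and this is not BALBOA's implementation, which targets FPGA hardware rather than software.

```python
import struct
from dataclasses import dataclass

ROCE_V2_UDP_PORT = 4791  # IANA-assigned UDP destination port for RoCE v2
BTH_LENGTH = 12          # Base Transport Header size in bytes


@dataclass
class BaseTransportHeader:
    """Hypothetical container for decoded BTH fields."""
    opcode: int
    solicited_event: bool
    mig_req: bool
    pad_count: int
    transport_version: int
    partition_key: int
    dest_qp: int
    ack_request: bool
    psn: int


def parse_bth(udp_payload: bytes) -> BaseTransportHeader:
    """Decode the BTH at the start of a RoCE v2 UDP payload."""
    if len(udp_payload) < BTH_LENGTH:
        raise ValueError("payload too short for a Base Transport Header")
    # Network byte order: opcode (8 bits), flags (8), P_Key (16),
    # then two 32-bit words that each hold a 24-bit field in the low bits.
    opcode, flags, pkey, qp_word, psn_word = struct.unpack(
        "!BBHII", udp_payload[:BTH_LENGTH]
    )
    return BaseTransportHeader(
        opcode=opcode,
        solicited_event=bool(flags & 0x80),
        mig_req=bool(flags & 0x40),
        pad_count=(flags >> 4) & 0x3,
        transport_version=flags & 0x0F,
        partition_key=pkey,
        dest_qp=qp_word & 0xFFFFFF,          # top byte is reserved
        ack_request=bool(psn_word & 0x80000000),
        psn=psn_word & 0xFFFFFF,             # packet sequence number
    )


if __name__ == "__main__":
    # Hand-built example header: queue pair 0x12, PSN 0x123, ACK requested.
    raw = struct.pack("!BBHII", 0x0A, 0x40, 0xFFFF, 0x000012, 0x80000123)
    print(parse_bth(raw))
```

The destination queue pair field is what a stack uses to look up per-connection state, so supporting hundreds of queue pairs means keeping that many state entries reachable at packet rate.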