🤖 AI Summary
In cloud-native environments, infrastructure services (e.g., service meshes, monitoring agents) co-locate with user applications on shared host resources, causing performance degradation and scalability bottlenecks. To address this, we propose HeteroPod, a novel abstraction that offloads infrastructure containers to data processing units (DPUs) for hardware-enforced resource isolation. We design HeteroNet, a cross-PU (XPU) networking system enabling a unified CPU-DPU network namespace and zero-copy elastic communication. Furthermore, we present the first fully automated, optimal DPU offloading framework for unmodified, million-line-scale commercial cloud-native applications. Implemented via Linux kernel extensions, a customized Kubernetes distribution (HeteroK8s), and NVIDIA BlueField-2 DPUs, our approach reduces end-to-end latency by 60%, cuts resource consumption by up to 64×, improves latency by up to 31.9× over kernel-bypass designs, and enhances scalability by 55%, while maintaining full compatibility with complex production-grade workloads.
📄 Abstract
Cloud-native systems increasingly rely on infrastructure services (e.g., service meshes, monitoring agents), which compete for resources with user applications, degrading performance and scalability. We propose HeteroPod, a new abstraction that offloads these services to Data Processing Units (DPUs) to enforce strict isolation while reducing host resource contention and operational costs. To realize HeteroPod, we introduce HeteroNet, a cross-PU (XPU) network system featuring: (1) a split network namespace, a unified network abstraction for processes spanning CPU and DPU, and (2) elastic and efficient XPU networking, a communication mechanism achieving shared-memory performance without pinned-resource overhead and polling costs. By leveraging HeteroNet and the compositional nature of cloud-native workloads, HeteroPod can optimally offload infrastructure containers to DPUs. We implement HeteroNet on Linux and build a cloud-native system, HeteroK8s, based on Kubernetes. We evaluate the systems using NVIDIA BlueField-2 DPUs and CXL-based DPUs (simulated with real CXL memory devices). The results show that HeteroK8s effectively supports complex (unmodified) commodity cloud-native applications (up to 1 million LoC) and provides up to 31.9x better latency and 64x less resource consumption (compared with a kernel-bypass design), 60% better end-to-end latency, and 55% higher scalability compared with SOTA systems.
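The offloading idea above, which splits a pod along its compositional boundary (application containers stay on the host CPU, infrastructure sidecars move to the DPU), can be illustrated with a minimal placement sketch. Everything here is hypothetical and not from the paper: the `Container` fields, the greedy `plan_offload` heuristic, and the DPU memory budget are illustrative stand-ins for the actual (optimal) offloading framework.

```python
from dataclasses import dataclass

@dataclass
class Container:
    name: str
    is_infra: bool   # infrastructure sidecar (service-mesh proxy, monitoring agent, ...)
    mem_mb: int      # approximate resident memory footprint

def plan_offload(pod, dpu_mem_mb):
    """Toy placement: keep application containers on the host CPU and
    greedily move infrastructure sidecars to the DPU while its memory
    budget lasts. Illustrative only; the real framework solves the
    placement optimally."""
    host, dpu, free = [], [], dpu_mem_mb
    # Consider infra containers first, largest footprint first,
    # so the host is relieved of the heaviest sidecars.
    for c in sorted(pod, key=lambda c: (not c.is_infra, -c.mem_mb)):
        if c.is_infra and c.mem_mb <= free:
            dpu.append(c.name)
            free -= c.mem_mb
        else:
            host.append(c.name)
    return host, dpu

pod = [
    Container("app", False, 512),
    Container("envoy-sidecar", True, 200),
    Container("metrics-agent", True, 64),
]
print(plan_offload(pod, dpu_mem_mb=512))
# -> (['app'], ['envoy-sidecar', 'metrics-agent'])
```

With a 512 MB DPU budget both sidecars fit on the DPU and only the application remains on the host; shrinking the budget forces sidecars that no longer fit back onto the CPU, mirroring the elasticity the paper targets.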