Rethinking Inter-Process Communication with Memory Operation Offloading

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the excessive CPU overhead incurred by frequent memory copies in existing inter-process communication (IPC) runtimes during large-scale data exchange for multimodal and AI services, as well as the absence of a unified hardware-software co-designed offload mechanism. The paper proposes the first unified IPC runtime that elevates memory operation offloading to a general-purpose system capability. By integrating asynchronous pipelining, selective cache injection, and a hybrid coordination mechanism, it jointly manages synchronization, cache visibility, and concurrency in shared-memory communication, enabling flexible trade-offs among throughput, latency, and CPU efficiency. Experimental results on real-world workloads demonstrate up to a 22% reduction in instruction count, a 2.1× improvement in throughput, and as much as a 72% decrease in latency.

๐Ÿ“ Abstract
As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading, current IPC stacks lack a unified runtime model to coordinate them effectively. This paper presents a unified IPC runtime suite that integrates both hardware- and software-based memory offloading into shared-memory communication. The system characterizes the interaction between offload strategies and IPC execution, including synchronization, cache visibility, and concurrency, and introduces multiple IPC modes that balance throughput, latency, and CPU efficiency. Through asynchronous pipelining, selective cache injection, and hybrid coordination, the system turns offloading from a device-specific feature into a general system capability. Evaluations on real-world workloads show instruction count reductions of up to 22%, throughput improvements of up to 2.1x, and latency reductions of up to 72%, demonstrating that coordinated IPC offloading can deliver tangible end-to-end efficiency gains in modern data-intensive systems.
Problem

Research questions and friction points this paper is trying to address.

Inter-Process Communication
Memory Offloading
CPU Efficiency
Shared-Memory Communication
Data-Intensive Systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory Offloading
Inter-Process Communication
Unified Runtime
Asynchronous Pipelining
Cache Injection
M. Park
Georgia Institute of Technology
Richi Dubey
Georgia Institute of Technology
Yifan Yuan
Meta
Nam Sung Kim
University of Illinois Urbana-Champaign
Ada Gavrilovska
Georgia Institute of Technology