RIMMS: Runtime Integrated Memory Management System for Heterogeneous Computing

📅 2025-07-28
🤖 AI Summary
Heterogeneous computing faces memory management challenges due to dynamic task mapping, with existing approaches relying on explicit data control or static assumptions that compromise portability and scalability. This paper proposes RIMMS, a lightweight, hardware-agnostic runtime memory abstraction layer that transparently tracks data locations across CPUs, GPUs, and FPGAs, maintains consistency, and allocates memory dynamically, without modifying application code or requiring platform-specific tuning. Its core innovation is decoupling memory semantics from execution scheduling, keeping it compatible with mainstream heterogeneous runtimes. Evaluated on radar signal processing workloads, RIMMS achieves up to 2.43× speedup over the baseline system and up to 3.08× over IRIS, approaching native CUDA performance. Each API invocation incurs only 1–2 CPU cycles of overhead, delivering high performance, strong portability, and minimal runtime cost.

📝 Abstract
Efficient memory management in heterogeneous systems is increasingly challenging due to diverse compute architectures (e.g., CPU, GPU, FPGA) and dynamic task mappings not known at compile time. Existing approaches often require programmers to manage data placement and transfers explicitly, or assume static mappings that limit portability and scalability. This paper introduces RIMMS (Runtime Integrated Memory Management System), a lightweight, runtime-managed, hardware-agnostic memory abstraction layer that decouples application development from low-level memory operations. RIMMS transparently tracks data locations, manages consistency, and supports efficient memory allocation across heterogeneous compute elements without requiring platform-specific tuning or code modifications. We integrate RIMMS into a baseline runtime and evaluate with complete radar signal processing applications across CPU+GPU and CPU+FPGA platforms. RIMMS delivers up to 2.43X speedup on GPU-based and 1.82X on FPGA-based systems over the baseline. Compared to IRIS, a recent heterogeneous runtime system, RIMMS achieves up to 3.08X speedup and matches the performance of native CUDA implementations while significantly reducing programming complexity. Despite operating at a higher abstraction level, RIMMS incurs only 1-2 cycles of overhead per memory management call, making it a low-cost solution. These results demonstrate RIMMS's ability to deliver high performance and enhanced programmer productivity in dynamic, real-world heterogeneous environments.
Problem

Research questions and friction points this paper is trying to address.

Efficient memory management in diverse heterogeneous computing systems
Decoupling application development from low-level memory operations
Reducing programming complexity while maintaining high performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight runtime-managed memory abstraction layer
Transparent data tracking and consistency management
Low-overhead heterogeneous memory allocation
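The paper does not publish its API in this summary, but the core idea behind the bullets above — a runtime layer that tracks where each buffer's valid copy lives and migrates data only when a task on another device needs it — can be sketched as follows. All names here (`Runtime`, `Buffer`, `acquire`, `Device`) are illustrative assumptions, not RIMMS's actual interface:

```python
from dataclasses import dataclass
from enum import Enum

class Device(Enum):
    """Compute elements a heterogeneous runtime might target."""
    CPU = "cpu"
    GPU = "gpu"
    FPGA = "fpga"

@dataclass
class Buffer:
    """Descriptor tracking which device holds the up-to-date copy."""
    size: int
    location: Device = Device.CPU   # valid copy starts on the host
    transfers: int = 0              # migrations actually performed

class Runtime:
    """Hypothetical sketch of a runtime-managed memory abstraction:
    the application never issues transfers; the runtime decides."""

    def alloc(self, size: int) -> Buffer:
        return Buffer(size)

    def acquire(self, buf: Buffer, dev: Device) -> Buffer:
        # Migrate only if the valid copy is elsewhere; a real runtime
        # would issue a DMA or cudaMemcpy-style transfer here.
        if buf.location != dev:
            buf.location = dev
            buf.transfers += 1
        return buf

if __name__ == "__main__":
    rt = Runtime()
    b = rt.alloc(64)
    rt.acquire(b, Device.GPU)   # CPU -> GPU: one transfer
    rt.acquire(b, Device.GPU)   # already resident: no transfer
    rt.acquire(b, Device.CPU)   # GPU -> CPU: second transfer
    print(f"transfers={b.transfers}")
```

In this sketch the second `acquire` on the same device is a cheap bookkeeping check with no data movement, which is consistent with the paper's report of only 1–2 cycles of overhead per memory management call when no transfer is required.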
Serhan Gener
University of Arizona, Tucson, Arizona, USA
Aditya Ukarande
University of Wisconsin - Madison, Madison, Wisconsin, USA
Shilpa Mysore Srinivasa Murthy
University of Wisconsin - Madison, Madison, Wisconsin, USA
Sahil Hassan
University of Arizona, Tucson, Arizona, USA
Joshua Mack
University of Arizona, Tucson, Arizona, USA
Chaitali Chakrabarti
Professor of Electrical Engineering, SECEE, Arizona State University
VLSI architectures for signal processing, low power embedded systems
Umit Ogras
University of Wisconsin - Madison, Madison, Wisconsin, USA
Ali Akoglu
University of Arizona
reconfigurable computing, high performance computing, neuromorphic computing, domain specific architectures