Heterogeneous Memory Benchmarking Toolkit

📅 2025-05-01
🤖 AI Summary
Problem: Accurately characterizing the timing behavior of heterogeneous memories (e.g., PL-side DRAM, BRAM) in embedded systems under configurable contention pressure remains challenging due to measurement noise and limitations of user-space tools. Method: This paper proposes MemScope, a kernel-level heterogeneous memory characterization framework. It introduces a novel kernel-level fine-grained memory control mechanism enabling direct physical memory mapping, dynamic inter-core interference modeling and mitigation, and tightly integrated support for multi-core scheduling coordination, cache coherency maintenance, and joint interrupt/I/O regulation. Contribution/Results: Evaluated on a Xilinx ZCU102 platform, MemScope achieves high consistency (<3% deviation) and nanosecond-level resolution in bandwidth and latency benchmarking across multiple memory types. By significantly suppressing measurement noise and overcoming the precision ceiling of user-mode tools, MemScope establishes a reliable infrastructure for real-time analysis of heterogeneous SoCs.

📝 Abstract
This paper presents MemScope, an open-source kernel-level heterogeneous memory characterization framework for embedded systems that enables users to understand and precisely characterize the temporal behavior of all available memory modules under configurable contention stress scenarios. Because the kernel provides a high degree of control over allocation, cache maintenance, CPUs, interrupts, and I/O device activity, implementing the framework at kernel level is the most accurate way to benchmark heterogeneous memory subsystems. This privilege lets us directly map regions of contiguous physical memory and instantiate allocators, and finely control cores to create and eliminate interference. Additionally, we can minimize noise and interruptions, guaranteeing more consistent and precise results than equivalent user-space solutions. Running the framework on a Xilinx Zynq UltraScale+ ZCU102 CPU-FPGA platform demonstrates its capability to precisely benchmark bandwidth and latency across various memory types, including PL-side DRAM and BRAM, in a multi-core system.
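
To illustrate the arithmetic behind bandwidth, latency, and consistency figures of this kind, the following Python sketch converts raw timings into those quantities. It is not MemScope code and does not reproduce any kernel-level control. The function names are hypothetical, and the deviation metric (largest relative distance from the mean) is an assumption, since the summary does not define how its deviation figure is computed.

```python
from statistics import mean


def bandwidth_mb_per_s(bytes_moved: int, elapsed_ns: int) -> float:
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one timed transfer."""
    if elapsed_ns <= 0:
        raise ValueError("elapsed_ns must be positive")
    # bytes per ns * 1e9 ns/s / 1e6 bytes/MB == bytes * 1000 / ns
    return bytes_moved * 1000 / elapsed_ns


def latency_ns_per_access(elapsed_ns: int, accesses: int) -> float:
    """Average latency per access for a chain of dependent accesses."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return elapsed_ns / accesses


def max_relative_deviation(samples: list[float]) -> float:
    """Largest |sample - mean| / mean across repeated runs.

    Assumed definition of run-to-run deviation; the source does not
    specify which metric it uses.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    m = mean(samples)
    return max(abs(s - m) for s in samples) / m
```

With made-up sample values, `max_relative_deviation([100.0, 102.0, 98.0])` evaluates to 0.02, which would fall within a 3% bound under this assumed definition.
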
Problem

Research questions and friction points this paper is trying to address.

Characterize temporal behavior of heterogeneous memory modules
Benchmark memory subsystems accurately at kernel level
Measure bandwidth and latency across diverse memory types
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kernel-level framework for precise memory characterization
Configurable contention stress scenarios for benchmarking
Minimizes noise for consistent multi-core memory analysis