SpeedMalloc: Improving Multi-threaded Applications via a Lightweight Core for Memory Allocation

📅 2025-08-27
🤖 AI Summary
In multithreaded, multicore systems, allocator metadata becomes interleaved with user data, which pollutes caches and adds cross-thread synchronization overhead; the choice of allocator alone can change program performance by up to 2.7x. Existing hardware accelerators (e.g., Mallacc, Memento) offer limited support for multithreaded applications and are constrained by core-accelerator synchronization. This paper proposes SpeedMalloc, a memory allocation architecture built around a lightweight, programmable support-core: allocator metadata is housed in the support-core's own caches, allocation tasks are offloaded to it, and an efficient cross-core data synchronization mechanism connects it to the application cores. Because the support-core is general-purpose rather than a domain-specific accelerator, it can adopt new allocator designs. On multithreaded workloads, SpeedMalloc achieves speedups of 1.75x over Jemalloc, 1.18x over TCMalloc, 1.15x over Mimalloc, 1.23x over Mallacc, and 1.18x over Memento.

πŸ“ Abstract
Memory allocation, though constituting only a small portion of the executed code, can have a "butterfly effect" on overall program performance, leading to significant and far-reaching impacts. Despite accounting for just approximately 5% of total instructions, memory allocation can result in up to a 2.7x performance variation depending on the allocator used. This effect arises from the complexity of memory allocation in modern multi-threaded multi-core systems, where allocator metadata becomes intertwined with user data, leading to cache pollution or increased cross-thread synchronization overhead. Offloading memory allocators to accelerators, e.g., Mallacc and Memento, is a potential direction to improve the allocator performance and mitigate cache pollution. However, these accelerators currently have limited support for multi-threaded applications, and synchronization between cores and accelerators remains a significant challenge. We present SpeedMalloc, using a lightweight support-core to process memory allocation tasks in multi-threaded applications. The support-core is a lightweight programmable processor with efficient cross-core data synchronization and houses all allocator metadata in its own caches. This design minimizes cache conflicts with user data and eliminates the need for cross-core metadata synchronization. In addition, using a general-purpose core instead of domain-specific accelerators makes SpeedMalloc capable of adopting new allocator designs. We compare SpeedMalloc with state-of-the-art software and hardware allocators, including Jemalloc, TCMalloc, Mimalloc, Mallacc, and Memento. SpeedMalloc achieves 1.75x, 1.18x, 1.15x, 1.23x, and 1.18x speedups on multithreaded workloads over these five allocators, respectively.
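A back-of-the-envelope check helps show why the 2.7x figure is surprising. The sketch below assumes, purely for illustration, that the roughly 5% instruction share corresponds to roughly 5% of execution time; the abstract does not state this. Under that assumption, Amdahl's law caps the gain from speeding up the allocator code itself at about 1.05x, so most of the reported spread would have to come from indirect effects such as cache pollution and synchronization. The function and variable names are hypothetical.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of runtime is sped up by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Assumption: allocation takes ~5% of runtime (the abstract gives ~5% of instructions).
alloc_fraction = 0.05

# Upper bound on the direct gain: allocation code becomes free.
direct_bound = 1.0 / (1.0 - alloc_fraction)

print(f"10x faster allocator code: {amdahl_speedup(alloc_fraction, 10.0):.3f}x")
print(f"Bound from direct cost alone: {direct_bound:.3f}x")
```

Both printed values sit far below 2.7x, which is consistent with the abstract's argument that the allocator's side effects on the rest of the program matter more than its own instruction count.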
Problem

Research questions and friction points this paper is trying to address.

Optimizing memory allocation performance in multi-threaded applications
Reducing cache pollution caused by allocator metadata interference
Minimizing cross-thread synchronization overhead in memory operations
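The cache-pollution point can be made concrete with a toy footprint model. The sketch assumes 64-byte cache lines, 48-byte objects, and a 16-byte inline header per object; these sizes are illustrative assumptions and do not describe the layout of any allocator named above. It compares how many cache lines the application data spans when metadata is interleaved with it versus kept elsewhere.

```python
import math

CACHE_LINE = 64  # bytes; assumed line size for illustration


def lines_touched(num_objects: int, object_size: int, header_size: int) -> int:
    """Cache lines spanned by contiguous objects, each preceded by a header.

    header_size=0 models metadata that is stored away from user data.
    """
    total_bytes = num_objects * (object_size + header_size)
    return math.ceil(total_bytes / CACHE_LINE)


inline = lines_touched(1024, object_size=48, header_size=16)
segregated = lines_touched(1024, object_size=48, header_size=0)

print(f"interleaved metadata: {inline} lines")     # 1024 lines
print(f"segregated metadata:  {segregated} lines")  # 768 lines
```

In this model, interleaving makes the same user data occupy a third more cache lines. Segregated metadata still has to live somewhere; in the design described in the abstract, that place is the support-core's own caches rather than the caches serving user data.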
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight core for memory allocation tasks
Efficient cross-core data synchronization design
General-purpose core avoids domain-specific limitations
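The ideas listed above can be pictured with a small software analogy. This is a minimal sketch, not the paper's implementation: a single worker thread stands in for the support-core, a Python `queue.Queue` stands in for the cross-core synchronization mechanism, and a plain free list stands in for the allocator metadata. The class name `SupportCoreAllocator`, its methods, and the block sizes are all hypothetical.

```python
import queue
import threading


class SupportCoreAllocator:
    """Toy model: a single worker thread owns all allocator metadata."""

    def __init__(self, heap_size: int, block_size: int):
        # The free list is the only metadata; after start-up it is touched
        # solely by the worker thread, so it needs no lock.
        self._free = list(range(0, heap_size, block_size))
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def _serve(self):
        # Runs on the "support-core": handle requests one at a time.
        while True:
            op, arg, reply = self._requests.get()
            if op == "malloc":
                reply.put(self._free.pop() if self._free else None)
            elif op == "free":
                self._free.append(arg)
                reply.put(None)
            else:  # "stop"
                reply.put(None)
                return

    def _call(self, op, arg=None):
        # Send a request and block until the worker replies.
        reply = queue.Queue(maxsize=1)
        self._requests.put((op, arg, reply))
        return reply.get()

    def malloc(self):
        return self._call("malloc")

    def free(self, addr):
        self._call("free", addr)

    def stop(self):
        self._call("stop")
        self._worker.join()


def app_thread(alloc, out):
    # Application thread: allocate 100 blocks and record their addresses.
    for _ in range(100):
        out.append(alloc.malloc())


alloc = SupportCoreAllocator(heap_size=400 * 64, block_size=64)
results = [[] for _ in range(4)]
threads = [threading.Thread(target=app_thread, args=(alloc, r)) for r in results]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_addrs = [a for r in results for a in r]
print(len(all_addrs), len(set(all_addrs)))  # every block handed out exactly once

for addr in all_addrs:
    alloc.free(addr)
alloc.stop()
free_blocks = len(alloc._free)
```

Because only the worker ever reads or writes the free list, the application threads never synchronize on metadata with each other; they only exchange requests and replies. Swapping in a different allocation policy would mean changing `_serve` alone, which loosely mirrors the abstract's point about a general-purpose core being able to adopt new allocator designs.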
Authors

Ruihao Li
The University of Texas at Austin, USA
Qinzhe Wu
University of Texas at Austin, USA
Krishna Kavi
University of North Texas, USA
Gayatri Mehta
University of North Texas, USA
Jonathan C. Beard
System Architect
Networks on Chip, Memory Systems, System Architecture, Runtimes, Distributed Computing
Neeraja J. Yadwadkar
Assistant Professor, University of Texas at Austin
Networked Systems, Cloud Computing, Machine Learning
Lizy K. John
The University of Texas at Austin, USA