🤖 AI Summary
This paper addresses the Dynamic Storage Allocation (DSA) problem in static memory planning: assigning offsets to buffers with known sizes and lifetimes so as to minimize total memory footprint. Existing approaches either scale well but fragment memory heavily, or achieve low fragmentation at the cost of poor scalability beyond a few thousand buffers. The authors propose *idealloc*, an algorithm that combines combinatorial optimization modeling, heuristic search pruning, and memory-layout-aware scheduling, and is the first method to deliver low-fragmentation, robust DSA for workloads of up to one million buffers. On a rigorous suite of particularly hard benchmark instances drawn from diverse domains, *idealloc* outperforms four leading production implementations on key metrics, ranking first under a joint effectiveness/robustness criterion. It substantially reduces peak memory usage, enabling static deployment of models with tens of millions of parameters, and delivers a 3.2× throughput improvement in real-world evaluation.
📝 Abstract
The NP-complete combinatorial optimization task of assigning offsets to a set of buffers with known sizes and lifetimes so as to minimize total memory usage is called dynamic storage allocation (DSA). Existing DSA implementations bypass the theoretical state-of-the-art algorithms in favor of either fast but wasteful heuristics, or memory-efficient approaches that do not scale beyond one thousand buffers. The "AI memory wall", combined with deep neural networks' static architecture, has reignited interest in DSA. We present idealloc, a low-fragmentation, high-performance DSA implementation designed for million-buffer instances. Evaluated on a novel suite of particularly hard benchmarks from several domains, idealloc ranks first against four production implementations in terms of a joint effectiveness/robustness criterion.
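To make the problem statement concrete, here is a minimal sketch of a DSA instance and a naive first-fit heuristic of the "fast but wasteful" kind the abstract contrasts against. This is a generic illustration, not idealloc's algorithm: each buffer is a `(size, start, end)` triple, two buffers whose lifetimes overlap must occupy disjoint address ranges, and the objective is to minimize the peak footprint (the highest occupied address).

```python
def first_fit_offsets(buffers):
    """Assign offsets to buffers given as (size, start, end) triples,
    where [start, end) is the buffer's lifetime. Returns (offsets, peak):
    offsets in input order, and the resulting peak memory footprint.

    Simple heuristic: place buffers largest-first, each at the lowest
    offset that does not collide with an already-placed, lifetime-
    overlapping buffer. Optimal placement is NP-complete; this greedy
    pass can fragment memory badly on hard instances.
    """
    order = sorted(range(len(buffers)), key=lambda i: -buffers[i][0])
    offsets = [0] * len(buffers)
    placed = []  # indices already assigned an offset
    for i in order:
        size, start, end = buffers[i]
        # Address ranges occupied by buffers whose lifetimes overlap [start, end).
        conflicts = sorted(
            (offsets[j], offsets[j] + buffers[j][0])
            for j in placed
            if buffers[j][1] < end and start < buffers[j][2]
        )
        off = 0
        for lo, hi in conflicts:
            if off + size <= lo:
                break  # found a gap below this occupied range
            off = max(off, hi)  # skip past the occupied range
        offsets[i] = off
        placed.append(i)
    peak = max((offsets[i] + buffers[i][0] for i in range(len(buffers))), default=0)
    return offsets, peak
```

For example, three 4-byte buffers with lifetimes [0,2), [1,3), and [2,4) need a peak of 8 bytes here: the first and third can share offset 0 because their lifetimes are disjoint, while the second must sit above one of them.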