🤖 AI Summary
This work addresses the mismatch between static resource allocation in production HPC clusters and the time-varying resource demands of scientific applications. The authors propose a non-invasive MPI malleability methodology implemented in the Dynamic Management of Resources (DMR) framework. Applications adopt it through the DMR API, which enables runtime resource adaptation without intrusive code changes or scheduler modifications, and the approach works with state-of-the-art HPC software stacks and vanilla resource managers. Evaluated with two large-scale scientific applications on three TOP500 supercomputers under realistic production configurations, the approach achieves performance comparable to static allocation while substantially reducing node-hours for identical workloads, lowering the barrier to adopting dynamic resource management in HPC systems.
📝 Abstract
Many large-scale scientific applications exhibit time-varying behavior, yet production high-performance computing (HPC) clusters still rely on rigid, fixed-size allocations, and most dynamic techniques remain confined to laboratory prototypes. This work presents a practical MPI malleability methodology that integrates with state-of-the-art HPC software stacks and operational practices. The methodology is implemented in the Dynamic Management of Resources (DMR) framework and is designed to ease adoption by existing applications without requiring intrusive code changes or scheduler modifications. We evaluate our approach by integrating the DMR API into two large-scale scientific applications and deploying them on three TOP500 supercomputers under realistic production configurations. Our non-invasive malleability solution achieves performance comparable to static baselines in controlled environments while substantially reducing node-hour consumption for identical workloads. These results show that malleability can be effectively exploited on production systems using vanilla resource managers, lowering the barrier to adoption of dynamic resource management in HPC.