Shared-Memory Hierarchical Process Mapping

2025-04-02
AI Summary
This paper addresses the process mapping problem for communication-intensive applications in large-scale scientific computing on supercomputers with a hierarchical hardware topology (islands, racks, nodes, processors). Approaches that only balance the workload across processing elements ignore the communication cost, even though the communication pattern between tasks is typically sparse and known in advance. The authors instead minimize communication cost while keeping the computational load balanced. Specifically, they present the first shared-memory algorithm that uses hierarchical multisection to partition the communication model across processing elements, so that the partitioning follows the levels of the hardware hierarchy. In the experiments, the parallel version finds the best solution among the compared algorithms on 95% of the test instances while being marginally faster than the next best algorithm. The serial variant also delivers the best solution quality and runs faster than previous serial algorithms.

Abstract
Modern large-scale scientific applications consist of thousands to millions of individual tasks. These tasks involve not only computation but also communication with one another. Typically, the communication pattern between tasks is sparse and can be determined in advance. Such applications are executed on supercomputers, which are often organized in a hierarchical hardware topology, consisting of islands, racks, nodes, and processors, where processing elements reside. To ensure efficient workload distribution, tasks must be allocated to processing elements in a way that ensures balanced utilization. However, this approach optimizes only the workload, not the communication cost of the application. It is straightforward to see that placing groups of tasks that frequently exchange large amounts of data on processing elements located near each other is beneficial. The problem of mapping tasks to processing elements considering optimization goals is called process mapping. In this work, we focus on minimizing communication cost while evenly distributing work. We present the first shared-memory algorithm that utilizes hierarchical multisection to partition the communication model across processing elements. Our parallel approach achieves the best solution on 95 percent of instances while also being marginally faster than the next best algorithm. Even in a serial setting, it delivers the best solution quality while also outperforming previous serial algorithms in speed.
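
The abstract states the goal but not the cost function. The Python sketch below assumes the formulation that is common in the process-mapping literature: the communication model is a graph whose edge weight is the data volume two tasks exchange, and the cost of a mapping is the sum, over all edges, of that volume times the distance between the two processing elements (PEs) the tasks are placed on. The hierarchy vector `a`, the distance vector `d`, the PE numbering, and the balance bound (1 + eps) * ceil(total weight / number of PEs) are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# (task u, task v, communication volume); each undirected edge is listed once.
Edge = Tuple[int, int, float]


def pe_distance(p: int, q: int, a: Sequence[int], d: Sequence[float]) -> float:
    """Distance between PEs p and q under an ASSUMED hierarchy model.

    a is given bottom-up, e.g. [processors per node, nodes per rack,
    racks per island, number of islands]. PEs are numbered so that
    p // (a[0] * ... * a[i]) identifies the level-i group containing p.
    d[i] is the distance when the smallest group holding both p and q is
    at level i: d[0] for two processors on one node, d[1] for two nodes
    in one rack, and so on up to d[-1] for different islands.
    """
    if p == q:
        return 0.0
    group_size = 1
    for i, radix in enumerate(a):
        group_size *= radix
        if p // group_size == q // group_size:  # first level where p and q share a group
            return d[i]
    raise ValueError("PE index outside the hierarchy")


def comm_cost(edges: List[Edge], mapping: Sequence[int],
              a: Sequence[int], d: Sequence[float]) -> float:
    """Sum of (volume * PE distance) over all edges; mapping[task] = PE index."""
    return sum(w * pe_distance(mapping[u], mapping[v], a, d) for u, v, w in edges)


def is_balanced(weights: Sequence[float], mapping: Sequence[int],
                num_pes: int, eps: float) -> bool:
    """Check the assumed bound: every PE load <= (1 + eps) * ceil(total / num_pes)."""
    loads = [0.0] * num_pes
    for task, pe in enumerate(mapping):
        loads[pe] += weights[task]
    limit = (1 + eps) * math.ceil(sum(weights) / num_pes)
    return max(loads) <= limit


if __name__ == "__main__":
    a, d = [2, 2], [1, 10]  # hypothetical: 2 PEs per node, 2 nodes, cross-node costs 10x
    edges = [(0, 1, 5), (2, 3, 5), (1, 2, 1)]
    print(comm_cost(edges, [0, 1, 2, 3], a, d))  # 20
    print(comm_cost(edges, [0, 2, 1, 3], a, d))  # 110
    print(is_balanced([1, 1, 1, 1], [0, 2, 1, 3], 4, 0.03))  # True
```

Both mappings in the example are perfectly balanced, yet [0, 1, 2, 3] keeps the heavily communicating pairs (0, 1) and (2, 3) on shared nodes and costs 20, while [0, 2, 1, 3] splits both pairs across nodes and costs 110. Balance alone therefore says nothing about communication cost.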

Problem

Optimize task mapping to minimize communication costs
Balance workload distribution across hierarchical hardware
Develop a shared-memory algorithm for hierarchical process mapping

Innovation

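First shared-memory algorithm for hierarchical process mapping
Hierarchical multisection to partition the communication model across processing elements
Parallel version finds the best solution on 95% of instances and is marginally faster than the next best algorithm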
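
Hierarchical multisection, in its usual form, partitions the communication model top-down along the hardware hierarchy: first into one block per island, then each island's block into one block per rack, and so on down to single processing elements. The Python sketch below illustrates only that control flow. It is sequential, and its partitioning step `split_into` is a deliberately naive stand-in (cutting a BFS order of the tasks into chunks of equal weight) for the graph partitioner a real implementation would call at each level. The function names, the bottom-up hierarchy vector `a`, and the PE numbering are assumptions, not details from the paper.

```python
import math
from collections import deque
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (task u, task v, communication volume)


def bfs_order(tasks: List[int], adj: Dict[int, List[int]]) -> List[int]:
    """Order tasks by BFS in the induced subgraph so communicating tasks stay close."""
    in_set, seen, order = set(tasks), set(), []
    for start in tasks:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj.get(u, []):
                if v in in_set and v not in seen:
                    seen.add(v)
                    queue.append(v)
    return order


def split_into(tasks: List[int], k: int, weights: Sequence[float],
               adj: Dict[int, List[int]]) -> List[List[int]]:
    """NAIVE STAND-IN for a graph partitioner: cut the BFS order into k equal-weight chunks.

    Edge weights are ignored, so unlike a real partitioner this does not minimize the cut.
    """
    order = bfs_order(tasks, adj)
    total = sum(weights[t] for t in order)
    parts: List[List[int]] = [[] for _ in range(k)]
    acc = 0.0
    for t in order:
        idx = min(int(acc * k / total), k - 1) if total > 0 else 0
        parts[idx].append(t)
        acc += weights[t]
    return parts


def hierarchical_multisection(num_tasks: int, edges: List[Edge],
                              weights: Sequence[float], a: Sequence[int]) -> List[int]:
    """Map tasks to PEs by splitting top-down (islands, racks, nodes, processors).

    a is bottom-up (a[0] = processors per node, ..., a[-1] = number of islands),
    so the recursion consumes it from the last entry to the first.
    Returns mapping[task] = PE index.
    """
    adj: Dict[int, List[int]] = {}
    for u, v, _ in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    mapping = [-1] * num_tasks

    def recurse(tasks: List[int], level: int, pe_offset: int, pes_in_block: int) -> None:
        if level < 0:  # the block is a single PE
            for t in tasks:
                mapping[t] = pe_offset
            return
        k = a[level]
        pes_per_part = pes_in_block // k
        # The k sub-problems are independent; this sketch handles them one after another.
        for i, part in enumerate(split_into(tasks, k, weights, adj)):
            recurse(part, level - 1, pe_offset + i * pes_per_part, pes_per_part)

    recurse(list(range(num_tasks)), len(a) - 1, 0, math.prod(a))
    return mapping


if __name__ == "__main__":
    # Hypothetical example: communication path 0-4-1-5-2-6-3-7 on 2 racks x 2 nodes x 2 processors.
    path = [0, 4, 1, 5, 2, 6, 3, 7]
    edges = [(path[i], path[i + 1], 1.0) for i in range(len(path) - 1)]
    print(hierarchical_multisection(8, edges, [1] * 8, [2, 2, 2]))  # [0, 2, 4, 6, 1, 3, 5, 7]
```

In the example, every pair of neighbouring tasks on the path lands on adjacent PEs, so most communication stays within a node or a rack. Because each recursive call works on its own subset of tasks, the calls at one level are independent of each other, which makes the scheme a natural fit for shared-memory parallelism; the abstract does not say how the authors exploit this.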