Scaling All-to-All Operations Across Emerging Many-Core Supercomputers

📅 2025-11-15
🏛️ SC25-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
📈 Citations: 1
✨ Influential: 0

🤖 AI Summary
This work proposes novel architecture-aware all-to-all algorithms to address the all-to-all communication bottleneck on emerging many-core supercomputers. The algorithms take message size, process count, node architecture, and system partition into account when scheduling data and choosing communication pathways. Evaluated on 32 nodes of Intel Sapphire Rapids systems, they achieve up to a 3× speedup over system MPI, which benefits applications such as fast Fourier transforms, matrix transposition, and machine learning workloads.

πŸ“ Abstract
Performant all-to-all collective operations in MPI are critical to fast Fourier transforms, transposition, and machine learning applications. There are many existing implementations for all-to-all exchanges on emerging systems, with the achieved performance dependent on many factors, including message size, process count, architecture, and parallel system partition. This paper presents novel all-to-all algorithms for emerging many-core systems. Further, the paper presents a performance analysis against existing algorithms and system MPI, with novel algorithms achieving up to 3x speedup over system MPI at 32 nodes of state-of-the-art Sapphire Rapids systems.

CCS Concepts • Computing methodologies → Parallel computing methodologies; Parallel algorithms; Massively parallel algorithms; Concurrent algorithms.
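
For readers unfamiliar with the operation, the all-to-all exchange that the abstract refers to can be illustrated with a small simulation. The sketch below does not use MPI and is not the paper's algorithm. It only models the semantics of an all-to-all (every rank sends one distinct block to every other rank, which amounts to a block transpose) and checks a classic pairwise-exchange schedule against that definition. The function names `alltoall_reference`, `alltoall_pairwise`, and `make_buffers` are hypothetical and chosen for illustration.

```python
def alltoall_reference(send):
    """Definition of all-to-all: recv[i][j] is the block rank j addressed to rank i."""
    p = len(send)
    return [[send[j][i] for j in range(p)] for i in range(p)]


def alltoall_pairwise(send):
    """Simulate a pairwise-exchange schedule (illustrative, not the paper's method).

    At step k, every rank r sends its block for rank (r + k) % p.
    After p steps each rank has received exactly one block from every rank.
    """
    p = len(send)
    recv = [[None] * p for _ in range(p)]
    for step in range(p):
        for rank in range(p):
            dest = (rank + step) % p
            # The destination stores the block in the slot indexed by the sender.
            recv[dest][rank] = send[rank][dest]
    return recv


def make_buffers(p):
    """Build labelled send buffers: send[src][dst] == 'src->dst' (assumed test data)."""
    return [[f"{src}->{dst}" for dst in range(p)] for src in range(p)]


if __name__ == "__main__":
    buffers = make_buffers(4)
    assert alltoall_pairwise(buffers) == alltoall_reference(buffers)
    for rank, row in enumerate(alltoall_pairwise(buffers)):
        print(f"rank {rank} received: {row}")
```

The simulation visits ranks sequentially, so it says nothing about performance. The factors the abstract lists (message size, process count, architecture, and system partition) are exactly what such a toy model leaves out.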
Problem

Research questions and friction points this paper is trying to address.

all-to-all
many-core
supercomputers
MPI
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

all-to-all
MPI
many-core
performance optimization
collective communication