Scaling All-to-All Operations Across Emerging Many-Core Supercomputers

📅 2025-11-15
🏛️ SC25-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
This work proposes novel architecture-aware all-to-all algorithms to address the all-to-all communication bottleneck on emerging many-core supercomputers. Because all-to-all performance depends on message size, process count, architecture, and system partitioning, the algorithms are designed around the characteristics of many-core nodes. Evaluated on 32 nodes of an Intel Sapphire Rapids system, the new algorithms achieve up to a 3× speedup over the system MPI implementation. Fast all-to-all exchanges matter for applications such as fast Fourier transforms, matrix transposition, and machine learning workloads.
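To make the exchange pattern concrete, here is a minimal sequential Python sketch of what an all-to-all computes. It is not the paper's algorithm. It uses the classic pairwise-exchange schedule, in which at step k rank i sends to rank (i + k) mod p, and the function name `pairwise_alltoall` is hypothetical. The sketch also shows why matrix transposition appears in the list of applications. With p ranks each holding p blocks, all-to-all moves block j of rank i into slot i of rank j, which is a block transpose of the p × p grid of blocks.

```python
"""Sequential simulation of an MPI_Alltoall-style exchange.

Hypothetical illustration: each of p "ranks" holds p blocks, and block j of
rank i must end up as block i of rank j. The schedule is the classic pairwise
exchange (at step k, rank i sends to (i + k) % p and receives from
(i - k) % p). It is not the algorithm proposed in the paper.
"""


def pairwise_alltoall(send):
    p = len(send)
    assert all(len(row) == p for row in send), "each rank needs p blocks"
    recv = [[None] * p for _ in range(p)]
    for k in range(p):  # p steps; step 0 is the local self-copy
        for i in range(p):
            dst = (i + k) % p
            # Rank i's block destined for dst lands in dst's slot i.
            recv[dst][i] = send[i][dst]
    return recv


def demo():
    p = 4
    # The label "i->j" marks the block that rank i sends to rank j.
    send = [[f"{i}->{j}" for j in range(p)] for i in range(p)]
    recv = pairwise_alltoall(send)
    for rank, blocks in enumerate(recv):
        print(rank, blocks)
    return recv


if __name__ == "__main__":
    demo()
```

Running the script prints each rank's receive buffer. Rank 0, for example, ends up holding blocks `0->0`, `1->0`, `2->0`, and `3->0`, one from every rank.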

📝 Abstract
Performant all-to-all collective operations in MPI are critical to fast Fourier transforms, transposition, and machine learning applications. There are many existing implementations for all-to-all exchanges on emerging systems, with the achieved performance dependent on many factors, including message size, process count, architecture, and parallel system partition. This paper presents novel all-to-all algorithms for emerging many-core systems. Further, the paper presents a performance analysis against existing algorithms and system MPI, with novel algorithms achieving up to 3x speedup over system MPI at 32 nodes of state-of-the-art Sapphire Rapids systems.

CCS Concepts: • Computing methodologies → Parallel computing methodologies; Parallel algorithms; Massively parallel algorithms; Concurrent algorithms.
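The abstract's point that performance depends on message size and process count can be illustrated with the standard latency-bandwidth (α-β) cost model for two textbook all-to-all algorithms. Under this model, the pairwise exchange costs (p - 1)(α + mβ) for m-byte blocks. Bruck's algorithm costs about ⌈log2 p⌉(α + (p/2)mβ), sending fewer messages but forwarding each block several times. The Python sketch below assumes placeholder values for α and β and a hypothetical layout of 32 nodes with 112 ranks per node. It ignores local copy costs, network contention, and the node topology that the paper's algorithms target, so it shows the trend only, not the paper's results.

```python
import math

# Hypothetical latency (s) and inverse bandwidth (s/byte). These are
# illustrative placeholders, not measurements from the paper.
ALPHA = 1.0e-6
BETA = 1.0e-10


def pairwise_cost(p, m, alpha=ALPHA, beta=BETA):
    """Pairwise exchange: p - 1 steps, each sending one m-byte block."""
    return (p - 1) * (alpha + m * beta)


def bruck_cost(p, m, alpha=ALPHA, beta=BETA):
    """Bruck: ceil(log2 p) steps, each moving about p/2 blocks of m bytes."""
    steps = math.ceil(math.log2(p))
    return steps * (alpha + (p / 2) * m * beta)


def cheaper(p, m):
    """Name of the algorithm with the lower modeled cost."""
    return "bruck" if bruck_cost(p, m) < pairwise_cost(p, m) else "pairwise"


if __name__ == "__main__":
    p = 32 * 112  # hypothetical: 32 nodes x 112 ranks per node
    for m in (8, 1024, 65536):
        print(m, cheaper(p, m))
```

With these placeholder numbers, the model favors Bruck for 8-byte and 1 KiB blocks and the pairwise exchange for 64 KiB blocks. This crossover is why MPI libraries commonly switch all-to-all algorithms by message size.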
Problem

all-to-all, many-core, supercomputers, MPI, scalability
Innovation

all-to-all, MPI, many-core, performance optimization, collective communication
S. Kinkead
Sandia National Laboratories, University of New Mexico
Jackson Wesley
University of New Mexico
W. Schonbein
Sandia National Laboratories
David DeBonis
Los Alamos National Laboratory
Matthew G. F. Dosanjh
Sandia National Laboratories
Scalable System Software
Amanda Bienz
Assistant Professor, University of New Mexico
High-Performance Computing, Scientific Computing