Persistent and Partitioned MPI for Stencil Communication

📅 2025-08-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In large-scale parallel iterative stencil computations, communication overhead dominates performance. This paper systematically investigates the combined optimization of persistent and partitioned MPI for stencil communication. We propose a unified optimization framework integrating non-blocking, persistent, and partitioned communication primitives, and, for the first time, quantitatively characterize the impact of process count, thread count, and message size on partitioned communication performance. Using the Comb benchmark, we conduct multi-scale empirical evaluation: persistent MPI achieves up to 37% speedup, partitioned MPI up to 68%, and their combination further alleviates synchronization bottlenecks, significantly improving communication efficiency. Our work establishes a reproducible methodology and empirically grounded guidelines for communication optimization in stencil-based applications.

📝 Abstract
Many parallel applications rely on iterative stencil operations, whose performance is dominated by communication costs at large scales. Several MPI optimizations, such as persistent and partitioned communication, reduce overheads and improve communication efficiency through amortized setup costs and reduced synchronization of threaded sends. This paper presents the performance of stencil communication in the Comb benchmarking suite when using non-blocking, persistent, and partitioned communication routines. The impact of each optimization is analyzed at various scales. Further, the paper presents an analysis of the impact of process count, thread count, and message size on partitioned communication routines. Measured timings show that persistent MPI communication can provide a speedup of up to 37% over the baseline MPI communication, and partitioned MPI communication can provide a speedup of up to 68%.
Problem

Research questions and friction points this paper is trying to address.

Optimizing MPI communication for iterative stencil operations
Analyzing performance impact of persistent and partitioned routines
Evaluating speedup from MPI optimizations at various scales
Innovation

Methods, ideas, or system contributions that make the work stand out.

Persistent MPI communication reduces overheads
Partitioned MPI communication improves efficiency
Analyzes impact of process and thread counts
Gerald Collom
Department of Computer Science, University of New Mexico
Jason Burmark
Lawrence Livermore National Lab
Olga Pearce
Lawrence Livermore National Lab
Amanda Bienz
Assistant Professor, University of New Mexico
High-Performance Computing · Scientific Computing