🤖 AI Summary
In large-scale parallel iterative stencil computations, communication overhead dominates performance. This paper systematically investigates the combined optimization of persistent and partitioned MPI for stencil communication. We propose a unified optimization framework integrating non-blocking, persistent, and partitioned communication primitives, and, for the first time, quantitatively characterize the impact of process count, thread count, and message size on partitioned communication performance. Using the Comb benchmark, we conduct a multi-scale empirical evaluation: persistent MPI achieves up to 37% speedup, partitioned MPI up to 68%, and their combination further alleviates synchronization bottlenecks, significantly improving communication efficiency. Our work establishes a reproducible methodology and empirically grounded guidelines for communication optimization in stencil-based applications.
📝 Abstract
Many parallel applications rely on iterative stencil operations, whose performance is dominated by communication costs at large scales. Several MPI optimizations, such as persistent and partitioned communication, reduce overhead and improve communication efficiency by amortizing setup costs and reducing synchronization among threaded sends. This paper presents the performance of stencil communication in the Comb benchmarking suite when using non-blocking, persistent, and partitioned communication routines. The impact of each optimization is analyzed at various scales. Further, the paper presents an analysis of the impact of process count, thread count, and message size on partitioned communication routines. Measured timings show that persistent MPI communication can provide a speedup of up to 37% over the baseline MPI communication, and partitioned MPI communication can provide a speedup of up to 68%.