Leveraging Caliper and Benchpark to Analyze MPI Communication Patterns: Insights from AMG2023, Kripke, and Laghos

📅 2025-07-30
🤖 AI Summary
To support fine-grained characterization and cross-application comparison of MPI communication behavior in HPC applications, this paper adds a “communication region” mechanism to the Caliper performance profiling framework. A communication region is an annotation that captures metrics about the data being communicated and about the MPI processes involved, so that communication overhead can be quantified in the context of a specific code region. Using the Benchpark benchmark suite and the Thicket analysis library, the authors visualize MPI communication patterns, including halo exchanges, across AMG2023, Kripke, and Laghos on both CPU- and GPU-oriented systems. The approach supports cross-application comparison of message traffic, comparison of scaling behavior across regions within an application, and identification of communication bottlenecks. It is demonstrated on these three representative modeling and simulation applications.

📝 Abstract
We introduce “communication regions” into the widely used Caliper HPC profiling tool. A communication region is an annotation that enables capture of metrics about the data being communicated (including statistics of these metrics) and metrics about the MPI processes involved in the communication, something not previously possible in Caliper. We explore the utility of communication regions with three representative modeling and simulation applications, AMG2023, Kripke, and Laghos, all part of the comprehensive Benchpark suite, which includes Caliper annotations. The enhanced Caliper reveals detailed communication behaviors. Using Caliper and Thicket in tandem, we create new visualizations of MPI communication patterns, including halo exchanges. Our findings reveal communication bottlenecks and detailed behaviors, indicating significant utility of the communication-region addition to Caliper. The comparative scaling behavior of both CPU- and GPU-oriented systems is shown; we are able to look at different regions within a given application and see how scalability and message-traffic metrics differ.
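To make the idea of a communication region concrete, the following is a minimal Python sketch of the kind of bookkeeping such an annotation implies: message-size statistics plus the set of peer ranks, grouped under a named region. It is an illustration only and does not use Caliper or MPI. The names `CommRegionStats`, `comm_region`, and `record_send`, and the example message sizes, are all hypothetical and do not come from the paper.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class CommRegionStats:
    """Hypothetical per-region record; not Caliper's actual data layout."""
    name: str
    sizes: list = field(default_factory=list)  # bytes per message sent
    peers: set = field(default_factory=set)    # destination ranks seen

    def record_send(self, dest_rank, nbytes):
        # Called once per (simulated) send inside the region.
        self.sizes.append(nbytes)
        self.peers.add(dest_rank)

    def summary(self):
        # Aggregate statistics over all sends recorded in this region.
        if not self.sizes:
            return {"region": self.name, "sends": 0, "bytes_total": 0,
                    "bytes_min": 0, "bytes_max": 0, "bytes_avg": 0.0,
                    "num_peers": 0}
        return {"region": self.name,
                "sends": len(self.sizes),
                "bytes_total": sum(self.sizes),
                "bytes_min": min(self.sizes),
                "bytes_max": max(self.sizes),
                "bytes_avg": mean(self.sizes),
                "num_peers": len(self.peers)}


@contextmanager
def comm_region(name, registry):
    # Look up (or create) the record for this region name, then hand it
    # to the enclosed block so sends can be attributed to the region.
    stats = registry.setdefault(name, CommRegionStats(name))
    yield stats


registry = {}
with comm_region("halo_exchange", registry) as region:
    # Made-up (destination rank, message size in bytes) pairs.
    for dest, nbytes in [(1, 4096), (3, 4096), (4, 512)]:
        region.record_send(dest, nbytes)

print(registry["halo_exchange"].summary())
```

Keeping the peer ranks alongside the size statistics is what allows a later analysis step to distinguish, for example, a few large messages to one neighbor from many small messages to several neighbors.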
Problem

Research questions and friction points this paper is trying to address.

Enhancing Caliper to capture MPI communication metrics and statistics
Analyzing communication patterns in AMG2023, Kripke, and Laghos applications
Identifying communication bottlenecks and scalability differences in HPC systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing communication regions in Caliper
Enhanced profiling with MPI metrics
New visualizations using Caliper and Thicket
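As a rough illustration of the halo-exchange pattern mentioned above, the sketch below builds a rank-to-rank communication matrix for a small, non-periodic 2D process grid in plain Python. This is a toy model under stated assumptions, not the decomposition used by AMG2023, Kripke, or Laghos, and it does not call Caliper or Thicket. The function names, the row-major rank layout, and the 1024-byte face size are hypothetical.

```python
def halo_neighbors(rank, px, py):
    """Return the face neighbors of `rank` in a px-by-py grid.

    Assumes row-major rank numbering and non-periodic boundaries.
    """
    x, y = rank % px, rank // px
    neighbors = []
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < px and 0 <= ny < py:
            neighbors.append(ny * px + nx)
    return neighbors


def comm_matrix(px, py, bytes_per_face):
    """Build matrix[src][dst] = bytes sent from src to dst in one exchange."""
    n = px * py
    matrix = [[0] * n for _ in range(n)]
    for src in range(n):
        for dst in halo_neighbors(src, px, py):
            matrix[src][dst] = bytes_per_face
    return matrix


# Toy example: 3 x 2 process grid, 1024 bytes per exchanged face.
matrix = comm_matrix(3, 2, 1024)
for src, row in enumerate(matrix):
    print(src, row)
```

A matrix of this shape is the sort of input a heatmap-style visualization would consume: a halo exchange appears as a sparse, symmetric band structure, whereas a collective or all-to-all phase would fill the matrix densely.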

Grace Nansamba
Tennessee Tech
High Performance Computing, Fault tolerance, Synchronized time, Consensus
Evelyn Namugwanya
Department of Computer Science, Tennessee Tech University
David Boehme
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
Dewi Yokelson
GPU Performance Engineer, AMD
High Performance Computing, Performance Analysis, AI
Riley Shipley
Department of Computer Science, Tennessee Tech University
Derek Schafer
Department of Computer Science, University of New Mexico
Michael McKinsey
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
Olga Pearce
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
Anthony Skjellum
Tennessee Technological University
High Performance Computing, Cyber, Quantum