Leveraging Caliper and Benchpark to Analyze MPI Communication Patterns: Insights from AMG2023, Kripke, and Laghos

📅 2025-07-30

🤖 AI Summary

To support fine-grained characterization and cross-application comparison of MPI communication behavior in HPC applications, this paper introduces a “communication region” annotation into the Caliper performance profiling framework. A communication region is an annotation that captures metrics about the data being communicated, including statistics of those metrics, and about the MPI processes involved, which was not previously possible in Caliper. Using the Benchpark benchmark suite and the Thicket analysis library, the authors visualize MPI communication patterns, including halo exchanges, for AMG2023, Kripke, and Laghos on both CPU- and GPU-oriented systems. The results expose communication bottlenecks and show how scalability and message-traffic metrics differ between regions of the same application.

📝 Abstract

We introduce “communication regions” into the widely used Caliper HPC profiling tool. A communication region is an annotation enabling capture of metrics about the data being communicated (including statistics of these metrics) and metrics about the MPI processes involved in the communications, something not previously possible in Caliper. We explore the utility of communication regions with three representative modeling and simulation applications, AMG2023, Kripke, and Laghos, all part of the comprehensive Benchpark suite that includes Caliper annotations. The enhanced Caliper reveals detailed communication behaviors. Using Caliper and Thicket in tandem, we create new visualizations of MPI communication patterns, including halo exchanges. Our findings reveal communication bottlenecks and detailed behaviors, indicating significant utility of the special-regions addition to Caliper. The comparative scaling behavior of both CPU- and GPU-oriented systems is shown; we are able to look at different regions within a given application and see how scalability and message-traffic metrics differ.
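As an illustration of the idea described in the abstract, the following is a minimal Python sketch of what a communication region might aggregate: per-region statistics on message sizes and the number of distinct destination ranks. It is a toy stand-in, not Caliper's API; the class `CommProfile`, the methods `comm_region`, `record_send`, and `summary`, the region name `halo_exchange`, and the message sizes are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean


class CommProfile:
    """Toy profiler used only for illustration; this is not Caliper's API."""

    def __init__(self):
        self._active = []                  # stack of currently open region names
        self._sends = defaultdict(list)    # region name -> [(dest_rank, nbytes)]

    @contextmanager
    def comm_region(self, name):
        """Mark the enclosed block as one communication region."""
        self._active.append(name)
        try:
            yield
        finally:
            self._active.pop()

    def record_send(self, dest_rank, nbytes):
        """Attribute one message to the innermost open region, if any."""
        if self._active:
            self._sends[self._active[-1]].append((dest_rank, nbytes))

    def summary(self, name):
        """Return message-size statistics and process counts for a region."""
        sends = self._sends.get(name, [])
        if not sends:
            return {"sends": 0, "dest_ranks": 0}
        sizes = [nbytes for _, nbytes in sends]
        return {
            "sends": len(sends),
            "dest_ranks": len({dest for dest, _ in sends}),
            "bytes_min": min(sizes),
            "bytes_max": max(sizes),
            "bytes_avg": mean(sizes),
        }


profile = CommProfile()
with profile.comm_region("halo_exchange"):
    # Hypothetical 1D halo exchange seen from rank 1: one message per neighbor.
    profile.record_send(dest_rank=0, nbytes=4096)
    profile.record_send(dest_rank=2, nbytes=8192)

halo = profile.summary("halo_exchange")
print(halo)
```

Grouping messages by region rather than by individual MPI call is what would let different phases of one application, or the same phase across applications, be compared on message count, message size, and number of communication partners.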

Problem

Research questions and friction points this paper is trying to address.

Enhancing Caliper to capture MPI communication metrics and statistics

Analyzing communication patterns in AMG2023, Kripke, and Laghos applications

Identifying communication bottlenecks and scalability differences in HPC systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing communication regions in Caliper

Enhanced profiling with MPI metrics

New visualizations using Caliper and Thicket
