🤖 AI Summary
To enable fine-grained characterization and cross-application comparison of MPI communication behavior in HPC applications, this paper introduces a "communication region" mechanism into the Caliper performance profiling framework, a capability Caliper previously lacked. A communication region annotates a code region that performs MPI communication and associates it with process-level and data-level metrics, so that communication overhead is quantified in the context where it occurs. Using the Benchpark benchmark suite and the Thicket analysis library, we model and visualize canonical communication patterns, including halo exchanges, in AMG2023, Kripke, and Laghos on both CPU and GPU platforms. The approach supports quantitative comparison of message volume across applications, attribution of differences in scaling behavior to specific regions, and identification of communication bottlenecks, making MPI communication analysis more detailed and more comparable across codes. The method is demonstrated on these real-world simulation codes.
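To make the idea of attaching data-level and process-level metrics to an annotated region concrete, here is a minimal Python sketch. It is not Caliper's API: the classes `CommRegionStats` and `CommProfiler` and their `begin`/`end`/`send` methods are hypothetical stand-ins, assuming that each message sent inside an open region is attributed to the innermost region and that per-region statistics (count, total, min, max, mean bytes) and the number of distinct partner ranks are what we want to summarize.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class CommRegionStats:
    """Hypothetical per-region record of data-level and process-level metrics."""
    name: str
    msg_sizes: list = field(default_factory=list)  # bytes of each message sent
    partners: set = field(default_factory=set)     # distinct destination ranks

    def record_send(self, dest: int, nbytes: int) -> None:
        self.msg_sizes.append(nbytes)
        self.partners.add(dest)

    def summary(self) -> dict:
        s = self.msg_sizes
        return {
            "region": self.name,
            "messages": len(s),
            "bytes_total": sum(s),
            "bytes_min": min(s, default=0),
            "bytes_max": max(s, default=0),
            "bytes_avg": mean(s) if s else 0.0,
            "partners": len(self.partners),
        }


class CommProfiler:
    """Hypothetical profiler whose begin/end calls mimic a region annotation."""

    def __init__(self) -> None:
        self.regions = {}  # region name -> CommRegionStats
        self._open = []    # stack of currently open region names

    def begin(self, name: str) -> None:
        self.regions.setdefault(name, CommRegionStats(name))
        self._open.append(name)

    def end(self, name: str) -> None:
        # Regions must close in LIFO order, like nested annotations.
        if not self._open or self._open[-1] != name:
            raise ValueError(f"region end does not match innermost open region: {name}")
        self._open.pop()

    def send(self, dest: int, nbytes: int) -> None:
        # Attribute each message to the innermost open region.
        if not self._open:
            raise RuntimeError("send() called outside any communication region")
        self.regions[self._open[-1]].record_send(dest, nbytes)


# Usage: one rank sends three messages inside a "halo_exchange" region.
prof = CommProfiler()
prof.begin("halo_exchange")
for dest, nbytes in [(1, 4096), (2, 4096), (3, 1024)]:
    prof.send(dest, nbytes)
prof.end("halo_exchange")
print(prof.regions["halo_exchange"].summary())
```

Keeping statistics per named region, rather than per MPI call, is what lets two regions of the same application (or the same region in two applications) be compared side by side.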
📝 Abstract
We introduce "communication regions" into the widely used Caliper HPC profiling tool. A communication region is an annotation that enables capture of metrics about the data being communicated (including statistics of these metrics) and metrics about the MPI processes involved in the communication, something not previously possible in Caliper. We explore the utility of communication regions with three representative modeling and simulation applications, AMG2023, Kripke, and Laghos, all part of the comprehensive Benchpark suite that includes Caliper annotations. Using Caliper and Thicket in tandem, we create new visualizations of MPI communication patterns, including halo exchanges. Our findings reveal communication bottlenecks and detailed communication behavior, indicating the significant utility of adding communication regions to Caliper. We show the comparative scaling behavior on both CPU- and GPU-oriented systems, and we are able to examine different regions within a given application and see how their scalability and message-traffic metrics differ.
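Halo exchanges are one of the patterns the abstract highlights. The sketch below models a generic face-only halo exchange on a 3D block decomposition to show why per-process metrics matter: ranks on the domain boundary have fewer partners and send fewer bytes than interior ranks. It is an illustrative model under stated assumptions (non-periodic boundaries, uniform subdomains, one ghost layer, 8-byte cells, no edge or corner exchanges), not the exchange implemented in AMG2023, Kripke, or Laghos; the functions `halo_partners` and `halo_bytes_per_rank` are hypothetical.

```python
from itertools import product


def halo_partners(coords, dims):
    """Face neighbors of a rank in a non-periodic 3D process grid (hypothetical helper)."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            c = list(coords)
            c[axis] += step
            if 0 <= c[axis] < dims[axis]:  # drop neighbors outside the grid
                neighbors.append((tuple(c), axis))
    return neighbors


def halo_bytes_per_rank(dims, local, bytes_per_cell=8, ghost_width=1):
    """Return {rank coords: (partner count, bytes sent)} for one face-only exchange."""
    nx, ny, nz = local
    face_cells = (ny * nz, nx * nz, nx * ny)  # cells on faces normal to x, y, z
    result = {}
    for coords in product(*(range(d) for d in dims)):
        partners = halo_partners(coords, dims)
        total = sum(face_cells[axis] * ghost_width * bytes_per_cell
                    for _, axis in partners)
        result[coords] = (len(partners), total)
    return result


# Usage: a 3x3x3 process grid, each rank owning a 32^3 subdomain.
stats = halo_bytes_per_rank(dims=(3, 3, 3), local=(32, 32, 32))
for coords in [(1, 1, 1), (0, 1, 1), (0, 0, 0)]:
    print(coords, stats[coords])
```

In this model the interior rank talks to six partners while a corner rank talks to three, so an average over all ranks would hide a twofold spread in per-rank traffic; recording process-level metrics alongside message-size statistics is what exposes that kind of imbalance.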