A Microbenchmark Framework for Performance Evaluation of OpenMP Target Offloading

📅 2025-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three key challenges in evaluating OpenMP target offload performance on heterogeneous supercomputing platforms (e.g., Perlmutter, Frontier): (1) difficulty in systematic performance assessment, (2) lack of quantifiable metrics for compiler version impacts, and (3) absence of standardized baselines for comparison against native programming models. To this end, we introduce the first standardized microbenchmarking framework—built on Catch2—specifically designed for OpenMP target offload. The framework supports CUDA and HIP backends, integrates Slurm job scheduling, and incorporates GPU topology awareness. It enables fine-grained, cross-compiler (Clang, GCC, AOCC) and cross-model (vs. native CUDA/HIP) performance attribution. Empirical evaluation across dozens of core operations reveals up to 2–5× performance variation across compilers—particularly in data movement and kernel launch overhead—providing reproducible, scalable, and quantitative insights for both compiler developers and application users.
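To make the measurement approach concrete, below is a minimal sketch of the kind of operation such a framework times: a vector add offloaded via an OpenMP `target` construct, where wrapping the call in a timer (or, as in the paper, a Catch2 `BENCHMARK`) captures the host-to-device map, kernel launch, and device-to-host map together. The function name and setup here are illustrative assumptions, not the paper's actual code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical micro-benchmark kernel: offloaded vector add.
// Timing one call to this function measures the full offload cycle
// (H2D copy via map(to:), kernel execution, D2H copy via map(tofrom:)).
// Without an offload-capable compiler/device, execution falls back to the host.
double offload_vector_add(std::vector<double>& a, const std::vector<double>& b) {
    double* pa = a.data();
    const double* pb = b.data();
    const std::size_t n = a.size();
    #pragma omp target teams distribute parallel for \
        map(to: pb[0:n]) map(tofrom: pa[0:n])
    for (std::size_t i = 0; i < n; ++i)
        pa[i] += pb[i];
    return pa[0];
}
```

In the framework described here, a call like this would presumably sit inside a Catch2 `TEST_CASE`/`BENCHMARK` body, with the same source compiled by each compiler under test (Clang, GCC, AOCC) so that timings can be attributed to the toolchain rather than the benchmark code.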

📝 Abstract
We present a framework based on Catch2 to evaluate the performance of OpenMP's target offload model via micro-benchmarks. The compilers supporting OpenMP's target offload model on heterogeneous architectures are currently under rapid development, and these developments influence the performance of complex applications in different ways. This framework can be employed to track the impact of compiler upgrades and to compare their performance with that of native programming models. We use the framework to benchmark the performance of a few commonly used operations on leadership-class supercomputers such as Perlmutter at the National Energy Research Scientific Computing Center (NERSC) and Frontier at the Oak Ridge Leadership Computing Facility (OLCF). Such a framework will be useful for compiler developers to gain insight into the cumulative impact of many small changes, as well as for users deciding which compilers and versions are expected to yield the best performance for their applications.
Problem

Research questions and friction points this paper is trying to address.

Evaluate the performance of the OpenMP target offload model
Track the impact of compiler upgrades on applications
Compare compiler performance with native programming models
Innovation

Methods, ideas, or system contributions that make the work stand out.

A Catch2-based framework for evaluating OpenMP target offload performance.
Micro-benchmarks that track performance across compilers and versions.
Benchmarks of common operations on supercomputers such as Perlmutter and Frontier.