🤖 AI Summary
This work addresses the lack of systematic, reproducible performance and energy-efficiency evaluation in existing CI/CD pipelines for exascale heterogeneous high-performance computing systems. To bridge this gap, the paper introduces exaCB, a framework for incremental continuous benchmarking. exaCB enables cross-application tracking of performance and energy efficiency throughout the software development lifecycle by means of reusable CI/CD components, established benchmark harnesses, a shared reporting protocol, and support for benchmarks at varying maturity levels. The framework has been deployed in JUREAP, the early-access program for the JUPITER exascale system, where it supported continuous benchmarking of over 70 scientific applications, enabling cross-application analysis, performance tracking, and energy-aware studies.
📝 Abstract
The increasing heterogeneity of high-performance computing (HPC) systems and the transition to exascale architectures require systematic and reproducible performance evaluation across diverse workloads. While continuous integration (CI) ensures functional correctness in software engineering, performance and energy efficiency in HPC are typically evaluated outside CI workflows, motivating continuous benchmarking (CB) as a complementary approach. Integrating benchmarking into CI workflows enables reproducible evaluation, early detection of regressions, and continuous validation throughout the software development lifecycle.
We present exaCB, a framework for continuous benchmarking developed in the context of the JUPITER exascale system. exaCB enables application teams to integrate benchmarking into their workflows while supporting large-scale, system-wide studies through reusable CI/CD components, established harnesses, and a shared reporting protocol. The framework supports incremental adoption, allowing benchmarks to be onboarded easily and to evolve from basic runnability to more advanced instrumentation and reproducibility. The approach is demonstrated in JUREAP, the early-access program for JUPITER, where exaCB enabled continuous benchmarking of over 70 applications at varying maturity levels, supporting cross-application analysis, performance tracking, and energy-aware studies. These results illustrate the practicality of using exaCB for continuous benchmarking on exascale HPC systems across large, diverse collections of scientific applications.
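As a rough illustration of how a shared report record and a maturity level could feed early regression detection, the following minimal Python sketch may help. It is not exaCB's actual protocol or API: the `Report` fields, the `Maturity` levels, the `is_regression` helper, and the 10% tolerance are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import IntEnum
from statistics import median
from typing import Optional, Sequence


class Maturity(IntEnum):
    # Hypothetical levels, loosely following the progression named in the
    # abstract (runnability, then instrumentation, then reproducibility).
    RUNNABLE = 0
    INSTRUMENTED = 1
    REPRODUCIBLE = 2


@dataclass(frozen=True)
class Report:
    # Hypothetical record shape; not the real exaCB reporting protocol.
    application: str
    commit: str
    runtime_s: float
    energy_j: Optional[float]  # None until energy instrumentation is added
    maturity: Maturity


def is_regression(history: Sequence[Report], latest: Report,
                  tolerance: float = 0.10) -> bool:
    """Return True if the latest runtime exceeds the median of the earlier
    runtimes by more than `tolerance` (a fraction; 0.10 means 10%)."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = median(r.runtime_s for r in history)
    return latest.runtime_s > baseline * (1.0 + tolerance)


# Usage with made-up numbers: three earlier runs and two candidate runs.
history = [
    Report("demo-app", "a1", 100.0, None, Maturity.RUNNABLE),
    Report("demo-app", "a2", 102.0, None, Maturity.RUNNABLE),
    Report("demo-app", "a3", 98.0, 5.0e6, Maturity.INSTRUMENTED),
]
slow = Report("demo-app", "a4", 115.0, 5.6e6, Maturity.INSTRUMENTED)
fine = Report("demo-app", "a5", 105.0, 5.1e6, Maturity.INSTRUMENTED)
```

Comparing against the median rather than the most recent run is one simple way to reduce sensitivity to a single noisy measurement; a production setup would likely need more careful statistics.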