Achieving Consistent and Comparable CPU Evaluation Outcomes

📅 2024-11-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing CPU benchmarks (e.g., SPEC CPU2017) lack explicit system configuration specifications, leading to performance interference from non-CPU components and severely undermining comparability, consistency, and reproducibility. Method: We propose a novel CPU performance evaluation paradigm grounded in the principle of “fully specified and valid configurations,” establishing a systematic modeling framework that spans the complete configuration space; we design an unbiased sampling strategy that uniformly weights all compliant configurations; and we replace point estimates with confidence intervals and associated confidence levels for performance reporting. Results: Experiments reveal up to 74.49× performance variation for the same CPU across compliant configurations. Our framework eliminates configuration ambiguity entirely, enabling fair cross-CPU comparisons and significantly improving consistency, reproducibility, and statistical rigor of benchmarking outcomes.
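The summary's two key mechanisms — uniformly weighting all compliant configurations during sampling — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the configuration parameters below (`compiler_flags`, `memory_channels`, `smt`, `hugepages`) are hypothetical stand-ins for whatever the paradigm's "fully specified and valid configurations" would enumerate.

```python
import itertools
import random

# Hypothetical configuration space: every parameter the paradigm would
# require to be fully specified (names are illustrative only).
CONFIG_SPACE = {
    "compiler_flags": ["-O2", "-O3", "-Ofast"],
    "memory_channels": [1, 2, 4],
    "smt": ["on", "off"],
    "hugepages": ["on", "off"],
}

def all_compliant_configs(space):
    """Enumerate the full Cartesian product of configuration options."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def uniform_sample(space, n, seed=0):
    """Draw n configurations uniformly at random, so every compliant
    configuration carries equal weight (none is privileged)."""
    configs = list(all_compliant_configs(space))
    rng = random.Random(seed)
    return rng.sample(configs, min(n, len(configs)))

sample = uniform_sample(CONFIG_SPACE, 5)
```

Because each configuration is an explicit, fully specified dictionary, two labs drawing from the same space with the same seed reproduce identical benchmark runs — the reproducibility property the summary emphasizes.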

📝 Abstract
The SPEC CPU2017 benchmark suite is an industry standard for assessing CPU performance. It strictly specifies some workload and system configurations (arbitrary specificity) while leaving other system configurations undefined (arbitrary ambiguity). This article reveals that: (1) arbitrary specificity is not meaningful and obscures many scenarios, as evidenced by significant performance variation, with a 74.49x performance difference observed on the same CPU; (2) arbitrary ambiguity is unfair, as it fails to establish identical configurations for comparing different CPUs. We propose an innovative CPU evaluation methodology that considers all workload and system configurations valid and mandates that each configuration be well defined, avoiding both arbitrary specificity and arbitrary ambiguity. To reduce evaluation cost, a sampling approach is proposed to select a subset of the configurations. To expose CPU performance under different scenarios, all outcomes under each configuration are treated as equally important. Finally, outcomes are reported with a confidence level and confidence interval to avoid bias.
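The abstract's final point — reporting a confidence interval and confidence level rather than a single point estimate — can be illustrated with a standard normal-approximation interval over scores sampled across configurations. This is a generic statistical sketch, not the paper's exact estimator; the scores are made-up example values.

```python
import math
import statistics

def confidence_interval(scores, z=1.96):
    """Return (low, high) bounds around the mean score.

    z = 1.96 corresponds to an approximately 95% confidence level
    under a normal approximation of the sampling distribution.
    """
    mean = statistics.mean(scores)
    half = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean - half, mean + half

# Example: benchmark scores for one CPU measured under five
# independently sampled configurations (illustrative numbers).
lo, hi = confidence_interval([10.2, 9.8, 10.5, 10.1, 9.9])
```

Reporting the pair `(lo, hi)` with its confidence level, instead of a single number, makes the configuration-induced spread visible rather than hiding it in an arbitrary point estimate.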
Problem

Research questions and friction points this paper is trying to address.

Difficulty attributing CPU performance deviations accurately
Uncontrolled variability in industry-standard CPU benchmarks
Lack of consistent methodologies for CPU evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rigorous CPU evaluation methodology for consistency
Controlled experiments to reduce variability
Comparison with SPEC CPU2017 and others
Authors
Chenxi Wang
Lei Wang
Wanling Gao — Institute of Computing Technology, Chinese Academy of Sciences (Big data and AI benchmarking, Computer architecture)
Yikang Yang
Yutong Zhou
Jianfeng Zhan