🤖 AI Summary
No quantitative framework exists for evaluating the vectorization efficiency of the ARM Scalable Vector Extension (SVE) in high-performance computing (HPC) applications on NVIDIA Grace platforms, which hinders assessment of SVE's production readiness.
Method: We propose a length-element-coupled enhanced Roofline model, introduce the first quantitative metric for SVE vectorization benefit, and develop the first decision-tree classifier for assessing the SVE acceleration potential of HPC applications. Combining SVE code analysis, performance monitoring unit (PMU) event sampling, and analytical modeling, we pinpoint vectorization bottlenecks.
Contribution/Results: Experiments across representative HPC workloads show an average 37% reduction in total instruction count and up to 2.1× speedup in critical kernels. The study validates SVE’s production readiness for HPC deployment on Grace and establishes a reusable methodology for ARM architecture–based high-performance optimization.
📝 Abstract
Vector architectures are essential for boosting computing throughput. ARM provides SVE as the next-generation length-agnostic vector extension, moving beyond traditional fixed-length SIMD. This work presents a first study of the maturity and readiness of exploiting ARM and SVE in HPC. Using selected hardware performance events on the NVIDIA Grace processor and analytical models, we derive new metrics to quantify how effectively SVE vectorization reduces executed instruction counts and improves speedup. We further propose an adapted Roofline model that couples vector length with data-element counts to identify potential performance bottlenecks. Finally, we propose a decision tree for classifying SVE-boosted performance across applications.
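To make the idea of a Roofline model coupling vector length and data elements concrete, here is a minimal sketch. The function name, parameters, and the lane-utilization scaling are illustrative assumptions, not the paper's exact formulation:

```python
def attainable_gflops(arith_intensity, peak_gflops, mem_bw_gbs,
                      vl_bits, elem_bits, avg_active_elems):
    """Illustrative length-element-coupled Roofline ceiling (assumed form).

    The classic Roofline bound min(peak, bandwidth * AI) is scaled by lane
    utilization: the average number of active data elements per vector
    operation divided by the number of lanes the vector length provides.
    """
    lanes = vl_bits // elem_bits              # e.g. 128-bit SVE with 64-bit doubles -> 2 lanes
    utilization = avg_active_elems / lanes    # fraction of lanes doing useful work
    compute_roof = peak_gflops * utilization  # effective compute ceiling
    memory_roof = mem_bw_gbs * arith_intensity
    return min(compute_roof, memory_roof)

# Fully utilized 128-bit vectors, low arithmetic intensity -> memory-bound:
print(attainable_gflops(0.25, 100.0, 50.0, 128, 64, 2))  # 12.5
# Half-utilized lanes, high arithmetic intensity -> compute-bound at half peak:
print(attainable_gflops(4.0, 100.0, 50.0, 128, 64, 1))   # 50.0
```

A model of this shape makes predicated (partially active) SVE loops visible as a lowered compute roof rather than an unexplained gap below peak.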