High-Order Spectral Element Methods for Wave Propagation on ARM Multicore CPU with SME: Optimizations and Implications

📅 2026-06-10
🤖 AI Summary
This study addresses the poor match between existing spectral element method (SEM) implementations and the Scalable Matrix Extension (SME) on emerging ARM multicore CPUs such as the LX2, which limits performance in wave propagation simulations. The authors present an SME-enabled optimization of SPECFEM3D that combines an SME-aware batched small-matrix kernel for SEM tensor-product operators, a memory-aware hybrid MPI+OpenMP execution scheme for systems with limited HBM, and a dispersion-based iso-accuracy study of the (h,p) tradeoff. At fixed polynomial order, the full application runs 4–6× faster than the original code and also outperforms optimized non-SME CPU baselines. The results further suggest that SME moves the performance-favorable operating point toward higher polynomial orders along the iso-accuracy frontier, which further reduces time-to-solution and working-set size.
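To make the (h,p) tradeoff concrete, the following is a minimal sketch of how element size and node count follow from the polynomial degree p once a points-per-wavelength target is chosen. It is not taken from the paper: the function name `sem_grid`, the cubic domain, and the parameter `pts_per_wavelength` are hypothetical, and the points-per-wavelength value that yields a given dispersion error at each p would have to come from a dispersion analysis such as the paper's, which is not reproduced here.

```python
import math


def sem_grid(domain_len, wavelength, p, pts_per_wavelength):
    """Return (h, elements_per_side, total_nodes) for a cubic domain.

    Hypothetical helper for illustration only. pts_per_wavelength is an
    input; it is NOT a value reported by the paper.
    """
    # A degree-p element spans p node intervals per side, so the average
    # number of points per wavelength is G = p * wavelength / h.
    h_max = p * wavelength / pts_per_wavelength
    # Use a whole number of elements per side, never coarser than h_max.
    nel = math.ceil(domain_len / h_max)
    h = domain_len / nel
    # Nodes on shared element faces are counted once.
    nodes = (p * nel + 1) ** 3
    return h, nel, nodes
```

Calling `sem_grid` for several degrees, each with the points-per-wavelength value that a dispersion analysis assigns to the same error target, would give the node counts along one iso-accuracy curve. Timing measurements are still needed to decide which degree is fastest.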
📝 Abstract
Wave propagation based on the spectral element method (SEM) is a representative HPC workload, but existing SEM implementations are not well matched to emerging ARM multicore CPUs with the Scalable Matrix Extension (SME). We present an SME-enabled optimization of SPECFEM3D on the emerging LX2 processor that combines an SME-aware batched small-matrix kernel for SEM tensor-product operators, a memory-aware hybrid MPI+OpenMP execution scheme for limited-HBM systems, and a dispersion-based iso-accuracy study of the (h,p) tradeoff. At fixed polynomial order, the optimized implementation improves full-application performance by 4–6× over the original code and delivers clear gains over optimized non-SME CPU baselines. Beyond these implementation-level gains, our results suggest that SME shifts the performance-favorable operating point toward higher polynomial orders along the dispersion-based iso-accuracy frontier, further reducing time-to-solution and working-set size. These results indicate that SME affects not only kernel efficiency, but also the practical discretization tradeoff for SEM on modern ARM multicore platforms.
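The phrase "batched small-matrix kernel for SEM tensor-product operators" can be illustrated with a minimal pure-Python sketch. It assumes each element stores an n×n×n nodal field and that a single n×n operator D (for example a 1D derivative matrix) is applied along the first index. Flattening the other two indices, and placing many elements side by side, turns the operation into one product of D with an n × (E·n²) matrix. The function names are hypothetical, and the sketch makes no claim about how the paper's kernel lays out data or maps onto SME instructions.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def batched_apply_axis0(D, elems):
    """Apply D along index i of every field u[i][j][k] in elems.

    Hypothetical illustration: all elements are packed into one wide
    n x (E*n*n) matrix so the whole batch is a single matrix product.
    """
    n = len(D)
    # Row i of the wide matrix holds u[i][j][k] for every element, j, and k.
    wide = [[e[i][j][k] for e in elems for j in range(n) for k in range(n)]
            for i in range(n)]
    out = matmul(D, wide)
    # Unpack the wide result back into one n x n x n field per element.
    result = []
    for b in range(len(elems)):
        base = b * n * n
        result.append([[[out[i][base + j * n + k] for k in range(n)]
                        for j in range(n)] for i in range(n)])
    return result


def reference_axis0(D, u):
    """Direct triple-loop definition: v[i][j][k] = sum_l D[i][l] * u[l][j][k]."""
    n = len(D)
    return [[[sum(D[i][l] * u[l][j][k] for l in range(n)) for k in range(n)]
             for j in range(n)] for i in range(n)]
```

The other two directions follow the same pattern after permuting indices. In a real kernel that permutation is a data-layout question, and it is one reason small n makes these products hard to run efficiently on wide matrix hardware.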
Problem

Research questions and friction points this paper is trying to address.

spectral element method
wave propagation
ARM multicore CPU
Scalable Matrix Extension
high-performance computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

SME
spectral element method
wave propagation
ARM multicore
tensor-product operator