MLKAPS: Machine Learning and Adaptive Sampling for HPC Kernel Auto-tuning

📅 2025-01-10
🤖 AI Summary
Manual hyperparameter tuning of compute kernels in HPC libraries is time-consuming, inefficient, and lacks transparency. Method: We propose a lightweight, interpretable auto-tuning framework that integrates Bayesian-inspired adaptive sampling with ensemble gradient-boosted decision trees to build an input-aware runtime performance predictor and configuration selector. Contribution/Results: This is the first work to combine adaptive sampling with interpretable decision trees for HPC kernel tuning, achieving high accuracy, low overhead, and debuggability. It identifies and corrects blind spots in expert heuristics and scales effectively to large input sizes and high-dimensional design spaces. Evaluated on Intel MKL’s dgetrf and dgeqrf, our method improves performance on over 85% and on 85% of inputs, with geometric mean speedups of 1.30× and 1.18×, respectively, and outperforms state-of-the-art tools in both tuning time and mean speedup.

📝 Abstract
Many High-Performance Computing (HPC) libraries rely on decision trees to select the best kernel hyperparameters at runtime, depending on the input and environment. However, finding optimized configurations for each input and environment is challenging and requires significant manual effort and computational resources. This paper presents MLKAPS, a tool that automates this task using machine learning and adaptive sampling techniques. MLKAPS generates decision trees that tune HPC kernels' design parameters to achieve efficient performance for any user input. MLKAPS scales to large input and design spaces, outperforming similar state-of-the-art auto-tuning tools in tuning time and mean speedup. We demonstrate the benefits of MLKAPS on the highly optimized Intel MKL dgetrf LU kernel and show that MLKAPS finds blind spots in the manual tuning of HPC experts. It improves over 85% of the inputs with a geomean speedup of 1.30×. On the Intel MKL dgeqrf QR kernel, MLKAPS improves performance on 85% of the inputs with a geomean speedup of 1.18×.
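
To make the idea of an input-dependent selector concrete, here is a minimal Python sketch. It is not MLKAPS's algorithm or API: the cost model `synthetic_runtime`, the `BLOCK_SIZES` values, and all function names are hypothetical, the midpoint refinement is only a crude stand-in for adaptive sampling, and the "tree" is a one-dimensional threshold table (a degenerate decision tree) over a single input size `n`.

```python
from bisect import bisect_right

BLOCK_SIZES = [16, 64, 256]  # hypothetical design-parameter values


def synthetic_runtime(n, block):
    """Made-up cost model (assumption): per-block overhead plus a block-size penalty."""
    return 1000.0 * n / block + block * block


def best_block(n):
    """Exhaustively pick the fastest block size for one input size."""
    return min(BLOCK_SIZES, key=lambda b: synthetic_runtime(n, b))


def refine(samples, rounds=2):
    """Crude stand-in for adaptive sampling: sample midpoints where the best choice changes."""
    for _ in range(rounds):
        sizes = sorted(samples)
        for lo, hi in zip(sizes, sizes[1:]):
            if samples[lo] != samples[hi] and hi - lo > 1:
                mid = (lo + hi) // 2
                samples[mid] = best_block(mid)
    return samples


def build_tree(samples):
    """Turn the samples into split thresholds on n and one block size per interval."""
    sizes = sorted(samples)
    thresholds, labels = [], [samples[sizes[0]]]
    for lo, hi in zip(sizes, sizes[1:]):
        if samples[lo] != samples[hi]:
            thresholds.append((lo + hi) / 2)
            labels.append(samples[hi])
    return thresholds, labels


def select_block(tree, n):
    """Runtime lookup: find the interval containing n and return its block size."""
    thresholds, labels = tree
    return labels[bisect_right(thresholds, n)]


# Coarse initial sampling of the input space, then refinement near the decision boundaries.
samples = {n: best_block(n) for n in (10, 50, 100, 500, 1000, 5000, 10000, 50000)}
tree = build_tree(refine(samples))
print(select_block(tree, 20), select_block(tree, 2000), select_block(tree, 20000))
```

The runtime lookup is a cheap threshold search, so measurement cost is paid only at tuning time; a real tuner would measure the actual kernel and handle several input and design dimensions.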
Problem

Research questions and friction points this paper is trying to address.

High-Performance Computing
Kernel Parameter Optimization
Automation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine Learning
Adaptive Sampling
High-Performance Computing
Mathys Jam
Université Paris-Saclay, UVSQ, LI-PaRAD, France
Eric Petit
Research engineer at Intel Corporation
Applied AI for software, computer arithmetic, parallel programming and algorithms
P. D. O. Castro
Université Paris-Saclay, UVSQ, LI-PaRAD, France
D. Defour
Université de Perpignan via Domitia, UPVD, LAMPS, France
Greg Henry
Intel Corp., USA
W. Jalby
Université Paris-Saclay, UVSQ, LI-PaRAD, France