KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware

📅 2026-02-10
📈 Citations: 3
Influential: 0

🤖 AI Summary
This work addresses the slow, error-prone manual development of kernels for emerging AI accelerators, whose specialized instruction set architectures (ISAs) make hand-written kernels hard to scale across hardware targets. The paper introduces KernelCraft, which the authors describe as the first benchmark for evaluating an LLM agent's ability to generate and optimize low-level kernels for such hardware. In its function-calling workflow, the agent iteratively refines kernels using automated feedback from compilation checks, simulation, and correctness validation against ground truth. Across more than 20 machine learning tasks on three emerging accelerator platforms, the top agents among four reasoning models produce functionally valid kernels for previously unseen ISAs within a few refinement steps, and their optimized kernels match or outperform template-based compiler baselines. The authors present this as evidence that agentic generation could reduce the cost of kernel development.
📝 Abstract
New AI accelerators with novel instruction set architectures (ISAs) often require developers to manually craft low-level kernels -- a time-consuming, laborious, and error-prone process that cannot scale across diverse hardware targets. This prevents emerging hardware platforms from reaching the market efficiently. While prior LLM-based code generation has shown promise in mature GPU ecosystems, it remains unclear whether agentic LLM systems can quickly produce valid and efficient kernels for emerging hardware with new ISAs. We present KernelCraft: the first benchmark to evaluate an LLM agent's ability to generate and optimize low-level kernels for customized accelerators via a function-calling, feedback-driven workflow. Within KernelCraft, the agent refines kernels under ISA and hardware constraints using automated feedback derived from compilation checks, simulation, and correctness validation against ground truth. In our experiments, we assess agent performance across three emerging accelerator platforms on more than 20 ML tasks, each with 5 diverse task configurations, with special evaluation of task configuration complexity. Across four leading reasoning models, top agents produce functionally valid kernels for previously unseen ISAs within a few refinement steps, with optimized kernels that match or outperform template-based compiler baselines. With that, we demonstrate the potential for reducing the cost of kernel development for accelerator designers and kernel developers.
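The feedback-driven workflow described in the abstract (compilation check, then simulation, then correctness validation, repeated over a few refinement steps) can be sketched roughly as follows. This is a minimal illustration and not KernelCraft's actual interface: the `Feedback` record, the `evaluate` and `refine` functions, the toy opcodes, and the scripted agent that stands in for an LLM are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    """Result of one evaluation pass over a candidate kernel (hypothetical)."""
    compiled: bool
    correct: bool
    cycles: Optional[int]  # simulated cycle count; None if the kernel never ran
    message: str


def evaluate(kernel: str,
             compile_check: Callable[[str], Optional[str]],
             simulate: Callable[[str], tuple],
             reference: List[float],
             tol: float = 1e-6) -> Feedback:
    """Run the three feedback stages named in the abstract, in order."""
    error = compile_check(kernel)                    # stage 1: compilation check
    if error is not None:
        return Feedback(False, False, None, f"compile error: {error}")
    outputs, cycles = simulate(kernel)               # stage 2: simulation
    ok = len(outputs) == len(reference) and all(     # stage 3: ground-truth check
        abs(a - b) <= tol for a, b in zip(outputs, reference))
    message = "ok" if ok else "output mismatch against ground truth"
    return Feedback(True, ok, cycles, message)


def refine(propose, evaluate_fn, max_steps=5):
    """Query the agent repeatedly, keeping the fastest correct kernel seen."""
    best, history = None, []
    for _ in range(max_steps):
        kernel = propose(history)        # the agent sees all earlier feedback
        fb = evaluate_fn(kernel)
        history.append((kernel, fb))
        if fb.correct and (best is None or fb.cycles < best[1].cycles):
            best = (kernel, fb)
    return best, history


# --- Toy stand-ins for the toolchain and the agent (all assumptions) ---
VALID_OPS = {"VLOAD", "VADD", "VSTORE", "NOP"}


def toy_compile_check(kernel):
    """Reject any line whose opcode is not in the made-up ISA."""
    for line in kernel.splitlines():
        if line.strip() and line.split()[0] not in VALID_OPS:
            return f"unknown opcode {line.split()[0]}"
    return None


def toy_simulate(kernel):
    """Pretend to add [1, 2] + [3, 4]; one cycle per instruction."""
    ops = [line.split()[0] for line in kernel.splitlines() if line.strip()]
    outputs = [4.0, 6.0] if "VADD" in ops else [1.0, 2.0]
    return outputs, len(ops)


CANDIDATES = [
    "VLOAD\nVMUL\nVSTORE",        # does not compile
    "VLOAD\nNOP\nVSTORE",         # compiles, wrong result
    "VLOAD\nNOP\nVADD\nVSTORE",   # correct, 4 cycles
    "VLOAD\nVADD\nVSTORE",        # correct, 3 cycles
]


def scripted_agent(history):
    """Stand-in for an LLM: replays fixed candidates, one per step."""
    return CANDIDATES[min(len(history), len(CANDIDATES) - 1)]


best, history = refine(
    scripted_agent,
    lambda k: evaluate(k, toy_compile_check, toy_simulate, [4.0, 6.0]),
    max_steps=4,
)
```

In this sketch the loop keeps running after the first correct kernel so that a later, faster candidate can replace it, which mirrors the abstract's distinction between functionally valid kernels and optimized ones. A real agent would read the feedback messages in `history` when proposing the next candidate; the scripted stand-in ignores their content.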
Problem

Research questions and friction points this paper is trying to address.

emerging hardware
instruction set architecture
kernel generation
AI accelerators
low-level programming
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic LLM
kernel generation
emerging hardware
instruction set architecture (ISA)
feedback-driven workflow
Jiayi Nie
Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
Haoran Wu
Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
Yao Lai
HKU | UT Austin
Zeyu Cao
University of Cambridge
Cheng Zhang
Imperial College London
AI Acceleration, Efficient Machine Learning
Binglei Lou
Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom
Erwei Wang
AMD
FPGA, Deep Neural Network
Jianyi Cheng
University of Edinburgh
high-level synthesis, computer architecture, formal methods, machine learning, hardware security
Timothy M. Jones
University of Cambridge
Compilers, Microarchitecture, Parallelism, Reliability
Robert Mullins
Department of Computer Science and Technology, University of Cambridge
Computer Science - Computer Architecture - On-Chip Interconnection Networks
Rika Antonova
Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
Yiren Zhao
University of Toronto
Computer Networks, Optical Networks, Datacenter Networks