Autocomp: LLM-Driven Code Optimization for Tensor Accelerators

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the difficulty of programming tensor accelerators and the resulting underutilization of their hardware, this paper proposes Autocomp, an LLM-driven automated code optimization framework. Methodologically, it introduces a structured two-phase prompting paradigm—comprising planning and code generation—combined with a concise, adaptable optimization menu for injecting domain knowledge and a hardware-in-the-loop feedback mechanism that supplies correctness and performance signals at each search iteration, enabling cross-operator strategy reuse. The approach draws together domain-specific knowledge, compiler-style scheduling, tensor computation abstractions, and LLM prompt engineering. Evaluated on GEMM, convolution, and fine-grained linear algebra workloads across two accelerators, the framework runs up to 5.6× (GEMM) and 2.7× (convolution) faster than vendor-provided libraries and outperforms expert hand-tuned code by up to 1.4×; reusing optimization schedules across similar tensor operations improves speedups by up to 24% under a fixed sample budget. The core contribution is an LLM-compiler-hardware co-optimization workflow designed specifically for tensor accelerators.

📝 Abstract
Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise in code generation and optimization tasks, but generating low-resource languages like specialized tensor accelerator code still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three categories of representative workloads and two different accelerators, we demonstrate that Autocomp-optimized code runs 5.6x (GEMM) and 2.7x (convolution) faster than the vendor-provided library, and outperforms expert-level hand-tuned code by 1.4x (GEMM), 1.1x (convolution), and 1.3x (fine-grained linear algebra). Additionally, we demonstrate that optimization schedules generated from Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.
Problem

Research questions and friction points this paper is trying to address.

Optimizing tensor accelerator code using LLMs
Challenges in low-resource code generation for accelerators
Improving performance via automated LLM-driven search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-phase structured prompt for optimization
Domain knowledge via optimization menu
Hardware feedback integration for search
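The three innovations above—a planning phase, a code-generation phase, and hardware feedback steering the search—can be sketched as a minimal loop. All names, prompt wording, and menu entries below are illustrative assumptions; the paper's actual prompt templates, optimization menu, and hardware interface are not reproduced here.

```python
# Minimal sketch of an Autocomp-style optimization iteration.
# `llm` is any callable prompt -> text; `evaluate` stands in for the
# hardware/simulator loop returning (is_correct, latency). All names
# and menu entries are hypothetical, not the paper's actual artifacts.

OPTIMIZATION_MENU = [
    "tile loops to fit on-chip scratchpad",
    "double-buffer DMA transfers",
    "reorder loops to maximize output reuse",
]

def plan(llm, code, menu):
    """Phase 1: ask the LLM to pick a menu item and outline a plan."""
    prompt = (
        "You are optimizing tensor accelerator code.\n"
        f"Current code:\n{code}\n"
        "Choose ONE optimization from this menu and explain how to apply it:\n"
        + "\n".join(f"- {item}" for item in menu)
    )
    return llm(prompt)

def generate(llm, code, plan_text):
    """Phase 2: ask the LLM to rewrite the code following the plan."""
    prompt = f"Apply this plan to the code.\nPlan: {plan_text}\nCode:\n{code}"
    return llm(prompt)

def autocomp_search(llm, code, evaluate, iterations=3):
    """Iterate plan -> generate -> hardware feedback, keeping the
    fastest candidate that passes the correctness check."""
    _, best_latency = evaluate(code)
    best_code = code
    for _ in range(iterations):
        plan_text = plan(llm, best_code, OPTIMIZATION_MENU)
        candidate = generate(llm, best_code, plan_text)
        correct, latency = evaluate(candidate)  # hardware in the loop
        if correct and latency < best_latency:
            best_code, best_latency = candidate, latency
    return best_code, best_latency
```

The abstract's schedule-reuse result corresponds to seeding `plan` with a previously successful plan for a similar operation instead of the full menu, which spends the fixed sample budget on already-promising directions.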