Optimizing GEMM for Energy and Performance on Versal ACAP Architectures

📅 2025-11-10
🤖 AI Summary
To address the performance and energy-efficiency bottlenecks of GEMM on the heterogeneous Versal ACAP architecture in edge computing, this paper proposes an automated framework that generates GEMM mappings optimized for either performance or energy efficiency. Instead of relying on an analytical model, it trains a machine learning model on approximately 6,000 on-board measurements of different GEMM mappings and uses that model to guide design-space exploration across the AIE, PL, and PS components. Evaluation on the VCK190 platform shows geometric-mean improvements of 1.23× (up to 2.5×) in throughput and 1.25× (up to 2.7×) in energy efficiency over state-of-the-art frameworks.

📝 Abstract
General Matrix Multiplication (GEMM) is a fundamental operation in many scientific workloads, signal processing, and particularly deep learning. It is often a bottleneck for performance and energy efficiency, especially in edge environments with tight resource and power constraints. AMD's Versal ACAP offers heterogeneous components (AIEs, PL, PS) that can address these challenges, but mapping GEMM across them is complex, with prior works largely overlooking energy-performance trade-offs. In this paper, we propose an automated framework for Versal ACAP that generates GEMM mappings optimized for either performance or energy efficiency. Unlike prior analytical approaches, our method leverages a Machine Learning (ML) model, trained on approximately 6000 on-board experiments of different GEMM mappings, to guide Design Space Exploration, yielding more efficient designs. Evaluation on the Versal VCK190 shows geomean improvements of 1.23x (up to 2.5x) in throughput and 1.25x (up to 2.7x) in energy efficiency over state-of-the-art frameworks.
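
The reported gains are geometric means over a set of workloads, with the "up to" figures being the per-workload maxima. As a minimal sketch of how such a summary is computed, assuming a list of per-workload speedup ratios over a baseline (the values below are hypothetical and are not the paper's data):

```python
import math

def geomean(ratios):
    """Geometric mean of positive ratios, computed in log space for stability."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-workload speedups over a baseline (illustrative only).
speedups = [1.0, 2.0, 4.0]

summary = {"geomean": geomean(speedups), "max": max(speedups)}
print(summary)
```

The geometric mean is the usual choice for averaging ratios because a 2x gain on one workload and a 0.5x loss on another cancel out to 1x, which an arithmetic mean would not reflect.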

Problem

Research questions and friction points this paper is trying to address.

Optimizing GEMM for energy and performance on Versal ACAP
Addressing performance-energy trade-offs in edge environments
Automating efficient GEMM mapping using machine learning models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated framework for Versal ACAP GEMM optimization
Machine Learning model guides design space exploration
ML model trained on approximately 6000 on-board experiments covering different GEMM mappings
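
The general idea of model-guided design space exploration can be illustrated with a toy sketch: fit a surrogate on measured mappings, then rank unmeasured candidates by the predicted objective. Everything below is an assumption made for illustration. The `Mapping` fields, the nearest-neighbour surrogate, and the measurement values are hypothetical and do not reflect the paper's actual model, features, or data.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Mapping:
    # Hypothetical mapping knobs; the paper's real parameters are not listed here.
    tile_m: int
    tile_n: int
    num_aie: int

def features(m):
    """Log-scale the knobs so that doubling any of them is one unit of distance."""
    return (math.log2(m.tile_m), math.log2(m.tile_n), math.log2(m.num_aie))

def knn_predict(train, query, k=3):
    """Average the targets of the k nearest measured points (a stand-in surrogate)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def explore(measured, candidates, objective, k=3):
    """Return the candidate with the best predicted value for the chosen objective."""
    col = {"throughput": 1, "efficiency": 2}[objective]
    train = [(features(rec[0]), rec[col]) for rec in measured]
    return max(candidates, key=lambda m: knn_predict(train, features(m), k))

# Synthetic "on-board" records: (mapping, GFLOPS, GFLOPS per watt). Not real data.
measured = [
    (Mapping(32, 32, 8), 100.0, 10.0),
    (Mapping(32, 32, 16), 150.0, 12.0),
    (Mapping(64, 64, 64), 400.0, 8.0),
    (Mapping(64, 64, 256), 900.0, 6.0),
]
candidates = [Mapping(32, 64, 16), Mapping(64, 64, 192)]

best_perf = explore(measured, candidates, "throughput", k=1)
best_eff = explore(measured, candidates, "efficiency", k=1)
print(best_perf, best_eff)
```

In this toy data the two objectives select different mappings, which mirrors the point that a design tuned for throughput is not necessarily the most energy-efficient one.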