Heterogeneous Mapping for Analog In-Memory Computing Accelerators: A Unified Workflow

📅 2026-06-01

🤖 AI Summary

This work addresses the challenge of efficiently deploying decoder-only Transformer models such as GPT-2 on analog in-memory computing (AIMC) systems, whose limited effective precision can degrade accuracy. The authors classify existing heterogeneous mapping methods and distill them into a unified four-stage workflow, then apply its first two stages to GPT-2 to construct the first AIMC-specific precision sensitivity profile for a decoder-only Transformer. The profile shows that sensitivity is dominated by only 4 of the model's 49 projections, with the attention output of the first decoder block having an order-of-magnitude greater impact than the others. Based on these findings, the study suggests that projection-level mapping, combined with selective digital execution of early-block and output-facing projections, is important for deploying such models on AIMC hardware efficiently and reliably.

📝 Abstract

Analog In-Memory Computing (AIMC) accelerators execute matrix-vector multiplications directly within memory arrays, reducing data movement and improving DNN inference efficiency. Their limited effective precision motivates heterogeneous architectures that combine analog compute tiles with digital processing units. This letter classifies existing methods for partitioning DNN workloads across these resources by mapping granularity, optimization strategy, and model support, and distills them into a unified four-stage workflow. To demonstrate the workflow on a model class not yet addressed by existing methods, we apply its first two stages to GPT-2, producing the first AIMC-specific precision sensitivity profile for a decoder-only transformer. Sensitivity is dominated by 4 of 49 projections, with the first decoder block's attention output dominating by an order of magnitude. This suggests that projection-level mapping and selective digital execution of early-block and output-facing projections are important for reliable decoder-transformer deployment on AIMC hardware.
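The abstract does not spell out how the sensitivity profile is computed, so the following is only a minimal sketch of one common approach: perturb one projection at a time with analog-style noise, measure the output error against a noise-free reference, and flag the most sensitive projections for digital execution. Everything here is an assumption made for illustration, including the noise model (Gaussian noise scaled to the largest absolute weight), the toy three-layer linear chain, the names `proj_a`, `proj_b`, `proj_c`, and the helper functions. It is not the authors' method or their GPT-2 setup.

```python
import random

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def perturb(w, sigma, rng):
    """Assumed noise model: Gaussian noise scaled to the largest |weight|."""
    scale = sigma * max(abs(v) for row in w for v in row)
    return [[v + rng.gauss(0.0, scale) for v in row] for row in w]

def forward(layers, x, noisy=None, sigma=0.0, rng=None):
    """Run a chain of linear layers, perturbing only the layer named `noisy`."""
    for name, w in layers:
        if name == noisy:
            w = perturb(w, sigma, rng)
        x = matvec(w, x)
    return x

def sensitivity_profile(layers, inputs, sigma=0.05, trials=20, seed=0):
    """Mean squared output error when each layer alone is made noisy."""
    rng = random.Random(seed)
    profile = {}
    for name, _ in layers:
        err = 0.0
        for x in inputs:
            ref = forward(layers, x)  # noise-free reference output
            for _ in range(trials):
                out = forward(layers, x, noisy=name, sigma=sigma, rng=rng)
                err += sum((a - b) ** 2 for a, b in zip(out, ref))
        profile[name] = err / (len(inputs) * trials)
    return profile

def select_digital(profile, k):
    """Return the k most sensitive layers as candidates for digital execution."""
    return sorted(profile, key=profile.get, reverse=True)[:k]

# Hypothetical toy model: proj_a holds one outlier weight, which inflates its
# noise under the assumed max-scaled noise model; the other layers are identity.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
TOY_LAYERS = [
    ("proj_a", [[1.0, 0.0, 50.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]),
    ("proj_b", IDENTITY),
    ("proj_c", IDENTITY),
]
TOY_INPUTS = [[1.0, 1.0, 0.0], [0.5, -1.0, 0.0]]

profile = sensitivity_profile(TOY_LAYERS, TOY_INPUTS)
for name in select_digital(profile, len(profile)):
    print(f"{name}: mean squared error {profile[name]:.4g}")
print("candidates for digital execution:", select_digital(profile, 1))
```

In this toy setup the ranking is driven entirely by the outlier weight in `proj_a`. A real profile would replace the linear chain with the actual model, the squared error with a task metric such as perplexity, and the noise model with one calibrated to the target AIMC hardware.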

Problem

Research questions and friction points this paper is trying to address.

Analog In-Memory Computing

Heterogeneous Mapping

Precision Sensitivity

Decoder Transformer

Workload Partitioning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analog In-Memory Computing

Heterogeneous Mapping

Precision Sensitivity