Systolic Array-based Accelerator for State-Space Models

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
State-space models (SSMs) suffer from high computational and memory overhead on CPUs and GPUs because continuous-time integration requires numerically solving differential equations, hindering efficient long-sequence modeling. This work proposes EpochCore, a domain-specific systolic-array accelerator for SSMs, built around LIMA-PE, a novel multi-functional processing element, and ProDF, a unified dataflow architecture, jointly optimized for both conventional DNNs and SSMs. Crucially, EpochCore is the first hardware design to directly accelerate continuous-time integration, enabling end-to-end SSM acceleration. Evaluations show that EpochCore achieves 250× higher throughput and 45× better energy efficiency than a general-purpose systolic array, and on the Long Range Arena (LRA) benchmark it reduces inference latency by ~2,000× compared to GPU execution, substantially easing key hardware acceleration bottlenecks for SSMs.

📝 Abstract
Sequence modeling is crucial for AI systems to understand temporal data and detect complex time-dependent patterns. While recurrent neural networks (RNNs), convolutional neural networks (CNNs), and Transformers have advanced the capture of long-range dependencies, they struggle to achieve high accuracy on very long sequences due to limited memory retention (a fixed context window). State-Space Models (SSMs) leverage exponentially decaying memory, enabling a long effective context window, and so process very long data sequences more efficiently than recurrent and Transformer-based models. Unlike traditional neural models such as CNNs and RNNs, SSM-based models require solving differential equations through continuous integration, making both training and inference compute- and memory-intensive on conventional CPUs and GPUs. In this paper we introduce EpochCore, a specialized hardware accelerator for SSMs. EpochCore is based on systolic arrays (SAs) and is designed to improve the energy efficiency and throughput of SSM inference on long-range sequence tasks. Within the SA, we propose a versatile processing element (PE), called LIMA-PE, that performs both traditional and specialized MAC operations to support conventional DNNs and SSMs. To complement the EpochCore microarchitecture, we propose a novel dataflow, ProDF, which enables highly efficient execution of SSM-based models. By leveraging the LIMA-PE microarchitecture and ProDF, EpochCore achieves on average 250× gains in performance and 45× improvement in energy efficiency, at the expense of a 2× increase in area cost over traditional SA-based accelerators, and a ~2,000× improvement in latency per inference on LRA datasets compared to GPU kernel operations.
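The continuous-time integration the abstract refers to is the SSM state equation x'(t) = A x(t) + B u(t), y(t) = C x(t), which must be discretized into a linear recurrence before it can be evaluated step by step. The sketch below is purely illustrative (it is not EpochCore's kernel and the bilinear-transform choice, matrix sizes, and step size dt are assumptions), but it shows the per-timestep MAC-heavy scan that a systolic array would accelerate:

```python
import numpy as np

# Illustrative single-input single-output SSM (not the paper's design).
# Continuous dynamics: x'(t) = A x(t) + B u(t),  y(t) = C x(t).
def discretize(A, B, dt):
    """Bilinear (Tustin) discretization with step dt (assumed method)."""
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - (dt / 2) * A)
    Ad = inv @ (I + (dt / 2) * A)   # discrete-time state matrix
    Bd = inv @ (dt * B)             # discrete-time input matrix
    return Ad, Bd

def ssm_scan(Ad, Bd, C, u):
    """Sequential recurrence: x[k+1] = Ad x[k] + Bd u[k]; y[k] = C x[k+1]."""
    x = np.zeros(Ad.shape[0])
    ys = []
    for uk in u:
        x = Ad @ x + (Bd * uk).ravel()  # the MAC-dominated inner step
        ys.append(float(C @ x))
    return np.array(ys)

# Toy 2-state system with decaying memory (negative real eigenvalues).
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([1.0, 1.0])
Ad, Bd = discretize(A, B, dt=0.1)
y = ssm_scan(Ad, Bd, C, np.ones(16))
```

Each step of the scan is a small matrix-vector MAC, which is why the state update maps naturally onto systolic-array processing elements.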
Problem

Research questions and friction points this paper is trying to address.

Accelerating State-Space Models for long sequences efficiently
Reducing compute and memory intensity of SSM differential equations
Improving energy efficiency and throughput in SSM inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systolic array-based accelerator for SSMs
Versatile LIMA-PE for DNNs and SSMs
ProDF dataflow enhances SSM efficiency
Shiva Raja
ECE Dept, Boston University
Cansu Demirkiran
ECE Dept, Boston University
Aakash Sarkar
Psychological and Brain Sciences, Boston University
Milos Popovic
ECE Dept, Boston University
Ajay Joshi
Professor, ECE Department, Boston University
Computer Architecture · Security and Privacy · VLSI Design · Silicon Photonics