Spiking and Event-driven Neuromorphic Mamba Models for Efficient Speech Recognition

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational cost and energy consumption of deep neural networks in automatic speech recognition (ASR), which hinder deployment on resource-constrained edge devices. It explores neuromorphic computing for ASR, noting that a study evaluating the hardware benefits of different neuromorphic strategies has been lacking, and introduces event-driven and spiking neural network (SNN) variants of the SpeechMamba model. The event-driven variant uses the FATReLU activation to increase activation sparsity, achieving over 60% sparsity with less than 1% accuracy degradation on LibriSpeech. The spiking variant attains over 70% sparsity while using 30% fewer parameters than comparable SNNs. A cycle-accurate, event-driven simulator supports algorithm-hardware co-exploration; using it to identify computational bottlenecks yields over 10% additional efficiency improvements.

📝 Abstract

Deep learning has greatly advanced automatic speech recognition (ASR), enabling widespread deployment on edge devices such as smartphones and smart home systems. However, the computational and energy demands of deep neural networks pose significant challenges for such resource-constrained deployments, introducing latency and limiting real-time interaction. Neuromorphic computing offers a promising solution by introducing activation sparsity through spiking neural networks (SNNs) and event-driven neural networks, converting dense operations into sparse computations. However, a study that evaluates the hardware benefits of different neuromorphic strategies remains lacking for ASR. This paper explores spiking and event-driven neuromorphic neural networks to improve activation sparsity in the state-of-the-art SpeechMamba model for ASR. We introduce an event-driven SpeechMamba with FATReLU activation, achieving over 60% activation sparsity with less than 1% accuracy degradation on LibriSpeech. We also propose a spiking SpeechMamba that attains over 70% sparsity while using 30% fewer parameters than comparable SNNs. Finally, we develop a cycle-accurate event-driven simulator enabling flexible algorithm-hardware co-exploration, which helps us identify computational bottlenecks and yields over 10% additional efficiency improvements.
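To make the link between a thresholded activation and event-driven savings concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes FATReLU behaves as a ReLU with a positive cutoff (values at or below the threshold become zero), and the names `fatrelu`, `activation_sparsity`, and `event_driven_matvec`, the threshold of 0.2, and the toy inputs are all hypothetical choices for illustration.

```python
def fatrelu(x, threshold):
    """Assumed FATReLU form: keep x only when it exceeds the threshold."""
    return x if x > threshold else 0.0


def activation_sparsity(values):
    """Fraction of activations that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


def event_driven_matvec(weights, activations):
    """Matrix-vector product that skips zero activations.

    Returns the output vector and the number of multiply-accumulate
    operations (MACs) actually performed.
    """
    out = [0.0] * len(weights)
    macs = 0
    for j, a in enumerate(activations):
        if a == 0.0:
            continue  # no event, so no work for this input column
        for i, row in enumerate(weights):
            out[i] += row[j] * a
            macs += 1
    return out, macs


# Toy pre-activations (illustrative values, not data from the paper).
pre_activations = [-0.8, 0.05, 0.3, 1.2, -0.1, 0.15, 0.9, 0.02]
weights = [[1.0] * 8, [1.0] * 8]  # 2 outputs, 8 inputs

relu_out = [fatrelu(x, 0.0) for x in pre_activations]  # plain ReLU
fat_out = [fatrelu(x, 0.2) for x in pre_activations]   # thresholded

relu_sparsity = activation_sparsity(relu_out)
fat_sparsity = activation_sparsity(fat_out)
_, relu_macs = event_driven_matvec(weights, relu_out)
fat_result, fat_macs = event_driven_matvec(weights, fat_out)

print(relu_sparsity, relu_macs)
print(fat_sparsity, fat_macs)
```

In this toy case, raising the threshold zeroes out the small positive activations, so the event-driven product performs fewer MACs than it would after a plain ReLU. The accuracy cost of discarding those small values is what a real model has to trade off, and the sketch does not capture it.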

Problem

Research questions and friction points this paper is trying to address.

automatic speech recognition

neuromorphic computing

activation sparsity

edge devices

computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spiking Neural Networks

Event-driven Computing

Activation Sparsity