🤖 AI Summary
To address key challenges in brain-inspired computing, including the difficulty of multimodal zero-shot learning, weak hardware-algorithm co-design, and low energy efficiency, this paper proposes a zero-shot liquid state machine (LSM) built on resistive random-access memory (ReRAM)-based in-memory computing. Through hardware-software co-design, a fixed, random LSM encoder and a trainable artificial neural network (ANN) projection layer are physically integrated on a 40-nm in-memory computing macro. The architecture enables cross-modal alignment of unlabeled data and joint learning of spiking events from diverse modalities, including vision (N-MNIST), audition (N-TIDIGITS), and neural signals. Experiments demonstrate accuracy matching state-of-the-art software implementations, with training costs reduced by 152–393× and energy efficiency improved by 23–160×. This work presents the first experimental validation that compact neuromorphic hardware can effectively support zero-shot multimodal learning.
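To make the division of labor concrete, below is a minimal PyTorch sketch of the co-design's software view: a frozen, randomly weighted LSM reservoir of leaky integrate-and-fire (LIF) neurons (the part mapped onto the ReRAM macro) feeding a small trainable ANN projection head. All sizes, time constants, and thresholds here are illustrative assumptions, not values from the paper.

```python
import torch

torch.manual_seed(0)

class FixedLSMEncoder(torch.nn.Module):
    """Fixed, random liquid state machine: a recurrent reservoir of LIF
    neurons whose weights are never trained, mirroring the role of the
    ReRAM-mapped encoder. Sizes and constants are illustrative guesses."""
    def __init__(self, n_in, n_res=256, leak=0.95, v_th=1.0):
        super().__init__()
        # Random input and recurrent weights, frozen (requires_grad=False).
        self.w_in = torch.nn.Parameter(0.5 * torch.randn(n_res, n_in), requires_grad=False)
        self.w_rec = torch.nn.Parameter(0.1 * torch.randn(n_res, n_res), requires_grad=False)
        self.leak, self.v_th = leak, v_th

    def forward(self, spikes):                         # spikes: (T, batch, n_in) events
        T, B, _ = spikes.shape
        v = spikes.new_zeros(B, self.w_in.shape[0])    # membrane potentials
        s = torch.zeros_like(v)                        # reservoir spikes at t-1
        rate = torch.zeros_like(v)                     # accumulated spike counts
        for t in range(T):
            v = self.leak * v + spikes[t] @ self.w_in.T + s @ self.w_rec.T
            s = (v >= self.v_th).float()               # fire on threshold crossing
            v = v * (1.0 - s)                          # hard reset of fired neurons
            rate = rate + s
        return rate / T                                # liquid state: mean firing rates

# Toy usage: encode a 50-step, 64-channel event stream, then apply the only
# trainable component, a small ANN projection head.
enc = FixedLSMEncoder(n_in=64)
proj = torch.nn.Linear(256, 64)                        # trainable projection
events = (torch.rand(50, 8, 64) < 0.05).float()        # fake sparse spike events
embeddings = proj(enc(events))                         # shape: (8, 64)
```

Because only the projection layer carries gradients, training touches a tiny fraction of the parameters, which is consistent with the reported reduction in training cost.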
📝 Abstract
The human brain is a complex spiking neural network (SNN) capable of learning multimodal signals in a zero-shot manner by generalizing existing knowledge. Remarkably, it maintains minimal power consumption through event-based signal propagation. However, replicating the human brain in neuromorphic hardware presents both hardware and software challenges. On the hardware side, the slowdown of Moore's law and the von Neumann bottleneck limit the efficiency of digital computers; on the software side, SNNs are notoriously difficult to train. To this end, we propose a hardware-software co-design on a 40 nm 256 Kb in-memory computing macro that physically integrates a fixed, random liquid state machine (LSM) SNN encoder with trainable artificial neural network (ANN) projections. We showcase zero-shot LSM-based learning of multimodal events on the N-MNIST and N-TIDIGITS datasets, including visual-audio association as well as neural-visual alignment for brain-machine interfaces. Our co-design achieves classification accuracy comparable to fully optimized software models, yielding 152.83-fold and 393.07-fold reductions in training cost relative to state-of-the-art contrastive language-image pre-training (CLIP) and Prototypical networks, respectively, and 23.34-fold and 160-fold improvements in energy efficiency over cutting-edge digital hardware. These proof-of-principle prototypes demonstrate zero-shot multimodal event learning on emerging, efficient, and compact neuromorphic hardware.
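As a rough illustration of how fixed encoders plus trainable projections can support zero-shot cross-modal learning, the sketch below trains two projection heads with a CLIP-style symmetric contrastive loss and then classifies one modality against class anchors from the other. The paper's exact objective may differ; the dimensions and the `clip_style_loss` / `zero_shot_classify` helpers are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Liquid-state and embedding sizes are illustrative; in the co-design the
# input states would come from the fixed LSM encoders sketched above.
D_STATE, D_EMB, BATCH = 256, 64, 32
proj_v = torch.nn.Linear(D_STATE, D_EMB)   # trainable vision projection
proj_a = torch.nn.Linear(D_STATE, D_EMB)   # trainable audio projection
opt = torch.optim.Adam([*proj_v.parameters(), *proj_a.parameters()], lr=1e-3)

def clip_style_loss(z_v, z_a, temperature=0.07):
    """Symmetric InfoNCE over paired embeddings, as popularized by CLIP;
    matched vision/audio pairs sit on the diagonal of the logit matrix."""
    z_v = F.normalize(z_v, dim=-1)
    z_a = F.normalize(z_a, dim=-1)
    logits = z_v @ z_a.T / temperature
    targets = torch.arange(z_v.shape[0])
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# One training step on placeholder liquid states standing in for paired
# N-MNIST / N-TIDIGITS batches; only the two projections get gradients.
s_v, s_a = torch.rand(BATCH, D_STATE), torch.rand(BATCH, D_STATE)
loss = clip_style_loss(proj_v(s_v), proj_a(s_a))
opt.zero_grad(); loss.backward(); opt.step()

@torch.no_grad()
def zero_shot_classify(z_query, z_anchors):
    """Assign each query embedding (e.g., audio) to the nearest class anchor
    from the other modality (e.g., per-digit mean vision embeddings)."""
    z_query = F.normalize(z_query, dim=-1)
    z_anchors = F.normalize(z_anchors, dim=-1)
    return (z_query @ z_anchors.T).argmax(dim=-1)
```

After alignment, a class never seen with audio labels can still be recognized by comparing its audio embedding against vision-derived anchors, which is the sense in which the learning is zero-shot.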