🤖 AI Summary
This work proposes a 3D-integrated SRAM-eDRAM hybrid compute-in-memory (CIM) architecture that supports general-purpose matrix operations, including transposition, element-wise addition, and multiplication, at 4-bit precision directly within the memory crossbar array. Existing CIM architectures are largely limited to dot-product computations; the proposed design extends CIM to versatile matrix processing through a transpose-based structure, in-memory arithmetic operations, peripheral-aware design, and 3D SRAM-eDRAM integration. Implemented in GlobalFoundries 22 nm FDSOI technology, the architecture remains compatible with conventional dot-product CIM while balancing latency, energy efficiency, and compute density. It offers a more flexible hardware foundation for AI acceleration and high-performance computing applications.
📝 Abstract
With the rapid growth of deep neural networks (DNNs), compute-in-memory (CIM) has emerged as a promising energy-efficient paradigm for accelerating multiply-and-accumulate (MAC) operations. Yet current CIM architectures are largely limited to dot-product computations and struggle to efficiently support general-purpose matrix operations such as transpose, element-wise addition, and multiplication. This work presents a 3D-integrated, memory-on-memory SRAM-eDRAM hybrid CIM architecture, implemented in GlobalFoundries 22 nm FDSOI technology, capable of performing general matrix operations directly within the memory crossbar with 4-bit precision. By leveraging a specialized transpose-based architecture, in-memory arithmetic operations, peripheral-aware design, and 3D SRAM-eDRAM integration, the proposed architecture balances latency, energy efficiency, and compute density for general-purpose matrix operations while remaining compatible with conventional dot-product CIM architectures. Overall, this memory-on-memory CIM framework generalizes CIM beyond dot products, enabling versatile matrix processing and paving the way for broader applications in AI acceleration and general-purpose high-performance computing.
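To make the operation set concrete, the following Python sketch gives a software-only reference model of the matrix operations named in the abstract. It is not the authors' circuit or dataflow and says nothing about how the crossbar computes. All function names are hypothetical, and several details are assumptions the abstract does not state: operands are treated as unsigned 4-bit integers, "multiplication" is read as element-wise, and results are kept at full precision rather than truncated or saturated.

```python
BITS = 4
MAX_VAL = (1 << BITS) - 1  # 15: largest unsigned 4-bit operand (unsigned is an assumption)


def check_operand(matrix):
    """Return the matrix unchanged, or raise ValueError if it is ragged or not 4-bit."""
    width = len(matrix[0])
    for row in matrix:
        if len(row) != width:
            raise ValueError("matrix rows must have equal length")
        for value in row:
            if not 0 <= value <= MAX_VAL:
                raise ValueError(f"{value} does not fit in {BITS} unsigned bits")
    return matrix


def check_same_shape(a, b):
    """Raise ValueError unless both matrices have identical dimensions."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("operands must have the same shape")


def transpose(matrix):
    """Swap rows and columns of a 4-bit matrix."""
    check_operand(matrix)
    return [list(column) for column in zip(*matrix)]


def elementwise_add(a, b):
    """Add matching elements; a sum of two 4-bit values can need 5 bits (assumed kept)."""
    check_operand(a)
    check_operand(b)
    check_same_shape(a, b)
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def elementwise_mul(a, b):
    """Multiply matching elements; a product of two 4-bit values can need 8 bits (assumed kept)."""
    check_operand(a)
    check_operand(b)
    check_same_shape(a, b)
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def matvec(matrix, vector):
    """Conventional dot-product (MAC) operation: one accumulated sum per matrix row."""
    check_operand(matrix)
    check_operand([vector])
    if len(vector) != len(matrix[0]):
        raise ValueError("vector length must match the number of matrix columns")
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]
```

For example, `transpose([[1, 2, 3], [4, 5, 6]])` returns `[[1, 4], [2, 5], [3, 6]]`, and `matvec([[1, 2], [3, 4]], [5, 6])` returns `[17, 39]`. A model of this kind could serve as a golden reference when checking hardware results, provided its output-width handling is adjusted to match the actual design.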