🤖 AI Summary
Existing long-vector ISAs struggle to fully exploit ultra-wide SIMD resources in in-memory computing, particularly under multi-directional parallel workloads, because their one-dimensional memory-access and execution models limit exploitable parallelism, resulting in suboptimal performance and energy efficiency. This paper targets mobile in-cache computing and proposes the Multidimensional Vector Extension (MVE): the first ISA extension to directly expose multidimensional data parallelism at the instruction level, supporting multidimensional strided and random memory access, dimension-wise masking, and a cache-geometry-aware abstraction that maximizes SIMD utilization. MVE is designed for compatibility with both RISC-V and Arm architectures. Evaluation on mobile data-parallel workloads demonstrates an average 2.9× speedup and 8.8× energy-efficiency improvement, at an area overhead of only 3.6%.
📝 Abstract
In-cache computing technology transforms existing caches into long-vector compute units and offers a low-cost alternative to building expensive vector engines for mobile CPUs. Unfortunately, existing long-vector Instruction Set Architecture (ISA) extensions, such as the RISC-V Vector Extension (RVV) and the Arm Scalable Vector Extension (SVE), provide only one-dimensional strided and random memory accesses. While this is sufficient for typical vector engines, it fails to effectively utilize the large Single Instruction, Multiple Data (SIMD) widths of in-cache vector engines, because mobile data-parallel kernels expose limited parallelism across any single dimension. Based on our analysis of mobile vector kernels, we introduce a long-vector Multi-dimensional Vector ISA Extension (MVE) for mobile in-cache computing. MVE achieves high SIMD resource utilization and enables flexible programming by abstracting cache geometry and data layout. The proposed ISA features multi-dimensional strided and random memory accesses and efficient dimension-level masked execution to encode parallelism across multiple dimensions. Using a wide range of data-parallel mobile workloads, we demonstrate that MVE offers significant performance and energy-reduction benefits of 2.9x and 8.8x, respectively, on average, compared to the SIMD units of a commercial mobile processor, at an area overhead of 3.6%.
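To make the core idea concrete, here is a minimal sketch (not the actual MVE ISA or its mnemonics; the function names `load_1d_strided` and `load_2d_strided` are hypothetical) modeling why a one-dimensional strided access, as in RVV or SVE, fills fewer SIMD lanes than a multi-dimensional one when per-dimension parallelism is small:

```python
# Illustrative model only: how many SIMD lanes a single load can fill.
# load_1d_strided mimics a 1-D strided gather (RVV/SVE style);
# load_2d_strided mimics a hypothetical 2-D strided gather in the
# spirit of MVE's multi-dimensional accesses.

def load_1d_strided(mem, base, stride, count):
    """1-D strided gather: at most `count` lanes from one dimension."""
    return [mem[base + i * stride] for i in range(count)]

def load_2d_strided(mem, base, stride0, count0, stride1, count1):
    """Hypothetical 2-D strided gather: one operation fills
    count0 * count1 lanes, e.g. an entire image tile."""
    return [mem[base + i * stride0 + j * stride1]
            for i in range(count0) for j in range(count1)]

# A 16x16 row-major "image" laid out in flat memory.
W = 16
mem = list(range(W * W))

# A 1-D access along one row exposes only 8 elements of parallelism.
row = load_1d_strided(mem, base=0, stride=1, count=8)

# A 2-D access gathers an 8x8 tile (64 lanes) in one operation,
# better matching the ultra-wide SIMD width of an in-cache engine.
tile = load_2d_strided(mem, base=0, stride0=W, count0=8, stride1=1, count1=8)

assert len(row) == 8
assert len(tile) == 64
assert tile[:8] == row  # the tile's first row matches the 1-D load
```

Under this toy model, a kernel whose rows are only 8 elements long would leave most lanes of an ultra-wide in-cache SIMD unit idle with 1-D accesses, while a multi-dimensional access can keep them occupied by spanning rows and columns at once.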