🤖 AI Summary
RISC-V's adoption in high-performance computing (HPC) remains limited due to a lack of optimized domain-specific workloads. Method: This paper presents the first implementation and deep optimization of the 2D Fast Fourier Transform (FFT), using the Cooley-Tukey algorithm, on the Tenstorrent Wormhole n300 PCIe accelerator, which is built upon the RISC-V-based Tensix architecture. It leverages the architecture's decoupled compute and data movement hardware to address memory bandwidth bottlenecks through customized optimizations, including memory access reordering and tiling. Numerical accuracy is strictly preserved. Contribution/Results: Experiments show that, although the Wormhole n300 is slower than a 24-core Intel Xeon Platinum CPU for the 2D FFT, it draws around 8× less power and consumes around 2.8× less energy. This work demonstrates the practical viability of RISC-V-based accelerators in energy-constrained HPC scenarios and establishes a reusable algorithm-architecture co-design methodology for open instruction set architectures in scientific computing.
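The power and energy ratios quoted above also imply a rough runtime ratio. The estimate below is a back-of-envelope sketch, assuming both ratios refer to the same 2D FFT run and that the power figures are averages over that run; the symbols $P$, $E$, and $t$ (average power, energy, and runtime, with subscripts WH for the Wormhole n300 and CPU for the Xeon) are introduced here for illustration and do not come from the paper. Since $E = P \cdot t$:

\[
\frac{t_{\mathrm{WH}}}{t_{\mathrm{CPU}}}
= \frac{E_{\mathrm{WH}} / P_{\mathrm{WH}}}{E_{\mathrm{CPU}} / P_{\mathrm{CPU}}}
= \frac{P_{\mathrm{CPU}} / P_{\mathrm{WH}}}{E_{\mathrm{CPU}} / E_{\mathrm{WH}}}
\approx \frac{8}{2.8} \approx 2.9
\]

Under these assumptions the accelerator would take roughly 2.9 times as long as the CPU. This figure is derived from rounded ratios and is not a measured result.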
📝 Abstract
Whilst numerous areas of computing have adopted the RISC-V Instruction Set Architecture (ISA) wholesale in recent years, it is yet to become widespread in HPC. RISC-V accelerators offer a compelling option where the HPC community can benefit from the specialisation offered by the open nature of the standard, but without the extensive ecosystem changes required when adopting RISC-V CPUs. In this paper we explore porting the Cooley-Tukey Fast Fourier Transform (FFT) algorithm to the Tenstorrent Wormhole, a RISC-V-based PCIe accelerator. Built upon Tenstorrent's Tensix architecture, this technology decouples the movement of data from compute, potentially offering increased control to the programmer. Exploring different optimisation techniques to address the bottlenecks inherent in data movement, we demonstrate that, for a 2D FFT, whilst the Wormhole n300 is slower than a server-grade 24-core Xeon Platinum CPU, it draws around 8 times less power and consumes around 2.8 times less energy than the CPU when computing the Fourier transform.
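For readers unfamiliar with the algorithm named above, the following is a minimal sketch of a radix-2 Cooley-Tukey FFT and a 2D FFT built from it by row-column decomposition. It is plain Python for illustration only and is not the paper's Tensix implementation; the function names `fft1d`, `transpose`, and `fft2d` are hypothetical, and the sketch assumes every dimension is a power of two.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    # Split into even- and odd-indexed samples and transform each half.
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor for the butterfly that recombines the halves.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*m)]


def fft2d(grid):
    """2D FFT: 1D FFT over every row, then over every column."""
    rows = [fft1d(r) for r in grid]
    cols = [fft1d(c) for c in transpose(rows)]
    return transpose(cols)


if __name__ == "__main__":
    # A unit impulse at the origin transforms to a constant spectrum.
    for row in fft2d([[1, 0], [0, 0]]):
        print(row)
```

In this formulation the arithmetic is confined to `fft1d`, whereas `transpose` and the even/odd split only rearrange data. In general, such strided rearrangement steps are where data movement cost concentrates in a 2D FFT, which is consistent with the abstract's focus on data movement bottlenecks.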