Exploring Fast Fourier Transforms on the Tenstorrent Wormhole

📅 2025-06-18
🤖 AI Summary
Problem: RISC-V's adoption in high-performance computing (HPC) remains limited due to a lack of optimized domain-specific workloads. Method: This paper presents the first implementation and deep optimization of 2D Fast Fourier Transform (FFT) on the Tenstorrent Wormhole n300 PCIe accelerator, which is built upon the RISC-V-based Tensix architecture. It leverages the architecture's decoupled compute and data movement hardware to address memory bandwidth bottlenecks via customized optimizations, including memory access reordering and tiling. Numerical accuracy is strictly preserved. Contribution/Results: Experiments show that, although the Wormhole n300 is slower than a 24-core Intel Xeon Platinum CPU for 2D FFT, it draws around 8× less power and consumes around 2.8× less energy. This work demonstrates the practical viability of RISC-V-based accelerators in energy-constrained HPC scenarios and establishes a reusable algorithm-architecture co-design methodology for open instruction set architectures in scientific computing.

📝 Abstract
Whilst numerous areas of computing have adopted the RISC-V Instruction Set Architecture (ISA) wholesale in recent years, it is yet to become widespread in HPC. RISC-V accelerators offer a compelling option where the HPC community can benefit from the specialisation offered by the open nature of the standard but without the extensive ecosystem changes required when adopting RISC-V CPUs. In this paper we explore porting the Cooley-Tukey Fast Fourier Transform (FFT) algorithm to the Tenstorrent Wormhole PCIe RISC-V based accelerator. Built upon Tenstorrent's Tensix architecture, this technology decouples the movement of data from compute, potentially offering increased control to the programmer. Exploring different optimisation techniques to address the bottlenecks inherent in data movement, we demonstrate that for a 2D FFT whilst the Wormhole n300 is slower than a server-grade 24-core Xeon Platinum CPU, the Wormhole draws around 8 times less power and consumes around 2.8 times less energy than the CPU when computing the Fourier transform.
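
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of a radix-2 Cooley-Tukey FFT and a row-column 2D FFT built on top of it. It is plain Python for illustration only and is not the paper's Wormhole implementation; the function names `fft_1d` and `fft_2d` are hypothetical, and the sketch assumes input lengths that are powers of two.

```python
import cmath


def fft_1d(x):
    """Recursive radix-2 Cooley-Tukey FFT (illustrative; power-of-two length assumed)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    # Split into even- and odd-indexed samples and transform each half.
    even = fft_1d(x[0::2])
    odd = fft_1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor exp(-2*pi*i*k/n) applied to the odd half (butterfly step).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_2d(grid):
    """2D FFT via the row-column method: 1D FFT on every row, then on every column."""
    rows = [fft_1d(row) for row in grid]
    cols = [fft_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed [row][column].
    return [list(row) for row in zip(*cols)]
```

The column pass in `fft_2d` touches data with a stride of one full row, which is the kind of access pattern that makes data movement, rather than arithmetic, the usual bottleneck for a 2D FFT.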

Problem

Research questions and friction points this paper is trying to address.

Porting FFT algorithm to RISC-V accelerator
Optimizing data movement in HPC accelerators
Comparing energy efficiency of RISC-V vs Xeon CPU
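
As a rough consistency check on the energy comparison, assume that energy is average power multiplied by runtime, $E = P \cdot t$ (an assumption of this sketch, not a statement from the paper). The approximate ratios quoted in the abstract then imply a runtime ratio of:

$$
\frac{t_{\text{Wormhole}}}{t_{\text{CPU}}}
= \frac{P_{\text{CPU}} / P_{\text{Wormhole}}}{E_{\text{CPU}} / E_{\text{Wormhole}}}
\approx \frac{8}{2.8} \approx 2.9
$$

Both input figures are quoted only as "around" values, so this is indicative rather than exact. It is consistent with the abstract's statement that the Wormhole n300 is slower than the CPU while still consuming less energy overall.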

Innovation

Methods, ideas, or system contributions that make the work stand out.

RISC-V accelerator for HPC specialization
Cooley-Tukey FFT on Tenstorrent Wormhole
Decoupled data movement and compute

Nick Brown
Senior Research Fellow, EPCC at the University of Edinburgh
HPC, FPGAs, RISC-V, compilers, novel architectures
Jake Davies
EPCC, Bayes Centre, 47 Potterrow, Edinburgh, UK
Felix LeClair
Tenstorrent, 2600 Great America Way, Santa Clara, California, USA