🤖 AI Summary
Problem: Index-reversal permutations in FFT-accelerated convolution cause inefficient memory access and reduce arithmetic intensity. Method: This paper proposes a permutation-avoidance technique for fixed-filter scenarios. The index-reversal permutations that standard implementations perform at runtime in both the forward and inverse FFTs are replaced by a single offline permutation of the filter, which removes them from the runtime path. The procedure is multidimensional and is formulated within a general-radix Cooley-Tukey framework, so the runtime kernel consists of butterfly computations and a pointwise product with the pre-permuted filter. Contribution/Results: The algorithms are benchmarked against state-of-the-art FFT-based convolution implementations, and the results suggest that FFT library developers should consider supporting permutation-avoiding convolution kernels.
📝 Abstract
Fast Fourier Transform (FFT) libraries are widely used for evaluating discrete convolutions. Most FFT implementations follow some variant of the Cooley-Tukey framework, in which the transform is decomposed into butterfly operations and index-reversal permutations. While butterfly operations dominate the floating-point operation count, the memory access patterns induced by index-reversal permutations significantly degrade the FFT's arithmetic intensity. In practice, discrete convolutions are often applied repeatedly with a fixed filter. In such cases, we show that the index-reversal permutations involved in both the forward and backward transforms of standard FFT-based convolution implementations can be avoided by deferring to a single offline permutation of the filter. We propose a multi-dimensional, permutation-avoiding convolution procedure within a general radix Cooley-Tukey framework. We perform numerical experiments to benchmark our algorithms against state-of-the-art FFT-based convolution implementations. Our results suggest that developers of FFT libraries should consider supporting permutation-avoiding convolution kernels.
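To make the idea concrete, here is a minimal sketch of the simplest case only: one-dimensional, radix-2, power-of-two length, circular convolution. It is not the paper's implementation, and the function names (`dif_fft`, `dit_ifft`, `precompute_filter`, `convolve_no_perm`) are hypothetical, chosen for illustration. A decimation-in-frequency forward transform leaves the spectrum in bit-reversed order, and a decimation-in-time inverse transform accepts bit-reversed input. Pointwise multiplication commutes with any fixed reordering, so the only permutation needed is one applied to the filter spectrum ahead of time. The code is plain Python, written for readability rather than speed.

```python
import cmath


def bit_reversed_indices(n):
    """Return rev, where rev[p] is p with its log2(n) bits reversed (n a power of two)."""
    bits = n.bit_length() - 1
    rev = []
    for p in range(n):
        r = 0
        for _ in range(bits):
            r = (r << 1) | (p & 1)
            p >>= 1
        rev.append(r)
    return rev


def naive_dft(x):
    """O(n^2) reference DFT in natural order; used offline for the filter only."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dif_fft(x):
    """Forward radix-2 FFT: natural-order input, bit-reversed output, no permutation."""
    a = [complex(v) for v in x]
    n = len(a)
    half = n // 2
    while half >= 1:
        for start in range(0, n, 2 * half):
            for j in range(half):
                w = cmath.exp(-2j * cmath.pi * j / (2 * half))
                top, bot = a[start + j], a[start + j + half]
                a[start + j] = top + bot
                a[start + j + half] = (top - bot) * w
        half //= 2
    return a


def dit_ifft(spec_br):
    """Inverse radix-2 FFT: bit-reversed input, natural-order output, no permutation."""
    a = [complex(v) for v in spec_br]
    n = len(a)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for j in range(half):
                w = cmath.exp(2j * cmath.pi * j / (2 * half))
                top, bot = a[start + j], a[start + j + half] * w
                a[start + j] = top + bot
                a[start + j + half] = top - bot
        half *= 2
    return [v / n for v in a]


def precompute_filter(h):
    """Offline step: filter spectrum stored in bit-reversed order (the single permutation)."""
    spectrum = naive_dft(h)
    return [spectrum[r] for r in bit_reversed_indices(len(h))]


def convolve_no_perm(x, filter_spec_br):
    """Runtime path: butterflies and a pointwise product only, no index reordering."""
    spec_br = dif_fft(x)
    product = [s * f for s, f in zip(spec_br, filter_spec_br)]
    return dit_ifft(product)
```

In use, `precompute_filter(h)` would be called once for the fixed filter, and `convolve_no_perm(x, H_br)` would then be called for each new input `x` of the same length. The general-radix and multidimensional cases treated in the paper need a more general index-reversal permutation than the bit reversal shown here.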