Permutation-Avoiding FFT-Based Convolution

📅 2025-06-15
🤖 AI Summary
Problem: In FFT-accelerated convolution, index-reversal permutations cause inefficient memory access and reduce arithmetic intensity. Method: For fixed-filter scenarios, the paper proposes a permutation-avoiding technique that replaces the runtime index reorderings of the forward and inverse FFTs with a single offline permutation of the filter. The procedure is formulated for multi-dimensional transforms within a general-radix Cooley-Tukey framework, so that only butterfly computations remain on the runtime FFT path, which improves memory locality and arithmetic intensity. Contribution/Results: The algorithms are benchmarked against state-of-the-art FFT-based convolution implementations (e.g., FFTW, cuFFT), and the results suggest that FFT library developers should consider supporting permutation-avoiding convolution kernels.

📝 Abstract
Fast Fourier Transform (FFT) libraries are widely used for evaluating discrete convolutions. Most FFT implementations follow some variant of the Cooley-Tukey framework, in which the transform is decomposed into butterfly operations and index-reversal permutations. While butterfly operations dominate the floating-point operation count, the memory access patterns induced by index-reversal permutations significantly degrade the FFT's arithmetic intensity. In practice, discrete convolutions are often applied repeatedly with a fixed filter. In such cases, we show that the index-reversal permutations involved in both the forward and backward transforms of standard FFT-based convolution implementations can be avoided by deferring to a single offline permutation of the filter. We propose a multi-dimensional, permutation-avoiding convolution procedure within a general radix Cooley-Tukey framework. We perform numerical experiments to benchmark our algorithms against state-of-the-art FFT-based convolution implementations. Our results suggest that developers of FFT libraries should consider supporting permutation-avoiding convolution kernels.
Problem

Research questions and friction points this paper is trying to address.

Reducing memory access overhead in FFT-based convolutions
Eliminating index-reversal permutations for fixed-filter applications
Improving arithmetic intensity in multi-dimensional convolution algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Avoids index-reversal permutations in both the forward and backward FFTs
Defers to a single offline permutation of the fixed filter
Multi-dimensional procedure within a general-radix Cooley-Tukey framework
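Why a single offline permutation suffices can be sketched algebraically. The notation here is an assumption for illustration and not the paper's: the DFT matrix is written as $F = PB$, where $B$ collects the butterfly stages and $P$ is the index-reversal permutation, so that $F^{-1} = B^{-1}P^{\top}$, and $\odot$ denotes the elementwise product.

```latex
\begin{align*}
y &= F^{-1}\big((F x) \odot (F h)\big) \\
  &= B^{-1} P^{\top}\big((P B x) \odot (F h)\big) \\
  &= B^{-1}\big((B x) \odot (P^{\top} F h)\big),
\end{align*}
% using P^T (a \odot b) = (P^T a) \odot (P^T b) and P^T P = I.
```

Under these assumptions, the permuted filter spectrum $P^{\top} F h = B h$ is computed once, and each subsequent convolution applies only $B$ and $B^{-1}$.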
Nicolas Venkovic
Computational Mathematics, School of Computation, Information and Technology, Technical University of Munich, Germany
Hartwig Anzt
TU Munich / University of Tennessee
HPC, Numerical Linear Algebra, GPU, Parallel Computing, Sustainable Software Development