Multidimensional FFTs
As defined in the multidimensional DFT article, the multidimensional DFT
transforms an array with a -dimensional vector of indices by a set of nested summations (over for each ), where the division, defined as, is performed element-wise. Equivalently, it is simply the composition of a sequence of sets of one-dimensional DFTs, performed along one dimension at a time (in any order).
This compositional viewpoint immediately provides the simplest and most common multidimensional DFT algorithm, known as the row-column algorithm (after the two-dimensional case, below). That is, one simply performs a sequence of one-dimensional FFTs (by any of the above algorithms): first you transform along the dimension, then along the dimension, and so on (or actually, any ordering will work). This method is easily shown to have the usual complexity, where is the total number of data points transformed. In particular, there are transforms of size, etcetera, so the complexity of the sequence of FFTs is:
In two dimensions, the can be viewed as an matrix, and this algorithm corresponds to first performing the FFT of all the rows and then of all the columns (or vice versa), hence the name.
In more than two dimensions, it is often advantageous for cache locality to group the dimensions recursively. For example, a three-dimensional FFT might first perform two-dimensional FFTs of each planar "slice" for each fixed, and then perform the one-dimensional FFTs along the direction. More generally, an asymptotically optimal cache-oblivious algorithm consists of recursively dividing the dimensions into two groups and that are transformed recursively (rounding if is not even) (see Frigo and Johnson, 2005). Still, this remains a straightforward variation of the row-column algorithm that ultimately requires only a one-dimensional FFT algorithm as the base case, and still has complexity. Yet another variation is to perform matrix transpositions in between transforming subsequent dimensions, so that the transforms operate on contiguous data; this is especially important for out-of-core and distributed memory situations where accessing non-contiguous data is extremely time-consuming.
There are other multidimensional FFT algorithms that are distinct from the row-column algorithm, although all of them have complexity. Perhaps the simplest non-row-column FFT is the vector-radix FFT algorithm, which is a generalization of the ordinary Cooley–Tukey algorithm where one divides the transform dimensions by a vector of radices at each step. (This may also have cache benefits.) The simplest case of vector-radix is where all of the radices are equal (e.g. vector-radix-2 divides all of the dimensions by two), but this is not necessary. Vector radix with only a single non-unit radix at a time, i.e., is essentially a row-column algorithm. Other, more complicated, methods include polynomial transform algorithms due to Nussbaumer (1977), which view the transform in terms of convolutions and polynomial products. See Duhamel and Vetterli (1990) for more information and references.
Read more about this topic: Fast Fourier Transform