
## Cooley-Tukey FFT algorithm

The Cooley-Tukey algorithm is the most common fast Fourier transform (FFT) algorithm. It re-expresses the discrete Fourier transform (DFT) of an arbitrary composite size n = n_{1}n_{2} in terms of smaller DFTs of sizes n_{1} and n_{2}, recursively, in order to reduce the computation time to O(n log n) for highly composite n. Because of the algorithm's importance, specific variants and implementation styles have become known by their own names, as described below.

This algorithm, including its recursive application, was already known around 1805 to Carl Friedrich Gauss, who used it to interpolate the trajectories of the asteroids Pallas and Juno, but his work was not widely recognized (being published only posthumously and in neo-Latin); Gauss did not, however, analyze the asymptotic computation time. Various limited forms were also rediscovered several times throughout the 19th and early 20th centuries. FFTs became popular after J. W. Cooley of IBM and J. W. Tukey of Princeton published a paper in 1965 reinventing the algorithm and describing how to perform it conveniently on a computer (including how to arrange for the output to be produced in the natural ordering).

Because the Cooley-Tukey algorithm breaks the DFT into smaller DFTs, it can be combined arbitrarily with any other algorithm for the DFT. For example, Rader's or Bluestein's algorithm can be used to handle large prime factors that cannot be decomposed by Cooley-Tukey, or the prime-factor algorithm can be exploited for greater efficiency in separating out relatively prime factors. See also the fast Fourier transform for information on other FFT algorithms, specializations for real and/or symmetric data, and accuracy in the face of finite floating-point precision.
## The radix-2 DIT case

Recall that the DFT is defined by the formula:

X_k = \sum_{m=0}^{n-1} x_m e^{-2\pi i m k / n}, \qquad k = 0, \ldots, n-1.

Radix-2 DIT first computes the Fourier transforms of the even-indexed numbers x_{0}, x_{2}, ..., x_{n-2} and of the odd-indexed numbers x_{1}, x_{3}, ..., x_{n-1}, and then combines those two results to produce the Fourier transform of the whole sequence. This idea can then be applied recursively to reduce the overall runtime to O(n log n). This simplified form assumes that n is a power of two; since the number of sample points n can usually be chosen freely by the application, this is often not an important restriction.
More explicitly, let us write the size-n/2 DFT of the even-indexed inputs as f'_{k} and that of the odd-indexed inputs as f''_{k}:

f'_k = \sum_{m=0}^{n/2-1} x_{2m} e^{-2\pi i m k / (n/2)}, \qquad f''_k = \sum_{m=0}^{n/2-1} x_{2m+1} e^{-2\pi i m k / (n/2)}.

Splitting the DFT sum above over even and odd m then gives, for k = 0, ..., n/2-1:

X_k = f'_k + e^{-2\pi i k / n} f''_k, \qquad X_{k+n/2} = f'_k - e^{-2\pi i k / n} f''_k,

where we have used the periodicity f'_{k+n/2} = f'_k and f''_{k+n/2} = f''_k, together with e^{-2\pi i (k+n/2)/n} = -e^{-2\pi i k/n}. This expresses a size-n DFT in terms of two size-n/2 DFTs plus O(n) additional operations.
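As an illustration, the even/odd recursion described above can be sketched in a few lines of Python. This is a minimal, unoptimized sketch using only the standard library; the function names are ours, not from any particular library, and a direct O(n^2) DFT is included only to check the result.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, straight from the definition (used for checking)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def fft_radix2_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    f_even = fft_radix2_dit(x[0::2])  # size-n/2 DFT of even-indexed inputs
    f_odd = fft_radix2_dit(x[1::2])   # size-n/2 DFT of odd-indexed inputs
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * f_odd[k]  # twiddle factor term
        out[k] = f_even[k] + t          # X_k
        out[k + n // 2] = f_even[k] - t  # X_{k+n/2}
    return out

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
assert max(abs(a - b) for a, b in zip(fft_radix2_dit(x), dft(x))) < 1e-9
```

Each level of the recursion does O(n) work across all subproblems, and there are log2(n) levels, giving the O(n log n) total.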
The above re-expression of a size-n DFT as two size-n/2 DFTs is sometimes called the Danielson-Lanczos lemma, after the two authors who noted the identity in 1942 and applied it to hand computations of X-ray scattering data, reporting 140 minutes for a size-64 transform. (140 minutes for size 64 may sound like a long time, but it corresponds to an average of at most 16 seconds per floating-point operation, around 20% of which are multiplications... this is a fairly impressive rate for a human being to sustain for almost two and a half hours, especially when you consider the bookkeeping overhead.)

## General factorizations

More generally, Cooley-Tukey algorithms recursively re-express a DFT of a composite size n = n_{1}n_{2} as:

- Perform *n*_{1} DFTs of size *n*_{2}.
- Multiply by complex roots of unity called **twiddle factors**.
- Perform *n*_{2} DFTs of size *n*_{1}.
Typically, either n_{1} or n_{2} is a small factor (not necessarily prime), called the radix (which can differ between stages of the recursion). If n_{1} is the radix, it is called a decimation in time (DIT) algorithm, whereas if n_{2} is the radix, it is decimation in frequency (DIF, also called the Sande-Tukey algorithm). The version presented above was a radix-2 DIT algorithm; in the final expression, the phase multiplying the odd transform is the twiddle factor, and the +/- combination (butterfly) of the even and odd transforms is a size-2 DFT. (The radix's small DFT is sometimes known as a butterfly, so called because of the shape of the data-flow diagram for the radix-2 case.)
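For concreteness, the twiddle multiplication and the +/- size-2 DFT of the radix-2 DIT case can be written out as a small Python function (an illustrative sketch; the name `butterfly` and the argument convention are ours):

```python
import cmath

def butterfly(f0, f1, k, n):
    """Radix-2 DIT butterfly: combine the k-th outputs of the even-half (f0)
    and odd-half (f1) transforms into outputs k and k + n/2 of the size-n DFT."""
    t = cmath.exp(-2j * cmath.pi * k / n) * f1  # twiddle factor times odd term
    return f0 + t, f0 - t                       # the +/- size-2 DFT

# For n = 2, k = 0 the twiddle factor is 1, so this reduces to a plain
# size-2 DFT: butterfly(x0, x1, 0, 2) == (x0 + x1, x0 - x1).
```

An in-place implementation overwrites the two inputs with the two outputs, which is what makes the data-ordering question discussed below arise.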
There are many other variations on the Cooley-Tukey algorithm.
The general Cooley-Tukey factorization rewrites the indices k and m as k = n_{2}k_{1} + k_{2} and m = n_{1}m_{2} + m_{1}, where k_{1} and m_{1} run from 0 to n_{1}-1 and k_{2} and m_{2} run from 0 to n_{2}-1. Substituting into the DFT formula gives:

X_{n_2 k_1 + k_2} = \sum_{m_1=0}^{n_1-1} \left( \sum_{m_2=0}^{n_2-1} x_{n_1 m_2 + m_1} e^{-2\pi i m_2 k_2 / n_2} \right) \left[ e^{-2\pi i m_1 k_2 / n} \right] e^{-2\pi i m_1 k_1 / n_1},

where the inner sum is a DFT of size n_{2}, the outer sum is a DFT of size n_{1}, and the [...] bracketed term is the twiddle factor.
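The three-step factorization can be checked numerically with a short Python sketch (illustrative names; `dft` is a direct O(n^2) transform used both for the sub-transforms and for the reference result):

```python
import cmath

def dft(x):
    """Direct DFT from the definition."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def cooley_tukey_step(x, n1, n2):
    """One general Cooley-Tukey step for n = n1*n2, using the index maps
    m = n1*m2 + m1 and k = n2*k1 + k2."""
    n = n1 * n2
    # Step 1: for each residue m1, DFT the subsequence x[n1*m2 + m1] (size n2).
    inner = [dft([x[n1 * m2 + m1] for m2 in range(n2)]) for m1 in range(n1)]
    # Step 2: multiply by the twiddle factors e^{-2 pi i m1 k2 / n}.
    for m1 in range(n1):
        for k2 in range(n2):
            inner[m1][k2] *= cmath.exp(-2j * cmath.pi * m1 * k2 / n)
    # Step 3: for each k2, DFT across m1 (size n1), writing X[n2*k1 + k2].
    out = [0j] * n
    for k2 in range(n2):
        col = dft([inner[m1][k2] for m1 in range(n1)])
        for k1 in range(n1):
            out[n2 * k1 + k2] = col[k1]
    return out

x = [float(i) for i in range(12)]
assert max(abs(a - b) for a, b in zip(cooley_tukey_step(x, 3, 4), dft(x))) < 1e-9
```

A full FFT would apply the same step recursively to the size-n_{1} and size-n_{2} sub-transforms instead of calling the direct `dft`.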
The 1965 Cooley-Tukey paper noted that one can employ an arbitrary radix.

## Data reordering, bit reversal, and in-place algorithms

Although the abstract Cooley-Tukey factorization of the DFT, above, applies to all implementations of the algorithm, much greater diversity exists in the techniques for ordering and accessing the data at each stage of the FFT. Of special interest is the problem of devising an in-place algorithm that overwrites its input with its output data using only O(1) auxiliary storage.
The most well-known reordering technique involves explicit **bit reversal** for in-place radix-2 algorithms. Bit reversal is the permutation where the data at an index written in binary with digits b_{4}b_{3}b_{2}b_{1}b_{0} (e.g. 5 digits for n=32 inputs) is transferred to the index with reversed digits b_{0}b_{1}b_{2}b_{3}b_{4}. Consider the last stage of a radix-2 DIT algorithm like the one presented above, where the output is written in place over the input: when f'_{j} and f''_{j} are combined with a size-2 DFT, those two values are overwritten by the outputs. However, the two output values should go in the first and second halves of the output array, corresponding to the most significant bit b_{4} (for n=32); whereas the two inputs f'_{j} and f''_{j} are interleaved in the even and odd elements, corresponding to the least significant bit b_{0}. Thus, in order to get the output in the correct place, these two bits must be swapped in the input. If you include all of the recursive stages of a radix-2 DIT algorithm, all the bits must be swapped and thus one must pre-process the input with a bit reversal to get in-order output. Correspondingly, the reversed (dual) algorithm is radix-2 DIF, and this takes in-order input and produces bit-reversed output, requiring a bit-reversal post-processing step. Alternatively, some applications (such as convolution) work equally well on bit-reversed data, so one can do radix-2 DIF without bit reversal, followed by processing, followed by the radix-2 DIT inverse DFT without bit reversal to produce final results in the natural order.
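The bit-reversal permutation itself is simple to compute; a short standard-library Python sketch (the function names are illustrative):

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` binary digits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # shift the next low bit of i onto r
        i >>= 1
    return r

def bit_reverse_permute(x):
    """Return x permuted into bit-reversed order; len(x) must be a power of two."""
    n = len(x)
    bits = n.bit_length() - 1  # number of index bits, e.g. 5 for n = 32
    return [x[bit_reverse(j, bits)] for j in range(n)]

# For n = 8 (3 bits), index 1 = 001 maps to 100 = 4, 3 = 011 maps to 110 = 6, etc.:
assert bit_reverse_permute(list(range(8))) == [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying this permutation before a radix-2 DIT pass (or after a radix-2 DIF pass) yields natural-order output, as described above.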
Many FFT users, however, prefer natural-order outputs, and a separate, explicit bit-reversal stage can have a non-negligible impact on the computation time, even though bit reversal can be done in O(n) time and has been the subject of much research (e.g. Karp, 1996; Carter & Gatlin, 1998; Rubio et al., 2002).
The problem is greatly simplified if the transform is performed out-of-place, i.e. if the output array is distinct from the input array: then each stage can write its results directly in the natural order, and no separate bit-reversal pass is needed (the Stockham auto-sort approach works this way).
A typical strategy for in-place algorithms without auxiliary storage and without separate digit-reversal passes involves small matrix transpositions (which swap individual pairs of digits) at intermediate stages, which can be combined with the radix butterflies to reduce the number of passes over the data (Johnson & Burrus, 1984; Temperton, 1991; Qian et al., 1994; Hegland, 1994).
## References

- James W. Cooley and John W. Tukey, "An algorithm for the machine calculation of complex Fourier series," *Math. Comput.* **19**, 297–301 (1965).
- Carl Friedrich Gauss, "Nachlass: Theoria interpolationis methodo nova tractata," *Werke* band **3**, 265–327 (Königliche Gesellschaft der Wissenschaften, Göttingen, 1866). See also M. T. Heideman, D. H. Johnson, and C. S. Burrus, "Gauss and the history of the fast Fourier transform," *IEEE ASSP Magazine* **1** (4), 14–21 (1984).
- G. C. Danielson and C. Lanczos, "Some improvements in practical Fourier analysis and their application to X-ray scattering from liquids," *J. Franklin Inst.* **233**, 365–380 and 435–452 (1942).
- W. M. Gentleman and G. Sande, "Fast Fourier transforms—for fun and profit," *Proc. AFIPS* **29**, 563–578 (1966).
- P. Duhamel and M. Vetterli, "Fast Fourier transforms: a tutorial review and a state of the art," *Signal Processing* **19**, 259–299 (1990).
- David H. Bailey, "FFTs in external or hierarchical memory," *J. Supercomputing* **4** (1), 23–35 (1990).
- Alan H. Karp, "Bit reversal on uniprocessors," *SIAM Review* **38** (1), 1–26 (1996).
- Larry Carter and Kang Su Gatlin, "Towards an optimal bit-reversal permutation program," *Proc. 39th Ann. Symp. on Found. of Comp. Sci. (FOCS)*, 544–553 (1998).
- M. Rubio, P. Gómez, and K. Drouiche, "A new superfast bit reversal algorithm," *Intl. J. Adaptive Control and Signal Processing* **16**, 703–707 (2002).
- T. G. Stockham, "High speed convolution and correlation," *Spring Joint Computer Conference, Proc. AFIPS* **28**, 229–233 (1966).
- P. N. Swarztrauber, "Vectorizing the FFTs," in G. Rodrigue (Ed.), *Parallel Computations* (Academic Press, New York, 1982), pp. 51–83.
- M. C. Pease, "An adaptation of the fast Fourier transform for parallel processing," *J. ACM* **15** (2), 252–264 (1968).
- Richard C. Singleton, "A method for computing the fast Fourier transform," *Commun. of the ACM* **10**, 647–654 (1967).
- H. W. Johnson and C. S. Burrus, "An in-place in-order radix-2 FFT," *Proc. ICASSP*, 28A.2.1–28A.2.4 (1984).
- C. Temperton, "Self-sorting in-place fast Fourier transform," *SIAM J. Sci. Stat. Comput.* **12** (4), 808–823 (1991).
- Z. Qian, C. Lu, M. An, and R. Tolimieri, "Self-sorting in-place FFT algorithm with minimum working space," *IEEE Trans. ASSP* **52** (10), 2835–2836 (1994).
- M. Hegland, "A self-sorting in-place fast Fourier transform algorithm suitable for vector and parallel processing," *Numerische Mathematik* **68** (4), 507–547 (1994).
- Matteo Frigo and Steven G. Johnson: *FFTW*, http://www.fftw.org/. A free (GPL) C library for computing discrete Fourier transforms in one or more dimensions, of arbitrary size, using the Cooley-Tukey algorithm. See also M. Frigo and S. G. Johnson, "FFTW: An adaptive software architecture for the FFT," *Proc. ICASSP* **3**, 1381–1384 (1998).

copyright © 2004 FactsAbout.com