A Depth-First Iterative Algorithm for the Conjugate Pair Fast Fourier Transform

Author(s):  
Alexandre Becoulet ◽  
Amandine Verguet

The Split-Radix Fast Fourier Transform has the same low arithmetic complexity as the related Conjugate Pair Fast Fourier Transform. Both transforms have an irregular datapath structure that is straightforwardly expressed only in recursive form. Furthermore, the conjugate pair variant has a complicated input indexing pattern, which forces existing iterative implementations to rely on precomputed tables. However, it allows the memory bandwidth to be optimized, as it requires only a single twiddle factor load per radix-4 butterfly. In existing algorithms, this comes at the cost of additional precomputed tables or recursive function calls. In this paper, we present two novel approaches that handle both the butterfly scheduling and the input index generation of the Conjugate Pair Fast Fourier Transform. The proposed algorithm is cache-friendly because it is depth-first, non-recursive and does not rely on precomputed index tables. To achieve this, we relate the butterfly execution pattern of the Split-Radix and Conjugate Pair FFTs to the binary carry sequence. Based on this finding, we describe how common integer arithmetic and bitwise operations can be used to perform input reordering and depth-first traversal of the transform datapath with O(1) space complexity.
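The abstract ties depth-first butterfly scheduling to the binary carry sequence (the number of trailing zero bits of a counter). The sketch below is a simplified radix-2 analogue, not the authors' split-radix or conjugate-pair algorithm: it assumes a power-of-two length, reorders the input with a bit reversal computed from shifts and masks, and after finishing the i-th size-2 leaf performs ctz(i) merges. No recursion and no index tables are needed. The names `ctz`, `bit_reverse` and `fft_depth_first` are hypothetical.

```python
import cmath

# Simplified radix-2 analogue (hypothetical helpers), not the paper's algorithm.


def ctz(i: int) -> int:
    """Count trailing zero bits of a positive integer (binary carry sequence)."""
    return (i & -i).bit_length() - 1


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i using shifts and masks only."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_depth_first(x):
    """Radix-2 DIT FFT traversed depth-first without recursion.

    len(x) must be a power of two. Apart from the data array, the only
    traversal state is the loop counter i.
    """
    n = len(x)
    bits = n.bit_length() - 1
    a = [complex(v) for v in x]
    # Input reordering: in-place bit-reversal permutation.
    for i in range(n):
        r = bit_reverse(i, bits)
        if r > i:
            a[i], a[r] = a[r], a[i]
    for i in range(1, n // 2 + 1):
        # Leaf: size-2 butterfly on the i-th pair of samples.
        base = 2 * (i - 1)
        u, v = a[base], a[base + 1]
        a[base], a[base + 1] = u + v, u - v
        # ctz(i) larger blocks ending at index 2*i are now complete:
        # merge them, doubling the block size each time.
        size = 2
        for _ in range(ctz(i)):
            size *= 2
            half = size // 2
            start = 2 * i - size
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)
                t = w * a[start + half + k]
                a[start + half + k] = a[start + k] - t
                a[start + k] = a[start + k] + t
    return a


print([ctz(i) for i in range(1, 9)])  # [0, 1, 0, 2, 0, 1, 0, 3]
```

Incrementing the counter from i - 1 to i propagates exactly ctz(i) carries, which is why the number of merges after each leaf follows the carry sequence.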

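For comparison, the conjugate-pair split-radix recursion in its usual textbook form can be sketched as follows; it is recursive, unlike the iterative algorithm described in the abstract. It assumes a power-of-two length N, ω = exp(-2πi/N), and the decomposition X[k] = U[k] + ω^k Z[k] + ω^(-k) Z'[k], where U, Z and Z' are the DFTs of x[2m], x[4m+1] and x[4m-1] (indices taken mod N). Since ω^(-k) is the conjugate of ω^k, each radix-4 butterfly needs a single twiddle value, and the wrap-around of x[4m-1] is an instance of the irregular input indexing. The function name is hypothetical.

```python
import cmath


def conjugate_pair_fft(x):
    """Recursive conjugate-pair split-radix FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    u = conjugate_pair_fft(x[0::2])                      # DFT of x[2m]
    z = conjugate_pair_fft(x[1::4])                      # DFT of x[4m + 1]
    zc = conjugate_pair_fft([x[(4 * m - 1) % n]          # DFT of x[4m - 1], wrapping
                             for m in range(n // 4)])
    out = [0j] * n
    q = n // 4
    for k in range(q):
        w = cmath.exp(-2j * cmath.pi * k / n)   # the only twiddle this butterfly needs
        a = w * z[k]
        b = w.conjugate() * zc[k]               # omega^(-k) obtained by conjugation
        out[k] = u[k] + (a + b)
        out[k + 2 * q] = u[k] - (a + b)
        out[k + q] = u[k + q] - 1j * (a - b)
        out[k + 3 * q] = u[k + q] + 1j * (a - b)
    return out
```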
2020 ◽  
Author(s):  
Alexandre Becoulet ◽  
Amandine Verguet



1989 ◽  
Vol 25 (22) ◽  
pp. 1547
Author(s):  
I. Kamar ◽  
Y. Elcherif

1989 ◽  
Vol 25 (5) ◽  
pp. 324 ◽  
Author(s):  
I. Kamar ◽  
Y. Elcherif

Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 4037
Author(s):  
Shania Stewart ◽  
Ha H. Nguyen ◽  
Robert Barton ◽  
Jerome Henry

This paper presents two methods to optimize LoRa (Low-Power Long-Range) devices so that implementing multiplier-less pulse shaping filters is more economical. Basic chirp waveforms can be generated more efficiently using the method of chirp segmentation, so that only a quarter of the samples needs to be stored in ROM. Quantization can also be applied to the basic chirp samples to reduce the number of unique input values to the filter, which in turn reduces the size of the lookup table for a multiplier-less filter implementation. Various tests were performed on a simulated LoRa system to evaluate the impact of the quantization error on system performance. By examining the occupied bandwidth, the fast Fourier transform (FFT) output used for symbol demodulation, and the bit-error rate, it is shown that even a high level of quantization does not cause significant performance degradation. Therefore, the memory requirements of LoRa devices can be significantly reduced by chirp segmentation and quantization, which improves the feasibility of implementing multiplier-less filters in LoRa devices.
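The abstract does not spell out how the chirp is segmented. One way a quarter-length ROM can suffice is sketched below under stated assumptions: an ideal discrete baseband upchirp x[n] = exp(jπ(n²/N - n)) with N = 2^SF and SF ≥ 3. This chirp satisfies x[N - n] = x[n] and x[n + N/2] = (-1)^n x[n], which together give x[N/2 - m] = (-1)^m x[m], so samples 0..N/4 determine all the others. The function names are hypothetical, and this is not necessarily the segmentation used by the authors.

```python
import cmath

# Assumption: ideal discrete baseband upchirp with N = 2**sf samples, sf >= 3.


def full_chirp(sf):
    """Reference chirp x[n] = exp(j*pi*(n*n/N - n)) computed for every n."""
    N = 1 << sf
    return [cmath.exp(1j * cmath.pi * (n * n / N - n)) for n in range(N)]


def quarter_table(sf):
    """Hypothetical ROM contents: only samples 0..N/4 (inclusive) are stored."""
    N = 1 << sf
    return [cmath.exp(1j * cmath.pi * (n * n / N - n)) for n in range(N // 4 + 1)]


def chirp_sample(q, sf, n):
    """Rebuild sample n (0 <= n < N) from the quarter table q using symmetries."""
    N = 1 << sf
    half, quarter = N // 2, N // 4
    sign = 1
    if n >= half:                    # x[n] = (-1)^n * x[n - N/2]  (needs 8 | N)
        sign = -1 if n & 1 else 1
        n -= half
    if n <= quarter:
        return sign * q[n]
    m = half - n                     # x[N/2 - m] = (-1)^m * x[m], 0 < m < N/4
    return sign * (-1 if m & 1 else 1) * q[m]


q = quarter_table(7)                 # 33 stored samples instead of 128
```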

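The second idea, quantizing the chirp samples so that the filter sees only a few distinct input values, can be illustrated with a small lookup-table FIR filter. This is a generic sketch, not the authors' design: it assumes a b-bit mid-rise quantizer on [-1, 1] and a table holding every tap-times-level product, so the table has len(taps) * 2^b entries and each output sample needs only lookups and additions. The quantizer, the filter taps and all function names are hypothetical.

```python
import math


def quantize(v, bits):
    """Map v in [-1, 1] to one of 2**bits level indices (mid-rise quantizer)."""
    levels = 1 << bits
    idx = int(math.floor((v + 1.0) / 2.0 * levels))
    return min(max(idx, 0), levels - 1)


def level_value(idx, bits):
    """Reconstruction value of level idx (centre of its interval)."""
    levels = 1 << bits
    return (idx + 0.5) / levels * 2.0 - 1.0


def build_lut(taps, bits):
    """Precompute every tap x level product: len(taps) * 2**bits entries."""
    return [[h * level_value(i, bits) for i in range(1 << bits)] for h in taps]


def lut_fir(samples, taps, bits):
    """FIR filter whose products come from the table instead of a multiplier."""
    lut = build_lut(taps, bits)
    idx = [quantize(s, bits) for s in samples]
    out = []
    for n in range(len(idx)):
        acc = 0.0
        for k in range(len(taps)):
            if n - k >= 0:
                acc += lut[k][idx[n - k]]    # table lookup plus addition only
        out.append(acc)
    return out


taps = [0.25, 0.5, 0.25]                      # hypothetical filter taps
N = 128
chirp = [math.cos(math.pi * (n * n / N - n)) for n in range(N)]
y = lut_fir(chirp, taps, bits=3)              # table of 3 * 8 = 24 entries
```

Halving the number of quantization levels halves the table, which is the memory trade-off the abstract evaluates against bandwidth and bit-error rate.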

1992 ◽  
Vol 28 (12) ◽  
pp. 1143 ◽  
Author(s):  
A.M. Krot ◽  
H.B. Minervina

1990 ◽  
Vol 26 (8) ◽  
pp. 541 ◽  
Author(s):  
H.-S. Qian ◽  
Z.-J. Zhao

1996 ◽  
Vol 39 (4) ◽  
Author(s):  
Y. Tulunay ◽  
S. A. Baykal ◽  
Y. G. Yigit ◽  
I. Stanislawska ◽  
A. Rokicki ◽  
...  

During the COST 238: PRIME project, a campaign of f0F2 soundings at 15-min intervals was carried out at Kandilli, Rome, Sofia, Poitiers and Lannion. The campaign lasted one month, in June 1993. Spectral analysis of the data using a Fast Fourier Transform (FFT) algorithm proved to be a relatively easy way to reconstruct the original data at a confidence level of 0.05.
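A minimal sketch of Fourier-based reconstruction, assuming a synthetic f0F2-like series sampled every 15 min (96 samples per day) rather than the campaign data: the series is transformed, only the strongest spectral bins are kept, and the inverse transform rebuilds the series. A direct O(N²) DFT stands in for an FFT to keep the example dependency-free. The function names, the synthetic signal and the number of kept bins are illustrative assumptions, and no confidence test is performed.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) DFT with a precomputed twiddle table (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    tw = [cmath.exp(sign * 2j * math.pi * m / n) for m in range(n)]
    out = [sum(x[m] * tw[(k * m) % n] for m in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def reconstruct(series, keep):
    """Keep the `keep` largest-magnitude spectral bins and invert."""
    spec = dft(series)
    order = sorted(range(len(spec)), key=lambda k: abs(spec[k]), reverse=True)
    kept = set(order[:keep])
    filtered = [spec[k] if k in kept else 0j for k in range(len(spec))]
    return [v.real for v in dft(filtered, inverse=True)]


# Hypothetical 4-day series: mean + diurnal + semi-diurnal components.
series = [6.0 + 2.0 * math.cos(2 * math.pi * n / 96)
          + 0.5 * math.sin(4 * math.pi * n / 96) for n in range(384)]
approx = reconstruct(series, keep=5)   # mean plus both conjugate pairs
```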


2006 ◽  
Vol 16 (02) ◽  
pp. 153-164
Author(s):  
RAMI AL NA'MNEH ◽  
W. DAVID PAN ◽  
SEONG-MOO YOO

Computing the 1-D Fast Fourier Transform (FFT) with the conventional six-step FFT algorithm on parallel computers requires intensive all-to-all communication, because three of its six steps are matrix transposes. This all-to-all communication limits the performance of parallel FFT implementations. In this paper, we present two parallel algorithms that implement the 1-D FFT without all-to-all communication between processors, at the expense of more intra-processor computation than the conventional six-step FFT algorithm. Our analysis shows the advantage of these two algorithms over the six-step FFT algorithm on parallel systems where the cost of inter-processor communication outweighs the cost of intra-processor computation. As a case study, we choose a 32-node Beowulf cluster with fast processors (running at 2 GHz) but relatively slow inter-processor communication (over a 100 Mbit/s switch). Simulation results on this cluster demonstrate that the proposed no-communication FFT algorithms can achieve a speedup of 1.1 to 1.5 over the six-step FFT algorithm.
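The conventional six-step baseline, and the three transposes that cause the all-to-all traffic, can be seen in a serial sketch; it does not show the proposed no-communication algorithms. Assuming N = N1·N2 and x stored as N2 rows of N1 samples, the steps are: transpose, N1 DFTs of length N2, twiddle multiplication by W_N^(n1·k2), transpose, N2 DFTs of length N1, and a final transpose. On a distributed machine each transpose is an all-to-all exchange; here they are local list operations. A direct DFT stands in for the row FFTs, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT, standing in for the per-row FFTs of a real implementation."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def transpose(mat):
    """Matrix transpose; in a parallel code this is an all-to-all exchange."""
    return [list(col) for col in zip(*mat)]


def six_step_fft(x, n1, n2):
    """DFT of length n1*n2 via the six-step scheme (three transposes)."""
    n = n1 * n2
    assert len(x) == n
    # x viewed as n2 rows of n1: rows[r][c] = x[c + n1*r]
    rows = [x[r * n1:(r + 1) * n1] for r in range(n2)]
    a = transpose(rows)                  # step 1: a[j][r] = x[j + n1*r]
    b = [dft(row) for row in a]          # step 2: n1 DFTs of length n2
    for j in range(n1):                  # step 3: twiddles W_N^(j*k2)
        for k2 in range(n2):
            b[j][k2] *= cmath.exp(-2j * cmath.pi * j * k2 / n)
    c = transpose(b)                     # step 4: c[k2][j]
    d = [dft(row) for row in c]          # step 5: n2 DFTs of length n1
    e = transpose(d)                     # step 6: e[k1][k2] = X[k2 + n2*k1]
    return [v for row in e for v in row]
```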


2016 ◽  
Vol 2016 ◽  
pp. 1-7
Author(s):  
Anatolij Sergiyenko ◽  
Anastasia Serhienko

A set of soft IP cores for the Winograd r-point fast Fourier transform (FFT) is considered. The cores are designed by the method of spatial SDF mapping into hardware, which minimizes the hardware volume at the cost of slowing the algorithm down by a factor of r. Their clock frequency is equal to the data sampling frequency. The cores are intended for high-speed pipelined FFT processors implemented in FPGAs.
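Small-point Winograd DFT modules trade multiplications for additions. As an illustration only, assuming r = 3 (the r values and datapaths of the cores are not given here), the 3-point Winograd DFT needs two multiplications by real constants, cos(2π/3) - 1 and sin(2π/3), while the remaining multiplication by -j reduces to swapping the real and imaginary parts. The function name is hypothetical.

```python
import math

# Real constants of the 3-point Winograd DFT (r = 3 is an assumption).
C1 = math.cos(2 * math.pi / 3) - 1.0   # equals -1.5 up to rounding
C2 = math.sin(2 * math.pi / 3)         # sqrt(3) / 2


def winograd_dft3(x0, x1, x2):
    """3-point DFT using two real-constant multiplications."""
    t1 = x1 + x2
    t2 = x1 - x2
    X0 = x0 + t1
    m1 = C1 * t1                     # multiplication 1
    p = C2 * t2                      # multiplication 2
    m2 = complex(p.imag, -p.real)    # times -j: swap re/im and negate, no multiplier
    s1 = X0 + m1                     # = x0 + cos(2*pi/3) * (x1 + x2)
    return X0, s1 + m2, s1 - m2
```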

