Low-Complexity Hardware Interleaver/Deinterleaver for IEEE 802.11a/g/n WLAN

VLSI Design ◽  
2012 ◽  
Vol 2012 ◽  
pp. 1-7
Author(s):  
Zhen-dong Zhang ◽  
Bin Wu ◽  
Yu-mei Zhou ◽  
Xin Zhang

A high-speed, low-complexity hardware interleaver/deinterleaver is presented. It supports all 77 IEEE 802.11n high-throughput (HT) modulation and coding schemes (MCSs) with short and long guard intervals, as well as the 8 non-HT MCSs defined in 802.11a/g. The paper proposes a design methodology that distributes the three permutations of an interleaver across both the write address and the read address. This methodology not only reduces the critical path delay but also simplifies address generation. In addition, the complex mathematical formulas are replaced with optimized hardware structures that avoid hardware-intensive dividers and multipliers. In 0.13 μm CMOS technology, the cell area of the proposed interleaver/deinterleaver is 0.07 mm², and the synthesized maximum operating frequency is 400 MHz. Comparison results show that it outperforms three other similar designs in hardware complexity and maximum frequency while maintaining high flexibility.
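For reference, the 802.11a/g interleaver is defined by two index permutations; 802.11n adds a third (frequency-rotation) permutation for multiple spatial streams, which is omitted here. The sketch below is a minimal software golden model of the two 802.11a/g permutations, of the kind one might use to check hardware write/read address generators. The function names are hypothetical, and this is not the paper's address-generation hardware.

```python
def interleaver_permutation(n_cbps: int, n_bpsc: int) -> list[int]:
    """Return perm such that coded bit k is sent to position perm[k] (802.11a/g).

    n_cbps: coded bits per OFDM symbol; n_bpsc: coded bits per subcarrier.
    """
    s = max(n_bpsc // 2, 1)
    perm = []
    for k in range(n_cbps):
        # First permutation: adjacent coded bits land on non-adjacent subcarriers.
        i = (n_cbps // 16) * (k % 16) + k // 16
        # Second permutation: adjacent bits alternate between more and less
        # significant bits of the constellation point.
        j = s * (i // s) + (i + n_cbps - (16 * i) // n_cbps) % s
        perm.append(j)
    return perm


def interleave(bits: list, n_bpsc: int) -> list:
    perm = interleaver_permutation(len(bits), n_bpsc)
    out = [None] * len(bits)
    for k, b in enumerate(bits):
        out[perm[k]] = b          # write-side permutation
    return out


def deinterleave(bits: list, n_bpsc: int) -> list:
    perm = interleaver_permutation(len(bits), n_bpsc)
    return [bits[perm[k]] for k in range(len(bits))]   # read-side permutation
```

For 20 MHz operation the block sizes are 48, 96, 192, and 288 coded bits per symbol for BPSK, QPSK, 16-QAM, and 64-QAM respectively, so `interleave(data, 4)` with a 192-element `data` models one 16-QAM symbol.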

2015 ◽  
Vol 2015 ◽  
pp. 1-16 ◽  
Author(s):  
Burhan Khurshid ◽  
Roohie Naaz Mir

Generalized parallel counters (GPCs) are used to construct high-speed compressor trees. Prior work has focused on using the fast carry chain and mapping the logic onto Look-Up Tables (LUTs). This mapping is not optimal, because the LUT fabric is not fully utilized, which results in low-efficiency GPCs. In this work, we present a heuristic that efficiently maps GPC logic onto the LUT fabric. We have applied our heuristic to various GPCs and achieved efficiency improvements ranging from 33% to 100% in most cases. Experimental results on Xilinx 5th-, 6th-, and 7th-generation FPGAs and Altera Stratix IV and V devices show a considerable reduction in resource utilization and dynamic power dissipation at almost the same critical path delay. We have also implemented GPC-based FIR filters on 7th-generation Xilinx FPGAs using the proposed heuristic and compared their performance against conventional implementations; the heuristic-based implementations show improved performance. Comparisons are also made against filters based on integrated DSP blocks and built-in Xilinx IP cores. The results show that the proposed heuristic provides performance comparable to structures based on these specialized resources.
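A GPC such as (3,5;4) takes 5 bits of weight 1 and 3 bits of weight 2 and outputs their weighted sum as a 4-bit number. The functional model below (hypothetical helper names, not the paper's heuristic) computes a GPC's output width and the truth table of each output bit. When a GPC has at most 6 inputs, each output bit is a single 6-input Boolean function, so a naive mapping needs one 6-input LUT per output bit; this is the kind of baseline that LUT-mapping heuristics try to improve on.

```python
from itertools import product


def gpc_output_width(counts: list[int]) -> int:
    """Output bits m of a GPC; counts[i] = number of inputs with weight 2**i."""
    max_value = sum(c << i for i, c in enumerate(counts))
    return max_value.bit_length()


def gpc_value(bits: tuple, counts: list[int]) -> int:
    """Weighted popcount: the first counts[0] bits have weight 1, the next counts[1] weight 2, ..."""
    value, pos = 0, 0
    for weight_exp, c in enumerate(counts):
        value += sum(bits[pos:pos + c]) << weight_exp
        pos += c
    return value


def gpc_truth_tables(counts: list[int]) -> list[list[int]]:
    """One truth table (0/1 over every input combination) per output bit, LSB first."""
    n, m = sum(counts), gpc_output_width(counts)
    tables = [[] for _ in range(m)]
    for bits in product((0, 1), repeat=n):
        v = gpc_value(bits, counts)
        for b in range(m):
            tables[b].append((v >> b) & 1)
    return tables
```

For example, `gpc_output_width([5, 3])` describes the (3,5;4) counter, and `gpc_truth_tables([6])` yields the three 64-entry functions of a (6;3) counter.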


VLSI technology has become one of the most significant and in-demand technologies because of characteristics such as device portability, device size, a large number of features, cost, consistency, and speed. Multipliers and adders play an important role in achieving high speed and low power in digital systems such as computers, process controllers, and signal processors. Hybrid full adders are designed using two-input XOR/XNOR gates and 2:1 multiplexer modules. The XOR/XNOR gate is the main contributor to the power consumed in the full adder cell, and it also adds to the delay, area, and critical path delay. Hence, an optimized XOR/XNOR design is required to reduce the power consumption of the full adder cell. Six new hybrid full adder circuits are therefore proposed, based on novel full-swing XOR/XNOR gates, together with a new Gate Diffusion Input (GDI) full adder design with high-swing outputs. Each proposed circuit offers advantages in speed, power consumption, power-delay product, and driving capability. Circuit simulations were carried out using the Cadence Virtuoso EDA tool, and the results are based on a 90 nm CMOS process technology model.
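The XOR/XNOR-plus-multiplexer structure described above can be checked at the logic level. The sketch below models one common hybrid full adder topology (an assumption for illustration, not the paper's transistor-level circuits): the complementary XOR/XNOR pair feeds a 2:1 multiplexer selected by Cin to form Sum, and a second multiplexer selected by A XOR B chooses between A and Cin for Cout.

```python
def xor2(a: int, b: int) -> int:
    return a ^ b


def xnor2(a: int, b: int) -> int:
    return 1 - (a ^ b)


def mux2(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def hybrid_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    h = xor2(a, b)            # XOR output of the XOR/XNOR module
    hn = xnor2(a, b)          # complementary XNOR output
    s = mux2(cin, h, hn)      # Sum = (A xor B) xor Cin
    cout = mux2(h, a, cin)    # if A == B the carry is A, otherwise it is Cin
    return s, cout
```

Because both multiplexers reuse the same XOR/XNOR outputs, the quality of that gate directly sets the speed and power of the whole cell, which is why the abstract focuses on it.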


2018 ◽  
Vol 7 (2.16) ◽  
pp. 94
Author(s):  
Abhishek Choubey ◽  
SPV Subbarao ◽  
Shruti B. Choubey

Multiplication is one of the most essential arithmetic operations in numerous digital signal processing and communications applications. These applications require transforms, convolutions, and dot products that involve an enormous number of multiplications of an operand by a constant; typical examples include wavelets and digital filters such as FIR and IIR filters. However, multiplier structures have a relatively large area-delay product, long latency, and significantly higher power consumption than other arithmetic structures. Therefore, low-power multiplier design has always been a significant part of DSP structures in VLSI design. The Booth multiplier is among the most efficient multipliers because it considerably reduces complexity compared with other multipliers. In this paper, we propose a Booth multiplier using seamless pipelining. Theoretical comparison results show that the proposed Booth multiplier has a shorter critical path delay than the traditional Booth multiplier. ASIC simulation results show that the proposed radix-16 Booth multiplier has 13% less critical path delay for word width n=16 and 17% less critical path delay for word width n=32 than the best existing radix-16 Booth multiplier.
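Radix-16 Booth recoding groups the multiplier into overlapping 5-bit windows, producing n/4 digits in the range [-8, 8], so an n-bit multiplication needs only n/4 partial products. The sketch below shows the standard recoding and a reference multiplication built from it; it models the arithmetic only, not the paper's pipelined hardware, and the function names are hypothetical.

```python
def _bit(u: int, k: int) -> int:
    """Bit k of u, with the implicit bit b_{-1} = 0 below the LSB."""
    return (u >> k) & 1 if k >= 0 else 0


def booth_radix16_digits(x: int, n: int) -> list[int]:
    """Recode an n-bit two's complement x into n/4 digits in [-8, 8], LSD first.

    d_i = -8*b(4i+3) + 4*b(4i+2) + 2*b(4i+1) + b(4i) + b(4i-1)
    """
    if n % 4:
        raise ValueError("n must be a multiple of 4")
    u = x & ((1 << n) - 1)          # two's complement bit pattern
    digits = []
    for i in range(n // 4):
        b = 4 * i
        digits.append(-8 * _bit(u, b + 3) + 4 * _bit(u, b + 2)
                      + 2 * _bit(u, b + 1) + _bit(u, b) + _bit(u, b - 1))
    return digits


def booth_radix16_multiply(x: int, y: int, n: int) -> int:
    """Sum of the n/4 partial products d_i * y * 16**i."""
    return sum(d * y * 16 ** i for i, d in enumerate(booth_radix16_digits(x, n)))
```

In hardware, digits of ±3, ±5, and ±7 require precomputed odd multiples of the multiplicand (3Y, 5Y, 7Y), which is a well-known source of delay in radix-16 designs and a natural target for pipelining.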


2019 ◽  
Vol 28 (09) ◽  
pp. 1950149
Author(s):  
Bahram Rashidi ◽  
Mohammad Abedini

This paper presents efficient lightweight hardware implementations of complete point multiplication on binary Edwards curves (BECs). The implementations are based on the general and special cases of binary Edwards curves. The complete differential addition formulas have costs of [Formula: see text] and [Formula: see text] for the general and special cases of BECs, respectively, where [Formula: see text] and [Formula: see text] denote the costs of a field multiplication, a field squaring, and a field multiplication by a constant, respectively. In the general case of BECs, the structure is implemented with 3 concurrent multipliers. For the special case of BECs, two structures employing 3 and 2 field multipliers are proposed, to achieve the highest degree of parallelization and the best resource utilization, respectively. The field multipliers are implemented using the proposed efficient digit–digit polynomial basis multiplier, in which both input operands are processed at the digit level. This property reduces hardware consumption and critical path delay. In addition, as the input digit size changes from low to high, the number of clock cycles and input words changes accordingly. Therefore, the multiplier is flexible enough to meet different cryptographic requirements, such as low-area and high-speed implementations. Point multiplication requires field inversion; therefore, we use a low-cost Extended Euclidean Algorithm (EEA) based inversion to implement this field operation. Implementation results of the proposed architectures are reported on a Virtex-5 XC5VLX110 FPGA for two fields, [Formula: see text] and [Formula: see text]. The results show improvements in area and efficiency for the proposed structures compared with previous works.
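Digit-level polynomial basis multiplication in GF(2^m) splits both operands into d-bit digits, forms carry-less digit products, and reduces the result modulo the field polynomial. The model below is a minimal sketch of that arithmetic, not the paper's hardware schedule (which may interleave reduction differently). Because the paper's fields are not given here, the small AES field GF(2^8) with f(x) = x^8 + x^4 + x^3 + x + 1 is used purely as a test field, since FIPS-197 gives known products for it.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two bit-packed polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_reduce(p: int, f: int, m: int) -> int:
    """Reduce p modulo the degree-m polynomial f (f includes the x^m term)."""
    for k in range(p.bit_length() - 1, m - 1, -1):
        if (p >> k) & 1:
            p ^= f << (k - m)     # cancel the x^k term
    return p


def digit_digit_multiply(a: int, b: int, f: int, m: int, d: int) -> int:
    """Multiply a*b in GF(2^m), consuming both operands d bits at a time."""
    n_digits = -(-m // d)          # ceil(m / d) digits per operand
    mask = (1 << d) - 1
    acc = 0
    for i in range(n_digits):
        a_i = (a >> (i * d)) & mask
        for j in range(n_digits):
            b_j = (b >> (j * d)) & mask
            # d-by-d digit product placed at weight x^((i + j) * d)
            acc ^= clmul(a_i, b_j) << ((i + j) * d)
    return poly_reduce(acc, f, m)
```

Changing `d` trades the number of digit-product steps (roughly clock cycles in a digit-serial datapath) against the width of each digit multiplier, mirroring the low-area versus high-speed flexibility described in the abstract.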


Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1657
Author(s):  
Lu Sun ◽  
Bin Wu ◽  
Tianchun Ye

In this article, a low-complexity, high-throughput sorted QR decomposition (SQRD) for multiple-input multiple-output (MIMO) detectors is presented. To reduce the heavy hardware overhead of SQRD, we propose an efficient SQRD algorithm based on a novel modified real-value decomposition (RVD). Compared with the latest study, the proposed SQRD algorithm reduces computational complexity by more than 44.7% with similar bit error rate (BER) performance. Furthermore, a corresponding deeply pipelined hardware architecture implemented with coordinate rotation digital computer (CORDIC)-based Givens rotation (GR) is designed. In this design, we propose a time-sharing Givens rotation structure in which idle CORDIC modules take over concurrent GR operations from other CORDIC modules, further reducing hardware complexity and improving hardware efficiency. The proposed SQRD processor is implemented in SMIC 55-nm CMOS technology and processes 62.5 M SQRDs per second at a 250-MHz operating frequency with only 176.5 kilo-gates. Compared with related studies, the proposed design has the best normalized hardware efficiency and achieves a 6-Gbps MIMO data rate, sufficient to support current high-speed wireless communication systems such as IEEE 802.11ax.
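As a point of reference, the sketch below applies the conventional real-value decomposition, which turns a complex n×n channel into a real 2n×2n one, and then the classic sorted QR decomposition, which at each step pivots the remaining column with the smallest norm. It uses modified Gram-Schmidt in floating point; the paper's modified RVD and its CORDIC Givens-rotation datapath are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def real_value_decomposition(H: np.ndarray) -> np.ndarray:
    """Conventional RVD: y = Hx becomes [Re y; Im y] = Hr [Re x; Im x]."""
    return np.block([[H.real, -H.imag], [H.imag, H.real]])


def sorted_qr(H: np.ndarray):
    """Sorted QR via modified Gram-Schmidt: returns Q, R, perm with Q @ R == H[:, perm]."""
    Q = np.array(H, dtype=float, copy=True)
    _, n = Q.shape
    R = np.zeros((n, n))
    perm = np.arange(n)
    for i in range(n):
        # Pick the remaining column with minimum norm (the weakest layer is detected last).
        norms = np.sum(Q[:, i:] ** 2, axis=0)
        k = i + int(np.argmin(norms))
        Q[:, [i, k]] = Q[:, [k, i]]
        R[:, [i, k]] = R[:, [k, i]]
        perm[[i, k]] = perm[[k, i]]
        R[i, i] = np.linalg.norm(Q[:, i])
        Q[:, i] /= R[i, i]
        for l in range(i + 1, n):
            R[i, l] = Q[:, i] @ Q[:, l]
            Q[:, l] -= R[i, l] * Q[:, i]
    return Q, R, perm
```

A hardware design would normally replace the Gram-Schmidt inner loop with Givens rotations, which map onto CORDIC units without explicit square roots or divisions.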


2019 ◽  
Vol 8 (4) ◽  
pp. 10189-10198 ◽  

The Fast Fourier Transform (FFT) is a key element of high-speed signal processing applications and involves complex addition, complex subtraction, and complex multiplication. Because of the complex multiplications, FFT structures demand considerable hardware. Hence, this work introduces area-efficient radix-2 and radix-2² FFT structures that support various N-point sizes, using the proposed modified butterfly units and a radix-2/2² butterfly unit. The proposed modified butterfly units effectively reduce the number of complex multipliers; for this reason, they are used under certain conditions in the FFT design in place of the existing radix-2/2² butterfly unit. Further, the proposed design supports various FFT sizes in a single architecture without additional hardware elements. The proposed FFT structure was designed and implemented on a Xilinx Virtex-6 Field-Programmable Gate Array (FPGA) device (6vcx75tff484-2) and with the Cadence tool in 45 nm CMOS technology. The implementation results demonstrate that the proposed N-point (N=16, 32, and 64) DIF-FFT design attains lower hardware complexity than the existing multi-mode FFT design. The proposed area-efficient 16-point, 32-point, and 64-point radix-2 FFT architectures reduce the total area by 20.99%, 11%, and 4.9%, respectively, and the proposed area-efficient 16-point, 32-point, and 64-point radix-2² FFT architectures reduce the total area by 32%, 19%, and 11%, respectively.
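A decimation-in-frequency (DIF) FFT is built from butterflies that compute a sum and a twiddle-weighted difference. The sketch below is a plain radix-2 DIF FFT in software (not the paper's modified or radix-2² butterflies), included to show where the complex multiplications occur: each butterfly multiplies by one twiddle factor, and the output appears in bit-reversed order. Function names are hypothetical.

```python
import cmath


def dif_butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """Radix-2 DIF butterfly: (a + b, (a - b) * w)."""
    return a + b, (a - b) * w


def fft_dif_radix2(x: list) -> list:
    """Iterative radix-2 DIF FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                # Twiddle W_{2*span}^k. It is exactly 1 for k == 0 and about -j for
                # k == span/2; such trivial factors are where a multiplier-free
                # butterfly could be substituted (a general observation only).
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                p, q = start + k, start + k + span
                a[p], a[q] = dif_butterfly(a[p], a[q], w)
        span //= 2
    # DIF leaves X[bitrev(i)] in a[i]; undo the bit reversal.
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[rev] = a[i]
    return out
```

For N = 16, 32, and 64 this performs (N/2)·log2(N) butterflies, which is why reducing the number of nontrivial twiddle multiplications directly reduces area.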


2019 ◽  
Vol 64 (2) ◽  
pp. 179-191
Author(s):  
Ved Mitra ◽  
Mahesh C. Govil ◽  
Girdhari Singh ◽  
Sanjeev Agrawal

Projective geometry (PG) based low-density parity-check (LDPC) decoder design using the iterative sum-product decoding algorithm (SPA) is a major challenge because of higher interconnection and computational complexity and larger memory requirements caused by relatively high node degrees. PG-LDPC codes decoded with SPA exhibit the best error performance and faster convergence. This paper presents an efficient novel decoding method, modified SPA (MSPA), that not only shortens the critical-path delay but also improves the hardware utilization and throughput of the decoder while maintaining the error performance of SPA. Three fully parallel LDPC decoder designs based on the PG structure PG(2, GF(2^s)) of LDPC codes are introduced. These designs differ in their bit-node (BN) and check-node (CN) architectures. A fixed-point, 9-bit quantization scheme is used to achieve better error performance. Another significant contribution of this work is the pipelining of the proposed decoder architectures to further enhance overall throughput. These parallel and pipelined designs are implemented for 73-bit (rate 0.616) and 1057-bit (rate 0.769) regular-structured PG-LDPC codes, on a Xilinx Virtex-6 LX760 FPGA and in 0.18 μm CMOS technology for ASIC. Synthesis and simulation results show the improved performance, throughput, and effectiveness of the proposed designs.
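The node updates that make SPA costly at high node degrees are shown below in their textbook floating-point form: the check-node "tanh rule" and the variable-node sum, both producing extrinsic messages that exclude the target edge. This is a minimal sketch of standard SPA only; the paper's MSPA and its 9-bit fixed-point quantization are not reproduced, and the function names are hypothetical.

```python
import math


def check_node_update(incoming: list[float]) -> list[float]:
    """SPA check-node update (tanh rule); output on edge i excludes input i."""
    out = []
    for i in range(len(incoming)):
        prod = 1.0
        for j, llr in enumerate(incoming):
            if j != i:
                prod *= math.tanh(llr / 2.0)
        prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)   # keep atanh finite
        out.append(2.0 * math.atanh(prod))
    return out


def variable_node_update(channel_llr: float, incoming: list[float]):
    """Variable-node update: returns (extrinsic messages, a-posteriori LLR)."""
    total = channel_llr + sum(incoming)
    return [total - msg for msg in incoming], total
```

Each check node of degree d_c computes d_c such products, so the work and wiring grow with node degree, which is the complexity the abstract attributes to PG-LDPC decoders.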


2015 ◽  
Vol 25 (02) ◽  
pp. 1650004
Author(s):  
Pouya Asadi

In this paper, a new multiplier based on an array architecture and a fast carry network tree, using dynamic CMOS technology, is presented. Several modifications are made to the multiplier architecture. In the first stage of the multiplier, a novel radix-16 modified Booth encoder is presented that efficiently reduces the number of partial products. We also present a new algorithm for partial product reduction, based on implementing compressor elements by means of a carry network. Arranging these compressors into reduction trees takes advantage of a modified Wallace tree for integrating adder cells and provides an alternative to conventional methods. We show several reduction techniques that illustrate the proposed method and describe carry-skip examples that combine dynamic CMOS with classic conventional compressors in order to modify each scheme. In the multiplier network, a novel low-power, high-speed adder cell with 14 transistors is presented. The critical path is minimized to reduce the latency of the whole architecture. The final adder of the multiplier is an optimized carry hybrid adder that also uses dynamic CMOS technology; it sums the two final operands very efficiently, which has a significant effect on the overall structure. Compared with other structures, the presented multiplier reduces latency by 12%, decreases transistor count by 8%, and mitigates the noise problem efficiently.
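The reduction-tree idea can be illustrated with a conventional Wallace tree built from 3:2 compressors (rows of full adders operating in carry-save form). The word-level sketch below is a generic textbook structure, not the paper's modified tree, carry network, 14-transistor cell, or radix-16 Booth encoder; function names are hypothetical.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor on whole rows: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def partial_products(x: int, y: int, n: int) -> list[int]:
    """Unsigned AND-array partial products: row i is x AND y_i, shifted by i."""
    return [(x if (y >> i) & 1 else 0) << i for i in range(n)]


def wallace_reduce(rows: list[int]) -> tuple[list[int], int]:
    """Compress rows three at a time until at most two remain; also return the level count."""
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        nxt = []
        for k in range(0, len(rows) - 2, 3):
            nxt.extend(csa(*rows[k:k + 3]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])   # pass leftover rows through
        rows = nxt
        levels += 1
    return rows, levels


def multiply(x: int, y: int, n: int) -> int:
    rows, _ = wallace_reduce(partial_products(x, y, n))
    return sum(rows)   # stands in for the final carry-propagate adder
```

Eight partial-product rows need four 3:2 levels and sixteen rows need six, so cutting the number of rows (for example with a higher-radix Booth encoder) directly shortens the tree.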


Author(s):  
POOJA GUPTA ◽  
Saroj Kumar Lenka

This paper describes an efficient implementation of a multi-level, convolution-based 1-D DWT hardware architecture for FPGAs. The proposed architecture combines several hardware optimization techniques into a novel DWT architecture that offers high performance and is suitable for portable and high-speed devices. The first step toward the hardware implementation of the DWT algorithm was to choose the type of FIR filter block. First, we design a high-speed linear-phase FIR filter using pipelined and parallel arithmetic methods. This filter employs efficiently distributed D-latches and multipliers, and it is then used in the proposed DWT architecture. The resulting VLSI architecture thus combines fast FIR filters, which reduce the critical path delay, with a data-interleaving technique, which lowers chip area. We synthesized the final design using the Xilinx ISE 9.1i tool. We show that a DWT design using a pipelined linear-phase FIR filter coupled with data interleaving gives the best combination of performance metrics compared with other DWT structures.
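A convolution-based multi-level 1-D DWT filters the signal with a lowpass and a highpass FIR filter, keeps every second output, and repeats on the lowpass (approximation) branch. The sketch below fuses filtering and downsampling so only the kept outputs are computed, and uses periodic extension. The Haar filter pair is an assumption chosen because its results are easy to check by hand; the abstract does not give the paper's filter coefficients, and the function names are hypothetical.

```python
import math


def analysis_step(x: list[float], h: list[float], g: list[float]):
    """One DWT level: y[k] = sum_t f[t] * x[(2k + t) mod n] for f in (h, g).

    Filtering and downsampling are fused; periodic extension requires even len(x).
    """
    n = len(x)
    approx = [sum(h[t] * x[(2 * k + t) % n] for t in range(len(h))) for k in range(n // 2)]
    detail = [sum(g[t] * x[(2 * k + t) % n] for t in range(len(g))) for k in range(n // 2)]
    return approx, detail


def dwt_multilevel(x: list[float], levels: int, h: list[float], g: list[float]):
    """Repeat the analysis step on the approximation branch."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = analysis_step(approx, h, g)
        details.append(d)
    return approx, details


R = 1 / math.sqrt(2)
HAAR_H, HAAR_G = [R, R], [R, -R]   # assumed example filter pair
```

For instance, a 2-level Haar transform of `[4, 6, 10, 12]` yields one approximation coefficient of 16 and detail coefficients of -√2, -√2 (level 1) and -6 (level 2), preserving the signal energy of 296.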

