scholarly journals A Streaming High-Throughput Linear Sorter System with Contention Buffering

2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Jorge Ortiz ◽  
David Andrews

Popular sorting algorithms do not translate well into hardware implementations. Instead, hardware-based solutions like sorting networks, systolic sorters, and linear sorters exploit parallelism to increase sorting efficiency. Linear sorters, built from identical nodes with simple control, have less area and latency than sorting networks, but they are limited in their throughput. We present a system composed of multiple linear sorters acting in parallel to increase overall throughput. Interleaving is used to increase bandwidth and allow sorting of multiple values per clock cycle, and the amount of interleaving and depth of the linear sorters can be adapted to suit specific applications. Contention for available linear sorters in the system is solved through the use of buffers that accumulate conflicting requests, dispatching them in bulk to reduce latency penalties. Implementation of this system into a field programmable gate array (FPGA) results in a speedup of 68 compared to a MicroBlaze processor running quicksort.

Author(s):  
Ibrahem M. T. Hamidi ◽  
Farah S. H. Al-aassi

Aim: Achieve high throughput 128 bits FPGA based Advanced Encryption Standard. Background: Field Programmable Gate Array (FPGA) provides an efficient platform for design AES cryptography system. It provides the capability to control over each bit using HDL programming language such as VHDL and Verilog which results an output speed in Gbps rang. Objective: Use Field Programmable Gate Array (FPGA) to design high throughput 128 bits FPGA based Advanced Encryption Standard. Method: Pipelining technique has used to achieve maximum possible speed. The level of pipelining includes round pipelining and internal component pipelining where number of registers inserted in particular places to increase the output speed. The proposed design uses combinatorial logic to implement the byte substitution. The s-box implemented using composed field arithmetic with 7 stages of pipelining to reduce the combinatorial logic level. The presented model has implemented using VHDL in Xilinix ISETM 14.4 design tool. Result: The achieved results were 18.55 Gbps at a clock frequency of 144.96 MHz and area of 1568 Slices in Spartan3 xc3s1000 hardware. Conclusion: The results show that the proposed design reaches a high throughput with acceptable area usage compare with other designs in the literature.


Information ◽  
2019 ◽  
Vol 10 (4) ◽  
pp. 151
Author(s):  
Gabriele Meoni ◽  
Gianluca Giuffrida ◽  
Luca Fanucci

During the last years, recursive systematic convolutional (RSC) encoders have found application in modern telecommunication systems to reduce the bit error rate (BER). In view of the necessity of increasing the throughput of such applications, several approaches using hardware implementations of RSC encoders were explored. In this paper, we propose a hardware intellectual property (IP) for high throughput RSC encoders. The IP core exploits a methodology based on the ABCD matrices model which permits to increase the number of inputs bits processed in parallel. Through an analysis of the proposed network topology and by exploiting data relative to the implementation on Zynq 7000 xc7z010clg400-1 field programmable gate array (FPGA), an estimation of the dependency of the input data rate and of the source occupation on the parallelism degree is performed. Such analysis, together with the BER curves, provides a description of the principal merit parameters of a RSC encoder.


Author(s):  
Abhishek Kumar

A cyber-physical system over field-programmable gate array with optimized artificial intelligence algorithm is beneficial for society. Multiply and accumulate (MAC) unit is an integral part of a DSP processor. This chapter is focused on improving its performance parameters MAC based on column bypass multiplier. It highlights DSP's design for intelligent applications and the architectural setup of the broadly useful neuro-PC, based on the economically available DSP artificial intelligence engine (AI-engine). Adaptive hold logic in the multipliers section determines whether another clock cycle is required to finish multiplication. Adjustment in algorithm reduced the aging impact over cell result in the processor last longer and has increased its life cycle.


Sign in / Sign up

Export Citation Format

Share Document