A high performance hardware accelerator for dynamic texture segmentation

2015 ◽  
Vol 61 (10) ◽  
pp. 639-645 ◽  
Author(s):  
João P.F. Barbosa ◽  
Antonyus P.A. Ferreira ◽  
Rodrigo C.F. Rocha ◽  
Erika S. Albuquerque ◽  
Josivan R. Reis ◽  
...  
2021 ◽  
Vol 32 (8) ◽  
pp. 2035-2048
Author(s):  
Mochamad Asri ◽  
Dhairya Malhotra ◽  
Jiajun Wang ◽  
George Biros ◽  
Lizy K. John ◽  
...  

Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 449
Author(s):  
Mohammad Amir Mansoori ◽  
Mario R. Casu

Principal Component Analysis (PCA) is a dimensionality-reduction technique that removes redundant information from data in applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made its hardware acceleration an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper, we propose a flexible PCA hardware accelerator for Field-Programmable Gate Arrays (FPGAs) that we designed entirely in HLS. To make the internal PCA computations more efficient, a new block-streaming method is also introduced. Several HLS optimization strategies are adopted to produce efficient hardware. The flexibility of our design allows us to use it for different FPGA targets, with flexible input data dimensions, and it also lets us easily switch from a more accurate floating-point implementation to a higher-speed fixed-point solution. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time, and power consumption.
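
The abstract does not describe the block-streaming method in detail, so the following is only a rough software sketch of the general idea of computing PCA from data that arrives in blocks. It is not the authors' HLS design. It accumulates a covariance matrix one block of rows at a time, then uses plain power iteration (a technique chosen here for brevity) to estimate the first principal component. The names `streamed_covariance` and `first_component` are hypothetical.

```python
import math


def streamed_covariance(blocks, dim):
    """Accumulate a population covariance matrix from blocks of rows.

    Only running sums are kept, so the full data set never has to be
    held in memory at once. (Illustrative assumption, not the paper's
    block-streaming scheme.)
    """
    n = 0
    sums = [0.0] * dim
    outer = [[0.0] * dim for _ in range(dim)]
    for block in blocks:
        for row in block:
            n += 1
            for i in range(dim):
                sums[i] += row[i]
                for j in range(dim):
                    outer[i][j] += row[i] * row[j]
    mean = [s / n for s in sums]
    # cov = E[x x^T] - mean mean^T
    return [
        [outer[i][j] / n - mean[i] * mean[j] for j in range(dim)]
        for i in range(dim)
    ]


def first_component(cov, iterations=200):
    """Estimate the dominant eigenvector of cov by power iteration."""
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # v lies in the null space; no further progress possible
        v = [x / norm for x in w]
    return v


# Two blocks of 2-D samples that all lie on the line y = 2x.
blocks = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
cov = streamed_covariance(blocks, dim=2)
pc1 = first_component(cov)
```

Projecting each sample onto `pc1` would reduce the two-dimensional data to one dimension. A hardware implementation would typically replace the floating-point sums with fixed-point accumulators when speed matters more than accuracy, which is the trade-off the abstract mentions.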


VLSI Design ◽  
2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Antony Savich ◽  
Shawki Areibi

Many applications, ranging from machine learning, image processing, and machine vision to optimization, use matrix multiplication as a fundamental building block. Matrix operations play an important role in determining the performance of such applications. This paper proposes a novel, efficient, highly scalable hardware accelerator that matches the performance of a 2 GHz quad-core PC but can be used in low-power applications targeting embedded systems requiring high-performance computation. Power, performance, and resource consumption are demonstrated on a fully functional prototype. The proposed hardware accelerator is 36× more energy efficient per unit of computation than a state-of-the-art Xeon processor of equal vintage and is 14× more efficient as a stand-alone platform with equivalent performance. An important comparison between simulated system estimates and real system performance is carried out.

