A high performance hardware accelerator for dynamic texture segmentation

João P.F. Barbosa; Antonyus P.A. Ferreira; Rodrigo C.F. Rocha; Erika S. Albuquerque; Josivan R. Reis; Djeefther S. Albuquerque; Edna N.S. Barros

doi:10.1016/j.sysarc.2015.09.005

Journal of Systems Architecture ◽

10.1016/j.sysarc.2015.09.005 ◽

2015 ◽

Vol 61 (10) ◽

pp. 639-645 ◽

Cited By ~ 5

Author(s):

João P.F. Barbosa ◽

Antonyus P.A. Ferreira ◽

Rodrigo C.F. Rocha ◽

Erika S. Albuquerque ◽

Josivan R. Reis ◽

...

Keyword(s):

High Performance ◽

Texture Segmentation ◽

Hardware Accelerator ◽

Dynamic Texture

Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods

IEEE Transactions on Parallel and Distributed Systems ◽

10.1109/tpds.2021.3056045 ◽

2021 ◽

Vol 32 (8) ◽

pp. 2035-2048

Author(s):

Mochamad Asri ◽

Dhairya Malhotra ◽

Jiajun Wang ◽

George Biros ◽

Lizy K. John ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Hardware Accelerator ◽

Performance Computing

Unified dynamic texture segmentation system based on local and global spatiotemporal techniques

International Journal of Reasoning-based Intelligent Systems ◽

10.1504/ijris.2019.099855 ◽

2019 ◽

Vol 11 (2) ◽

pp. 170

Author(s):

Shilpa Paygude ◽

Vibha Vyas

Keyword(s):

Texture Segmentation ◽

Dynamic Texture

High Level Design of a Flexible PCA Hardware Accelerator Using a New Block-Streaming Method

Electronics ◽

10.3390/electronics9030449 ◽

2020 ◽

Vol 9 (3) ◽

pp. 449

Author(s):

Mohammad Amir Mansoori ◽

Mario R. Casu

Keyword(s):

High Performance ◽

Principal Component ◽

Hardware Acceleration ◽

Design Flow ◽

Hardware Accelerator ◽

Field Programmable ◽

Point Solution ◽

Active Research ◽

High Level ◽

Many Core

Principal Component Analysis (PCA) is a dimensionality-reduction technique that removes redundant information from data in applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made its hardware acceleration an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper, we propose a flexible PCA hardware accelerator for Field-Programmable Gate Arrays (FPGAs) that we designed entirely in HLS. To make the internal PCA computations more efficient, a new block-streaming method is also introduced. Several HLS optimization strategies are adopted to create efficient hardware. The flexibility of our design allows us to use it for different FPGA targets and with flexible input data dimensions, and it also lets us easily switch from a more accurate floating-point implementation to a higher-speed fixed-point solution. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time, and power consumption.
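The abstract does not describe the block-streaming method itself, so the following is only a generic software sketch of the underlying idea that PCA can be fed data one block at a time. It accumulates the covariance matrix from running sums and then extracts the leading principal component by power iteration. The function names `streamed_covariance` and `leading_component`, the block layout, and the use of power iteration are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def streamed_covariance(blocks):
    """Accumulate mean and sample covariance from an iterable of blocks.

    Each block is a list of samples and each sample is a list of d floats.
    Only running sums are stored, so the full data set never needs to be
    held in memory at once. (Hypothetical helper, not the paper's design.)
    """
    n = 0
    sums = None   # running per-feature sum
    outer = None  # running sum of outer products x * x^T
    for block in blocks:
        for x in block:
            if sums is None:
                sums = [0.0] * len(x)
                outer = [[0.0] * len(x) for _ in range(len(x))]
            n += 1
            for i, xi in enumerate(x):
                sums[i] += xi
                for j, xj in enumerate(x):
                    outer[i][j] += xi * xj
    if n < 2:
        raise ValueError("need at least two samples")
    d = len(sums)
    mean = [s / n for s in sums]
    cov = [[(outer[i][j] - n * mean[i] * mean[j]) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def leading_component(cov, iterations=200):
    """Estimate the largest eigenvalue and its eigenvector by power iteration.

    The uniform start vector is an arbitrary choice; it fails if it happens
    to be orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


# Four samples on the line y = 2x, delivered as two blocks of two samples.
blocks = [[[0.0, 0.0], [1.0, 2.0]], [[2.0, 4.0], [3.0, 6.0]]]
mean, cov = streamed_covariance(blocks)
eigenvalue, direction = leading_component(cov)
```

Because the samples lie exactly on one line, the leading component points along that line and carries all of the variance. A fixed-point hardware version would replace the floating-point accumulators with scaled integers, which is the kind of accuracy-for-speed trade the abstract mentions.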

QAT: Evaluation of a dedicated hardware accelerator for high performance web service

2018 20th International Conference on Advanced Communication Technology (ICACT) ◽

10.23919/icact.2018.8323723 ◽

2018 ◽

Author(s):

Xue Shuai ◽

Liu Yao ◽

Zhang Wang

Keyword(s):

Web Service ◽

High Performance ◽

Hardware Accelerator ◽

Dedicated Hardware

FPGA-based hardware accelerator for high-performance data-stream processing

Pattern Recognition and Image Analysis ◽

10.1134/s1054661812030054 ◽

2013 ◽

Vol 23 (1) ◽

pp. 26-34 ◽

Cited By ~ 8

Author(s):

K. F. Lysakov ◽

M. Yu. Shadrin

Keyword(s):

Data Stream ◽

High Performance ◽

Stream Processing ◽

Performance Data ◽

Hardware Accelerator ◽

Data Stream Processing

A Low-Power Scalable Stream Compute Accelerator for General Matrix Multiply (GEMM)

VLSI Design ◽

10.1155/2014/712085 ◽

2014 ◽

Vol 2014 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Antony Savich ◽

Shawki Areibi

Keyword(s):

Low Power ◽

High Performance ◽

Matrix Multiplication ◽

Hardware Accelerator ◽

Power Performance ◽

Matrix Operations ◽

Simulated System ◽

Scalable Hardware ◽

Functional Prototype ◽

High Performance Computation

Many applications, ranging from machine learning, image processing, and machine vision to optimization, use matrix multiplication as a fundamental building block. Matrix operations play an important role in determining the performance of such applications. This paper proposes a novel, efficient, highly scalable hardware accelerator that matches the performance of a 2 GHz quad-core PC but can be used in low-power applications targeting embedded systems that require high-performance computation. Power, performance, and resource consumption are demonstrated on a fully functional prototype. The proposed hardware accelerator is 36× more energy efficient per unit of computation than a state-of-the-art Xeon processor of equal vintage and is 14× more efficient as a stand-alone platform with equivalent performance. An important comparison between simulated system estimates and real system performance is carried out.
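For readers unfamiliar with the operation being accelerated, here is a minimal software sketch of general matrix multiply (GEMM) computed tile by tile, which is one common way to keep a small working set close to the compute units. The function name `tiled_matmul` and the tile size are assumptions for illustration only; the abstract does not describe the accelerator's internal dataflow, and this is not a model of it.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply an n x k matrix by a k x m matrix, one tile at a time.

    Matrices are lists of rows. The three outer loops walk over tiles and
    the three inner loops do the multiply-accumulate work inside one tile.
    (Hypothetical helper for illustration.)
    """
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
product = tiled_matmul(a, b)  # [[58.0, 64.0], [139.0, 154.0]]
```

The result is the same as an untiled triple loop; only the order of the accumulations changes, which is what lets a tile's operands stay in fast local storage while they are reused.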
