Hardware Acceleration of Matrix Multiplication on a Xilinx FPGA

Many multimedia applications require a flexible image pr ocessing architecture. In this paper, we present the use of a hardware acceleration module (Discrete Cosine Transform (DCT) and Inverse DCT (IDCT) coupled with a software partition running on a PowerPC Processor of a Xilinx FPGA. Therefore we have the benefits of flexible software partition on the PowerPC and the acceleration given by the remaining logic of the same FPGA. This implementation can be used in the context of video coding, object recognition, etc. The experimental results show optimization in processing time offered by hardware acceleration vs. software implementation.

Download Full-text

Accelerating iterative CT reconstruction algorithms using Tensor Cores

Journal of Real-Time Image Processing ◽

10.1007/s11554-020-01069-5 ◽

2021 ◽

Author(s):

Mohsen Nourazar ◽

Bart Goossens

Keyword(s):

Machine Learning ◽

Real Time ◽

Programming Model ◽

Matrix Multiplication ◽

Hardware Acceleration ◽

Parallel Projection ◽

Ct Reconstruction ◽

Reconstruction Algorithms ◽

Reconstruction Methods ◽

Mixed Precision

AbstractTensor Cores are specialized hardware units added to recent NVIDIA GPUs to speed up matrix multiplication-related tasks, such as convolutions and densely connected layers in neural networks. Due to their specific hardware implementation and programming model, Tensor Cores cannot be straightforwardly applied to other applications outside machine learning. In this paper, we demonstrate the feasibility of using NVIDIA Tensor Cores for the acceleration of a non-machine learning application: iterative Computed Tomography (CT) reconstruction. For large CT images and real-time CT scanning, the reconstruction time for many existing iterative reconstruction methods is relatively high, ranging from seconds to minutes, depending on the size of the image. Therefore, CT reconstruction is an application area that could potentially benefit from Tensor Core hardware acceleration. We first studied the reconstruction algorithm’s performance as a function of the hardware related parameters and proposed an approach to accelerate reconstruction on Tensor Cores. The results show that the proposed method provides about 5 $$\times $$ × increase in speed and energy saving using the NVIDIA RTX 2080 Ti GPU for the parallel projection of 32 images of size $$512\times 512$$ 512 × 512 . The relative reconstruction error due to the mixed-precision computations was almost equal to the error of single-precision (32-bit) floating-point computations. We then presented an approach for real-time and memory-limited applications by exploiting the symmetry of the system (i.e., the acquisition geometry). As the proposed approach is based on the conjugate gradient method, it can be generalized to extend its application to many research and industrial fields.

Download Full-text

Hardware Acceleration of Bilateral Filters

IEEJ Transactions on Industry Applications ◽

10.1541/ieejias.133.132 ◽

2013 ◽

Vol 133 (2) ◽

pp. 132-138

Author(s):

Shuhei Isa ◽

Chikatoshi Yamada ◽

Yasunori Nagata

Keyword(s):

Hardware Acceleration ◽

Bilateral Filters

Download Full-text

The real-time computing of the Stirling formula based on hardware acceleration

Future Computer and Information Technology ◽

10.2495/icfcit130331 ◽

2013 ◽

Author(s):

Jiyang Yu ◽

Dan Huang ◽

Siyang Zhao ◽

Nan Pei ◽

Huixia Cheng ◽

...

Keyword(s):

Real Time ◽

Hardware Acceleration ◽

Stirling Formula ◽

The Real ◽

Real Time Computing

Download Full-text

Stable and Supported Semantics in Continuous Vector Spaces

Proceedings of the Seventeenth International Conference on Principles of Knowledge Representation and Reasoning ◽

10.24963/kr.2020/7 ◽

2020 ◽

Author(s):

Yaniv Aspis ◽

Krysia Broda ◽

Alessandra Russo ◽

Jorge Lobo

Keyword(s):

Logic Program ◽

Matrix Multiplication ◽

Vector Spaces ◽

Continuous Space ◽

Continuous Vector ◽

Novel Approach ◽

Gradient Based ◽

Normal Logic ◽

Parameter Values ◽

Normal Logic Program

We introduce a novel approach for the computation of stable and supported models of normal logic programs in continuous vector spaces by a gradient-based search method. Specifically, the application of the immediate consequence operator of a program reduct can be computed in a vector space. To do this, Herbrand interpretations of a propositional program are embedded as 0-1 vectors in $\mathbb{R}^N$ and program reducts are represented as matrices in $\mathbb{R}^{N \times N}$. Using these representations we prove that the underlying semantics of a normal logic program is captured through matrix multiplication and a differentiable operation. As supported and stable models of a normal logic program can now be seen as fixed points in a continuous space, non-monotonic deduction can be performed using an optimisation process such as Newton's method. We report the results of several experiments using synthetically generated programs that demonstrate the feasibility of the approach and highlight how different parameter values can affect the behaviour of the system.

Download Full-text

A Parallel Matrix Multiplication Algorithm Based on Network of Moore Graph of Diameter 2

Chinese Journal of Computers ◽

10.3724/sp.j.1016.2013.01843 ◽

2014 ◽

Vol 36 (9) ◽

pp. 1843-1849

Author(s):

Bing ZHANG

Keyword(s):

Matrix Multiplication ◽

Matrix Multiplication Algorithm ◽

Multiplication Algorithm ◽

Moore Graph

Download Full-text

Design and implementation of matrix hardware acceleration based on FPGA/Nios-Ⅱ

JOURNAL OF ELECTRONIC MEASUREMENT AND INSTRUMENT ◽

10.3724/sp.j.1187.2011.00377 ◽

2011 ◽

Vol 25 (4) ◽

pp. 377-383 ◽

Cited By ~ 1

Author(s):

Fang Xu ◽

Yi Xi ◽

Hong Chen ◽

Weiwei Jin

Keyword(s):

Hardware Acceleration ◽

Design And Implementation

Download Full-text

Modular Matrix Multiplication on a Linear Array.

10.21236/ada139852 ◽

1983 ◽

Author(s):

I. V. Ramakrishnan ◽

P. J. Varman

Keyword(s):

Linear Array ◽

Matrix Multiplication

Download Full-text

Design and Implementation of Low Energy Wireless Network Nodes based on Hardware Compression Acceleration

Recent Patents on Computer Science ◽

10.2174/2213275912666190715164024 ◽

2019 ◽

Vol 12 ◽

Author(s):

Hui Yang ◽

Anand Nayyar

Keyword(s):

Energy Consumption ◽

Data Compression ◽

Energy Saving ◽

Optimization Design ◽

Hardware Acceleration ◽

Transmission Efficiency ◽

General Purpose ◽

Storage Space ◽

General Purpose Processor ◽

Compression Time

: In the fast development of information, the information data is increasing in geometric multiples, and the speed of information transmission and storage space are required to be higher. In order to reduce the use of storage space and further improve the transmission efficiency of data, data need to be compressed. processing. In the process of data compression, it is very important to ensure the lossless nature of data, and lossless data compression algorithms appear. The gradual optimization design of the algorithm can often achieve the energy-saving optimization of data compression. Similarly, The effect of energy saving can also be obtained by improving the hardware structure of node. In this paper, a new structure is designed for sensor node, which adopts hardware acceleration, and the data compression module is separated from the node microprocessor.On the basis of the ASIC design of the algorithm, by introducing hardware acceleration, the energy consumption of the compressed data was successfully reduced, and the proportion of energy consumption and compression time saved by the general-purpose processor was as high as 98.4 % and 95.8 %, respectively. It greatly reduces the compression time and energy consumption.

Download Full-text