Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions

Cristian Sestito; Fanny Spagnolo; Stefania Perri

doi:10.3390/jimaging7100210

Journal of Imaging ◽

10.3390/jimaging7100210 ◽

2021 ◽

Vol 7 (10) ◽

pp. 210

Author(s):

Cristian Sestito ◽

Fanny Spagnolo ◽

Stefania Perri

Keyword(s):

Computer Vision ◽

Super Resolution ◽

Feature Maps ◽

Operating Modes ◽

Gate Arrays ◽

Modern Computer ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Computationally Intensive ◽

Hardware Platforms

Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex and accurate tasks. Among them, super-resolution CNNs represent a meaningful example, due to the presence of both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former exploit multiply-and-accumulate (MAC) operations to extract features of interest from incoming feature maps (fmaps), the latter perform MACs to tune the spatial resolution of the received fmaps properly. The ever-growing real-time and low-power requirements of modern computer vision applications represent a stimulus for the research community to investigate the deployment of CNNs on well-suited hardware platforms, such as field programmable gate arrays (FPGAs). FPGAs are widely recognized as valid candidates for trading off computational speed and power consumption, thanks to their flexibility and their capability to also deal with computationally intensive models. In order to reduce the number of operations to be performed, this paper presents a novel hardware-oriented algorithm able to efficiently accelerate both CONVs and TCONVs. The proposed strategy was validated by employing it within a reconfigurable hardware accelerator purposely designed to adapt itself to different operating modes set at run-time. When characterized using the Xilinx XC7K410T FPGA device, the proposed accelerator achieved a throughput of up to 2022.2 GOPS and, in comparison to state-of-the-art competitors, it reached an energy efficiency up to 2.3 times higher, without compromising the overall accuracy.
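
The roles of the two layer types described above can be illustrated with a minimal pure-Python sketch. It is not the hardware-oriented algorithm proposed in the paper; it only shows the textbook MAC loops, assuming a single channel, a square kernel, no padding, and hypothetical function names `conv2d` and `tconv2d`. The CONV reduces each window to one output value, while the TCONV scatters each input value over a window of the enlarged output.

```python
def conv2d(fmap, kernel, stride=1):
    """Valid 2-D convolution (cross-correlation, as used in CNNs) via explicit MACs."""
    k = len(kernel)
    h, w = len(fmap), len(fmap[0])
    oh, ow = (h - k) // stride + 1, (w - k) // stride + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0
            for u in range(k):
                for v in range(k):
                    # One multiply-and-accumulate per kernel tap.
                    acc += fmap[i * stride + u][j * stride + v] * kernel[u][v]
            out[i][j] = acc
    return out


def tconv2d(fmap, kernel, stride=2):
    """Transposed 2-D convolution: each input pixel scatters a scaled kernel copy."""
    k = len(kernel)
    h, w = len(fmap), len(fmap[0])
    # Output grows with the stride: (size - 1) * stride + k per dimension.
    oh, ow = (h - 1) * stride + k, (w - 1) * stride + k
    out = [[0] * ow for _ in range(oh)]
    for i in range(h):
        for j in range(w):
            for u in range(k):
                for v in range(k):
                    # Overlapping windows accumulate into the same output pixel.
                    out[i * stride + u][j * stride + v] += fmap[i][j] * kernel[u][v]
    return out


if __name__ == "__main__":
    ones = [[1] * 3 for _ in range(3)]
    image = [[4 * r + c + 1 for c in range(4)] for r in range(4)]
    print(conv2d(image, ones))             # 4x4 input -> 2x2 output
    print(tconv2d([[1, 2], [3, 4]], ones))  # 2x2 input -> 5x5 output
```

With a 3×3 kernel, the CONV shrinks a 4×4 fmap to 2×2, whereas the stride-2 TCONV enlarges a 2×2 fmap to 5×5, which is the resolution-tuning behaviour the abstract attributes to TCONV layers.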

Deploying a Smart Queuing System on Edge with Intel OpenVINO Toolkit

10.21203/rs.3.rs-509460/v1 ◽

2021 ◽

Author(s):

Rishit Dagli ◽

Süleyman Eken

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Public Transportation ◽

Queuing System ◽

Processing Unit ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

The Cost ◽

Hardware Platforms

Recent increases in computational power and the development of specialized architectures have made it possible to perform machine learning, especially inference, on the edge. OpenVINO is a toolkit based on Convolutional Neural Networks that facilitates fast-track development of computer vision algorithms and deep learning neural networks into vision applications, and enables their easy heterogeneous execution across hardware platforms. Smart queue management can be the key to the success of any sector. In this paper, we focus on edge deployments to make the Smart Queuing System (SQS) accessible to all, while also making it possible to run it on cheap devices. This allows the deep learning algorithms of the queuing system to run on pre-existing computers that a retail store, public transportation facility, or factory may already possess, thus considerably reducing the cost of deploying such a system. SQS demonstrates how to create a video AI solution on the edge. We validate our results by testing it on multiple edge devices, namely a CPU, an Integrated Edge Graphics Processing Unit (iGPU), a Vision Processing Unit (VPU), and Field Programmable Gate Arrays (FPGAs). Experimental results show that deploying an SQS on the edge is very promising.

mNet2FPGA: A Design Flow for Mapping a Fixed-Point CNN to Zynq SoC FPGA

Electronics ◽

10.3390/electronics9111823 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1823

Author(s):

Tomyslav Sledevič ◽

Artūras Serackis

Keyword(s):

Neural Networks ◽

Fixed Point ◽

Processing System ◽

Design Flow ◽

Feature Maps ◽

Gate Arrays ◽

The Core ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Fully Connected

Convolutional neural networks (CNNs) are a computation- and memory-demanding class of deep neural networks. Field-programmable gate arrays (FPGAs) are often used to accelerate networks deployed on embedded platforms because of the high computational complexity of CNNs. In most cases, CNNs are trained with existing deep learning frameworks and then mapped to FPGAs with specialized toolflows. In this paper, we propose a CNN core architecture called mNet2FPGA that places a trained CNN on a SoC FPGA. The processing system (PS) is responsible for configuring the convolution and fully connected cores according to a list of prescheduled instructions. The programmable logic holds the convolution and fully connected cores. The hardware architecture is based on advanced extensible interface (AXI) stream processing with simultaneous bidirectional transfers between RAM and the CNN core. The core was tested on a cost-optimized Z-7020 FPGA with 16-bit fixed-point VGG networks. Kernel binarization and merging with the batch normalization layer were applied to reduce the number of DSPs in the multi-channel convolutional core. The convolutional core processes eight input feature maps at once and generates eight output channels of the same size and composition at 50 MHz. The core of the fully connected (FC) layer works at 100 MHz with up to 4096 neurons per layer. In the current version of the CNN core, the size of the convolutional kernel is fixed to 3×3. The estimated average performance is 8.6 GOPS for VGG13 and nearly 8.4 GOPS for VGG16/19 networks.
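
Two of the steps named in this abstract, merging a convolution with its batch normalization layer and representing values in 16-bit fixed point, can be sketched as follows. This is an illustration under stated assumptions, not the mNet2FPGA flow itself: the function names are hypothetical, the fold uses the standard inference-time batch-norm formula for a single output channel, and the 8 fractional bits are an assumed format that the abstract does not specify.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the conv itself."""
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias


def to_fixed(x, frac_bits=8, total_bits=16):
    """Quantize to signed fixed point with saturation (frac_bits is an assumption)."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))


def from_fixed(q, frac_bits=8):
    """Convert a fixed-point integer back to a float for inspection."""
    return q / (1 << frac_bits)


if __name__ == "__main__":
    w, b = fold_batchnorm([0.5, -1.0, 2.0], 0.1, gamma=1.5, beta=-0.2, mean=0.3, var=4.0)
    print([to_fixed(v) for v in w], to_fixed(b))
```

After folding, the batch normalization layer costs no extra multipliers at inference time, which is consistent with the abstract's goal of reducing DSP usage.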

MEMORY EFFICIENT IMPLEMENTATION OF AES S-BOXES ON FPGA

Journal of Circuits System and Computers ◽

10.1142/s0218126607003873 ◽

2007 ◽

Vol 16 (04) ◽

pp. 603-611 ◽

Cited By ~ 8

Author(s):

ARSHAD AZIZ ◽

NASSAR IKRAM

Keyword(s):

Processing Speed ◽

Gate Arrays ◽

Embedded Platform ◽

Substitution Boxes ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Computationally Intensive ◽

Active Research ◽

Xilinx FPGA ◽

Memory Efficient

Optimized implementation of computationally intensive cryptographic transformations is an area of active research, mainly focused on the Advanced Encryption Standard (AES). Byte substitution, implemented using substitution boxes (S-boxes), is the main transformation in AES and strains the embedded platform on which it runs, e.g., Field Programmable Gate Arrays. We present a novel clocking technique enabling an optimized implementation of byte substitution that enhances processing speed and reduces the area required for S-boxes on Xilinx FPGA Block RAM (BRAM).
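
For readers unfamiliar with what an S-box stored in BRAM contains, the sketch below generates the standard 256-entry AES table from its definition: the multiplicative inverse in GF(2^8), followed by the affine transformation. It does not reproduce the clocking technique of the paper, and the helper names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the AES polynomial
        b >>= 1
    return product


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(a):
    """AES byte substitution: field inverse followed by the affine transformation."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# The 256-byte lookup table that a memory-based implementation would hold.
SBOX = [sbox_entry(a) for a in range(256)]

if __name__ == "__main__":
    print(" ".join(f"{v:02x}" for v in SBOX[:16]))
```

A table-based design trades this arithmetic for a single memory read per byte, which is why the memory footprint of the table is the quantity of interest.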

A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems

Sensors ◽

10.3390/s21082637 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2637

Author(s):

Ignacio Pérez ◽

Miguel Figueroa

Keyword(s):

Image Classification ◽

High Speed ◽

Hardware Acceleration ◽

Graphics Processors ◽

Embedded Processor ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Computationally Intensive ◽

On Chip

Convolutional neural networks (CNNs) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGAs) become a solution to perform inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicates their application on resource-constrained edge devices. In this paper, we present a scalable, low-power, low-resource-utilization accelerator architecture for inference on the MobileNet V2 CNN. The architecture uses a heterogeneous system with an embedded processor as the main controller, external memory to store network data, and dedicated hardware implemented on reconfigurable logic with a scalable number of processing elements (PEs). Implemented on an XCZU7EV FPGA running at 200 MHz and using four PEs, the accelerator infers with 87% top-5 accuracy and processes an image of 224×224 pixels in 220 ms. It consumes 7.35 W of power and uses less than 30% of the logic and arithmetic resources used by other MobileNet FPGA accelerators.
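
The latency and power figures quoted above imply a frame rate and an energy cost per image. The short sketch below derives both, assuming that images are processed one at a time with no overlap and that 7.35 W is the average power drawn during inference; neither assumption is stated in the abstract, and the function names are hypothetical.

```python
def frames_per_second(latency_s):
    """Throughput when images are processed strictly one after another."""
    return 1.0 / latency_s


def energy_per_inference_j(power_w, latency_s):
    """Energy in joules, assuming constant average power over the inference."""
    return power_w * latency_s


LATENCY_S = 0.220  # 220 ms per 224x224 image, from the abstract
POWER_W = 7.35     # reported power consumption, from the abstract

if __name__ == "__main__":
    print(round(frames_per_second(LATENCY_S), 2))                 # 4.55
    print(round(energy_per_inference_j(POWER_W, LATENCY_S), 3))   # 1.617
```

Under these assumptions the accelerator handles roughly 4.5 images per second at about 1.6 J per image.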

Memory Requirement Reduction of Deep Neural Networks for Field Programmable Gate Arrays Using Low-Bit Quantization of Parameters

2020 28th European Signal Processing Conference (EUSIPCO) ◽

10.23919/eusipco47968.2020.9287739 ◽

2021 ◽

Author(s):

Niccolo Nicodemo ◽

Gaurav Naithani ◽

Konstantinos Drossos ◽

Tuomas Virtanen ◽

Roberto Saletti

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Field Programmable Gate Arrays ◽

Memory Requirement ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Optically Programmable Field Programmable Gate Arrays (FPGA) Systems

10.21236/ada421336 ◽

2004 ◽

Author(s):

Jose Mumbru ◽

George Panotopoulos ◽

Demetri Psaltis

Keyword(s):

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Field-Programmable Gate Arrays: Architecture and Tools for Rapid Prototyping

10.1007/3-540-57091-8 ◽

1993 ◽

Keyword(s):

Rapid Prototyping ◽

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

FPGA-Based Controller for a Hybrid Grid-Connected PV/Wind/Battery Power System with AC Load

Energies ◽

10.3390/en14082108 ◽

2021 ◽

Vol 14 (8) ◽

pp. 2108

Author(s):

Mohamed Yassine Allani ◽

Jamel Riahi ◽

Silvano Vergura ◽

Abdelkader Mami

Keyword(s):

Fuzzy Logic ◽

Hybrid System ◽

High Voltage ◽

Field Programmable Gate Arrays ◽

Energy System ◽

Maximum Power ◽

Hybrid Energy ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

The development and optimization of a hybrid system composed of photovoltaic panels, wind turbines, converters, and batteries connected to the grid is first presented. To generate the maximum power, two maximum power point tracker controllers based on fuzzy logic are required, and a battery controller is used to regulate the DC voltage. When the power source varies, a high-voltage supply is incorporated (a high-gain DC-DC converter controlled by fuzzy logic) to boost the 24 V provided by the DC bus to the inverter voltage of about 400 V and to reduce energy losses, maximizing system performance. The inverter and the LCL filter allow for the integration of this hybrid system with AC loads and the grid. Moreover, a hardware solution for the field programmable gate array-based implementation of the controllers is proposed. The combination of these controllers was synthesized using the Integrated Synthesis Environment Design Suite software (Version: 14.7, City: Tunis, Country: Tunisia) and was successfully implemented on a Spartan 3E field programmable gate array. The innovative design provides a suitable architecture based on power converters and control strategies that are dedicated to the proposed hybrid system to ensure system reliability. This implementation can provide a high level of flexibility that can facilitate the upgrade of a control system by simply updating or modifying the proposed algorithm running on the field programmable gate array board. The simulation results, obtained using Matlab/Simulink (Version: 2016b, City: Tunis, Country: Tunisia), verify the efficiency of the proposed solution when the environmental conditions change. This study focused on the development and optimization of an electrical system control strategy to manage the produced energy and to coordinate the performance of the hybrid energy system. The paper proposes a combined photovoltaic and wind energy system, supported by a battery acting as an energy storage system. In addition, a bi-directional converter charges/discharges the battery, while a high-voltage gain converter connects them to the DC bus. The battery compensates for the mismatch between the power demanded by the load and the power generated by the hybrid energy systems. The proposed field programmable gate array (FPGA)-based controllers ensure a fast time response by making control executable in real time.
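
The step from 24 V to about 400 V gives a sense of why a high-gain converter is needed. The sketch below computes the required voltage gain and, purely for comparison, the duty cycle that an ideal lossless boost converter in continuous conduction would need, using the textbook relation Vout/Vin = 1/(1 - D). The abstract does not say that the paper's converter is a plain boost stage, so that part is an assumption, as are the function names.

```python
def required_gain(v_in, v_out):
    """Voltage gain the converter must provide."""
    return v_out / v_in


def ideal_boost_duty(v_in, v_out):
    """Duty cycle of an ideal boost converter (assumed topology): D = 1 - Vin / Vout."""
    if not 0 < v_in < v_out:
        raise ValueError("a boost stage requires 0 < v_in < v_out")
    return 1.0 - v_in / v_out


V_BUS = 24.0        # DC bus voltage from the abstract
V_INVERTER = 400.0  # approximate inverter voltage from the abstract

if __name__ == "__main__":
    print(round(required_gain(V_BUS, V_INVERTER), 2))     # 16.67
    print(round(ideal_boost_duty(V_BUS, V_INVERTER), 2))  # 0.94
```

A duty cycle this close to 1 is hard to sustain efficiently in a single ideal boost stage, which is one common motivation for dedicated high-gain topologies.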

Wiring requirement and three-dimensional integration technology for field programmable gate arrays

IEEE Transactions on Very Large Scale Integration (VLSI) Systems ◽

10.1109/tvlsi.2003.810003 ◽

2003 ◽

Vol 11 (1) ◽

pp. 44-54 ◽

Cited By ~ 58

Author(s):

A. Rahman ◽

S. Das ◽

A.P. Chandrakasan ◽

R. Reif

Keyword(s):

Three Dimensional ◽

Field Programmable Gate Arrays ◽

Integration Technology ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Realization of a soft output Viterbi equalizer using field programmable gate arrays

IEEE 43rd Vehicular Technology Conference ◽

10.1109/vetec.1993.508769 ◽

2002 ◽

Cited By ~ 1

Author(s):

P. Jung ◽

J. Blanz

Keyword(s):

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays
