Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions

2021 ◽  
Vol 7 (10) ◽  
pp. 210
Author(s):  
Cristian Sestito ◽  
Fanny Spagnolo ◽  
Stefania Perri

Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex and accurate tasks. Among them, super-resolution CNNs represent a meaningful example, due to the presence of both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former exploit multiply-and-accumulate (MAC) operations to extract features of interest from incoming feature maps (fmaps), the latter perform MACs to tune the spatial resolution of the received fmaps properly. The ever-growing real-time and low-power requirements of modern computer vision applications represent a stimulus for the research community to investigate the deployment of CNNs on well-suited hardware platforms, such as field programmable gate arrays (FPGAs). FPGAs are widely recognized as valid candidates for trading off computational speed and power consumption, thanks to their flexibility and their capability to also deal with computationally intensive models. In order to reduce the number of operations to be performed, this paper presents a novel hardware-oriented algorithm able to efficiently accelerate both CONVs and TCONVs. The proposed strategy was validated by employing it within a reconfigurable hardware accelerator purposely designed to adapt itself to different operating modes set at run-time. When characterized using the Xilinx XC7K410T FPGA device, the proposed accelerator achieved a throughput of up to 2022.2 GOPS and, in comparison to state-of-the-art competitors, it reached an energy efficiency up to 2.3 times higher, without compromising the overall accuracy.
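As a rough illustration of the two layer types named in this abstract, the sketch below writes a stride-1 convolution and a strided transposed convolution as plain multiply-and-accumulate loops. It is the textbook formulation for a single channel without padding, not the operation-saving algorithm the paper proposes; the function names `conv2d` and `tconv2d` and the default stride are assumptions made for the example.

```python
def conv2d(fmap, kernel):
    """Valid, stride-1 convolution (cross-correlation form used in CNNs)."""
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    out = [[0] * (w - k + 1) for _ in range(h - k + 1)]
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            acc = 0
            for u in range(k):
                for v in range(k):
                    acc += fmap[i + u][j + v] * kernel[u][v]  # one MAC
            out[i][j] = acc
    return out


def tconv2d(fmap, kernel, stride=2):
    """Transposed convolution: each input pixel scatters a scaled kernel copy."""
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for u in range(k):
                for v in range(k):
                    # one MAC, accumulated into the upsampled output
                    out[i * stride + u][j * stride + v] += fmap[i][j] * kernel[u][v]
    return out
```

The convolution shrinks the feature map (a 3×3 input with a 2×2 kernel gives 2×2), while the transposed convolution enlarges it (a 2×2 input with a 2×2 kernel and stride 2 gives 4×4), which is why super-resolution networks combine the two.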

2021 ◽  
Author(s):  
Rishit Dagli ◽  
Süleyman Eken

Recent increases in computational power and the development of specialized architectures have made it possible to perform machine learning, especially inference, on the edge. OpenVINO is a toolkit based on convolutional neural networks that facilitates fast-track development of computer vision algorithms and deep learning neural networks into vision applications, and enables their easy heterogeneous execution across hardware platforms. Smart queue management can be key to the success of any sector. In this paper, we focus on edge deployments to make the Smart Queuing System (SQS) accessible to all, while also making it possible to run on cheap devices. This lets the deep learning algorithms of the queuing system run on pre-existing computers that a retail store, public transportation facility, or factory may already possess, considerably reducing the deployment cost of such a system. SQS demonstrates how to create a video AI solution on the edge. We validate our results by testing on multiple edge devices, namely a CPU, an integrated edge graphics processing unit (iGPU), a vision processing unit (VPU), and field-programmable gate arrays (FPGAs). Experimental results show that deploying an SQS on the edge is very promising.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1823
Author(s):  
Tomyslav Sledevič ◽  
Artūras Serackis

Convolutional neural networks (CNNs) are a computation- and memory-demanding class of deep neural networks. Field-programmable gate arrays (FPGAs) are often used to accelerate networks deployed on embedded platforms because of the high computational complexity of CNNs. In most cases, CNNs are trained with existing deep learning frameworks and then mapped to FPGAs with specialized toolflows. In this paper, we propose a CNN core architecture called mNet2FPGA that places a trained CNN on a SoC FPGA. The processing system (PS) is responsible for configuring the convolution and fully connected cores according to a list of prescheduled instructions. The programmable logic holds the cores of the convolution and fully connected layers. The hardware architecture is based on advanced extensible interface (AXI) stream processing with simultaneous bidirectional transfers between RAM and the CNN core. The core was tested on a cost-optimized Z-7020 FPGA with 16-bit fixed-point VGG networks. Kernel binarization and merging with the batch normalization layer were applied to reduce the number of DSPs in the multi-channel convolutional core. The convolutional core processes eight input feature maps at once and generates eight output channels of the same size and composition at 50 MHz. The core of the fully connected (FC) layer works at 100 MHz with up to 4096 neurons per layer. In the current version of the CNN core, the size of the convolutional kernel is fixed to 3×3. The estimated average performance is 8.6 GOPS for VGG13 and close to 8.4 GOPS for VGG16/19 networks.
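The abstract mentions merging the convolution with the batch normalization layer. A minimal sketch of the standard folding identity for one output channel is given below; it assumes real-valued weights and the usual inference-time batch normalization formula, and it is not necessarily how mNet2FPGA handles binarized kernels. The names `fold_batchnorm` and `dot` are hypothetical helpers.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv - mean) / sqrt(var + eps) + beta into the conv.

    Returns new (weights, bias) for one output channel so that the folded
    convolution alone reproduces conv followed by batch normalization.
    """
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias


def dot(weights, inputs, bias):
    """Multiply-and-accumulate over one flattened kernel window."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias
```

After folding, the batch normalization stage needs no separate multipliers at inference time, which is consistent with the stated goal of reducing DSP usage.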


2007 ◽  
Vol 16 (04) ◽  
pp. 603-611 ◽  
Author(s):  
ARSHAD AZIZ ◽  
NASSAR IKRAM

Optimized implementation of computationally intensive cryptographic transformations is an area of active research, mainly focused on the Advanced Encryption Standard (AES). Byte substitution, implemented using substitution boxes (S-boxes), is the main transformation in AES that strains the enabling embedded platform, e.g., field programmable gate arrays (FPGAs). We present a novel clocking technique enabling an optimized implementation of byte substitution that enhances processing speed and reduces the area required for S-boxes on Xilinx FPGA Block RAM (BRAM).


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2637
Author(s):  
Ignacio Pérez ◽  
Miguel Figueroa

Convolutional neural networks (CNNs) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGAs) become a solution to perform inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicates their application on resource-constrained edge devices. In this paper, we present a scalable, low-power, low-resource-utilization accelerator architecture for inference on the MobileNet V2 CNN. The architecture uses a heterogeneous system with an embedded processor as the main controller, external memory to store network data, and dedicated hardware implemented on reconfigurable logic with a scalable number of processing elements (PEs). Implemented on an XCZU7EV FPGA running at 200 MHz and using four PEs, the accelerator infers with 87% top-5 accuracy and processes an image of 224×224 pixels in 220 ms. It consumes 7.35 W of power and uses less than 30% of the logic and arithmetic resources used by other MobileNet FPGA accelerators.
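The latency and power figures quoted in this abstract can be turned into a frame rate and an energy-per-image estimate with simple arithmetic. The sketch below assumes that images are processed back to back with no overlap and that the 7.35 W figure holds for the whole 220 ms inference; neither assumption is stated in the abstract, and `per_image_metrics` is a hypothetical helper name.

```python
def per_image_metrics(latency_s, power_w):
    """Return (images per second, joules per image) for sequential inference."""
    images_per_second = 1.0 / latency_s
    joules_per_image = power_w * latency_s
    return images_per_second, joules_per_image


# Figures quoted in the abstract: 220 ms per 224x224 image at 7.35 W.
fps, energy_j = per_image_metrics(latency_s=0.220, power_w=7.35)
print(f"{fps:.2f} images/s, {energy_j:.3f} J per image")
```

Under these assumptions the accelerator handles roughly 4.5 images per second at about 1.6 J per image.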


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2108
Author(s):  
Mohamed Yassine Allani ◽  
Jamel Riahi ◽  
Silvano Vergura ◽  
Abdelkader Mami

The development and optimization of a hybrid system composed of photovoltaic panels, wind turbines, converters, and batteries connected to the grid is first presented. To generate the maximum power, two maximum power point tracker controllers based on fuzzy logic are required, and a battery controller is used for the regulation of the DC voltage. When the power source varies, a high-voltage supply is incorporated (a high-gain DC-DC converter controlled by fuzzy logic) to boost the 24 V provided by the DC bus to the inverter voltage of about 400 V and to reduce energy losses to maximize the system performance. The inverter and the LCL filter allow for the integration of this hybrid system with AC loads and the grid. Moreover, a hardware solution for the field programmable gate array (FPGA)-based implementation of the controllers is proposed. The combination of these controllers was synthesized using the Integrated Synthesis Environment Design Suite software (Version: 14.7, City: Tunis, Country: Tunisia) and was successfully implemented on a Spartan 3E FPGA. The innovative design provides a suitable architecture based on power converters and control strategies that are dedicated to the proposed hybrid system to ensure system reliability. This implementation can provide a high level of flexibility that can facilitate the upgrade of a control system by simply updating or modifying the proposed algorithm running on the FPGA board. The simulation results, using Matlab/Simulink (Version: 2016b, City: Tunis, Country: Tunisia), verify the efficiency of the proposed solution when the environmental conditions change. This study focused on the development and optimization of an electrical system control strategy to manage the produced energy and to coordinate the performance of the hybrid energy system. The paper proposes a combined photovoltaic and wind energy system, supported by a battery acting as an energy storage system. In addition, a bi-directional converter charges/discharges the battery, while a high-voltage gain converter connects them to the DC bus. The battery compensates for the mismatch between the power demanded by the load and the power generated by the hybrid energy systems. The proposed FPGA-based controllers ensure a fast time response by making control executable in real time.

