An Efficient FPGA Implementation of Richardson-Lucy Deconvolution Algorithm for Hyperspectral Images

Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 504
Author(s):  
Karine Avagian ◽  
Milica Orlandić

This paper proposes an implementation of the Richardson-Lucy (RL) deconvolution method to reduce the spatial degradation introduced into hyperspectral images during the image acquisition process. The degradation, modeled as convolution with a point spread function (PSF), is reduced by applying both standard and accelerated RL deconvolution algorithms to the individual images in each spectral band. Boundary conditions are introduced to maintain a constant image size without distorting the estimated image boundaries. The RL deconvolution algorithm is implemented on a field-programmable gate array (FPGA)-based Xilinx Zynq-7020 System-on-Chip (SoC). The proposed architecture is parameterized with respect to the image size and is configurable, through dedicated configuration registers, with respect to the algorithm variant, the number of iterations, and the kernel size. Speed-ups by factors of 61 and 21 are reported compared to software-only and FPGA-based state-of-the-art implementations, respectively.
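
As an illustration of the standard RL iteration mentioned in the abstract, the following is a minimal software sketch in pure Python, not the authors' FPGA architecture. It uses the textbook update (the estimate is multiplied by the ratio image convolved with the mirrored PSF). The function names, the edge-replication boundary rule, the `eps` guard against division by zero, and the choice of the observed image as the initial estimate are all assumptions made for this example; the abstract does not specify the boundary conditions or the accelerated variant.

```python
def convolve2d(image, kernel):
    """Same-size 2D convolution; edges are replicated (an assumed boundary rule)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Clamp coordinates so the output keeps the input size.
                    yy = min(max(y - (i - ch), 0), h - 1)
                    xx = min(max(x - (j - cw), 0), w - 1)
                    acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def richardson_lucy(observed, psf, iterations=10, eps=1e-12):
    """Standard (non-accelerated) RL iteration applied to one spectral band."""
    mirrored = [row[::-1] for row in psf[::-1]]  # PSF flipped in both axes
    estimate = [row[:] for row in observed]      # assumed initial estimate
    for _ in range(iterations):
        blurred = convolve2d(estimate, psf)
        ratio = [[o / (b + eps) for o, b in zip(orow, brow)]
                 for orow, brow in zip(observed, blurred)]
        correction = convolve2d(ratio, mirrored)
        estimate = [[e * c for e, c in zip(erow, crow)]
                    for erow, crow in zip(estimate, correction)]
    return estimate


# Usage: blur a single bright pixel with a 3x3 box PSF, then run one iteration.
psf = [[1 / 9] * 3 for _ in range(3)]
point = [[0.0] * 5 for _ in range(5)]
point[2][2] = 1.0
observed = convolve2d(point, psf)
restored = richardson_lucy(observed, psf, iterations=1)
```

In this toy case a single iteration already pulls energy back toward the center pixel. A hyperspectral cube would be processed by calling `richardson_lucy` once per band.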

2021 ◽  
Vol 27 (3) ◽  
pp. 57-70
Author(s):  
Damjan M. Rakanovic ◽  
Vuk Vranjkovic ◽  
Rastislav J. R. Struharik

This paper proposes a two-step Convolutional Neural Network (CNN) pruning algorithm and a resource-efficient Field-Programmable Gate Array (FPGA) CNN accelerator named “Argus”. The proposed CNN pruning algorithm first combines similar kernels into clusters, which are then pruned using the same regular pruning pattern. The pruning algorithm is carefully tailored to FPGAs, considering their resource characteristics. Regular sparsity results in high Multiply-Accumulate (MAC) efficiency, reducing the amount of logic required to balance workloads among different MAC units. As a result, the Argus accelerator requires about 170 Look-Up Tables (LUTs) per Digital Signal Processor (DSP) block. This number is close to the average LUT/DSP ratio for various FPGA families, enabling balanced resource utilization when implementing Argus. Benchmarks conducted on a Xilinx Zynq UltraScale+ Multi-Processor System-on-Chip (MPSoC) indicate that Argus achieves up to 25 times higher frames per second than NullHop, 2 and 2.5 times higher than NEURAghe and Snowflake, respectively, and 2 times higher than NVDLA. Argus shows performance comparable to MIT’s Eyeriss v2 and Caffeine, while requiring up to 3 times less memory bandwidth and utilizing 4 times fewer DSP blocks, respectively. Besides the absolute performance, Argus has at least 1.3 and 2 times better GOP/s/DSP and GOP/s/Block-RAM (BRAM) ratios, while being competitive in terms of GOP/s/LUT, compared to some of the state-of-the-art solutions.
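
The idea of pruning every kernel in a cluster with the same regular pattern can be sketched as follows. This is a hypothetical illustration only: the abstract does not say how kernels are clustered or how the pattern is chosen, so the sketch assumes a cluster is already given and picks the shared mask by summed weight magnitude. The names `shared_mask` and `apply_mask` are invented for this example.

```python
def shared_mask(kernels, keep):
    """Choose one keep/prune mask for a whole cluster of flattened kernels.

    Assumption: positions are ranked by the summed absolute weight across
    the cluster, and the `keep` highest-ranked positions survive.
    """
    n = len(kernels[0])
    importance = [sum(abs(k[i]) for k in kernels) for i in range(n)]
    top = sorted(range(n), key=lambda i: importance[i], reverse=True)[:keep]
    kept = set(top)
    return [1 if i in kept else 0 for i in range(n)]


def apply_mask(kernels, mask):
    """Zero the same positions in every kernel of the cluster."""
    return [[w * m for w, m in zip(k, mask)] for k in kernels]


# Usage: two similar (flattened) kernels share a single pruning pattern.
cluster = [[0.9, 0.1, -0.8, 0.0],
           [1.0, -0.2, 0.7, 0.1]]
mask = shared_mask(cluster, keep=2)
pruned = apply_mask(cluster, mask)
```

Because all kernels in the cluster have non-zero weights at the same positions, each MAC unit that processes them performs the same number of multiplications, which is the workload-balancing property the abstract attributes to regular sparsity.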


2020 ◽  
Vol 34 (04) ◽  
pp. 4780-4787
Author(s):  
Yuhang Li ◽  
Xin Dong ◽  
Sai Qian Zhang ◽  
Haoli Bai ◽  
Yuanpeng Chen ◽  
...  

To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which offer tremendous speed-up and memory savings through quantized activations and weights. We first raise three overlooked issues in extremely low-bit networks: the squashed range of quantized values, vanishing gradients during backpropagation, and the unexploited hardware acceleration of ternary networks. By reparameterizing the quantized activation and weight vectors with a full-precision scale and offset applied to a fixed ternary vector, we decouple the range and magnitude from the direction to alleviate the above problems. A learnable scale and offset can automatically adjust the range of quantized values and the sparsity without vanishing gradients. A novel encoding and computation pattern is designed to support efficient computing for our reparameterized ternary network (RTN). Experiments on ResNet-18 for ImageNet demonstrate that the proposed RTN finds a much better trade-off between bitwidth and accuracy and achieves up to 26.76% relative accuracy improvement compared with state-of-the-art methods. Moreover, we validate the proposed computation pattern on Field Programmable Gate Arrays (FPGA), where it brings 46.46× and 89.17× savings in power and area, respectively, compared with full-precision convolution.
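
The reparameterization described above (a fixed ternary vector combined with a full-precision scale and offset) can be illustrated with a small sketch. The threshold-based `ternarize` rule and all function names are assumptions for this example; the abstract does not give the paper's quantizer, encoding, or hardware computation pattern. The last function relies only on an algebraic identity: a dot product with `scale * t + offset` splits into a multiplication-free signed sum plus an offset term.

```python
def ternarize(values, threshold):
    """Map each value to -1, 0, or +1 (assumed threshold rule)."""
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in values]


def reparameterize(ternary, scale, offset):
    """Quantized value = scale * t + offset, with t in {-1, 0, +1}."""
    return [scale * t + offset for t in ternary]


def dot_reparameterized(inputs, ternary, scale, offset):
    """Dot product of inputs with the reparameterized weights.

    sum(x * (scale * t + offset)) == scale * sum(x * t) + offset * sum(x),
    and sum(x * t) needs only additions and subtractions.
    """
    signed_sum = 0.0
    for x, t in zip(inputs, ternary):
        if t == 1:
            signed_sum += x
        elif t == -1:
            signed_sum -= x
    return scale * signed_sum + offset * sum(inputs)


# Usage with made-up numbers.
weights = [0.9, -0.2, 0.05, -1.3]
t = ternarize(weights, threshold=0.5)
w_q = reparameterize(t, scale=0.5, offset=0.1)
y = dot_reparameterized([1.0, 2.0, 3.0, 4.0], t, scale=0.5, offset=0.1)
```

In a trained network the scale and offset would be learned parameters; here they are fixed constants to keep the example self-contained.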


2012 ◽  
pp. 658-676
Author(s):  
Fabio Garzia ◽  
Roberto Airoldi ◽  
Jari Nurmi

This paper describes two general-purpose architectures targeted to Field Programmable Gate Array (FPGA) implementation. The first architecture is based on the coupling of a coarse-grain reconfigurable array with a general-purpose processor core. The second architecture is a homogeneous multi-processor system-on-chip (MP-SoC). Both architectures have been mapped onto two different Altera FPGA devices, a Stratix II and a Stratix IV. Although mapping onto the Stratix IV results in higher operating frequencies, the capabilities of the device are not fully exploited. The implementation of an FFT on the two platforms shows a considerable speed-up in comparison with a single-processor reference architecture. The speed-up is higher in the reconfigurable solution, but the MP-SoC provides an easier programming interface that is based entirely on the C language. The authors’ approach shows that implementing a programmable architecture on an FPGA and then programming it in a high-level software language is a viable alternative to designing a dedicated hardware block in a hardware description language (HDL) and mapping it onto an FPGA.


Author(s):  
Chen Yang ◽  
Jingyu Zhang ◽  
Qi Chen ◽  
Yi Xu ◽  
Cimang Lu

Pedestrian recognition has achieved state-of-the-art performance thanks to recent progress in convolutional neural networks (CNNs). However, mainstream CNN models are too complicated for hardware implementation on emerging Computing-In-Memory (CIM) architectures, because their enormous number of parameters and massive intermediate processing results may incur a severe “memory bottleneck”. This paper proposes a design methodology, Parameter Substitution with Nodes Compensation (PSNC), that significantly reduces the parameters of a CNN model without degrading inference accuracy. Based on the PSNC methodology, an ultra-lightweight convolutional neural network (UL-CNN) was designed. The UL-CNN model is specially optimized for a flash-based CIM architecture (Conv-Flash) and is applied to person recognition. Running UL-CNN on Conv-Flash yields an inference accuracy of up to 94.7%. Compared to LeNet-5, at a similar number of operations and similar accuracy, UL-CNN has less than 37% of LeNet-5’s parameters on the same dataset benchmark. Such a parameter reduction can dramatically speed up the training process, reduce on-chip storage overhead, and save the power consumed by memory accesses. With the aid of UL-CNN, the Conv-Flash architecture can provide the best energy efficiency compared to other platforms (CPU, GPU, FPGA, etc.), consuming only 2.2[Formula: see text] 105J to complete pedestrian recognition for one frame.


Author(s):  
Ш.С. Фахми ◽  
Н.В. Шаталова ◽  
В.В. Вислогузов ◽  
Е.В. Костикова

In this paper, we propose the mathematical apparatus and architecture of a multiprocessor transport system on a chip (MPTSoC). A hardware/software implementation of an intelligent video surveillance system is presented, based on system-on-chip technology and using a hardware accelerator for the well-known method of forming support vectors. The architecture includes complex functional blocks for video analysis, based on parallel algorithms for finding image reference points, and a set of elementary processors that perform the computationally complex procedures of the analysis algorithms. It was developed using design tools based on a reconfigurable system on chip, which make it possible to estimate the amount of hardware resources required. The proposed MPTSoC architecture speeds up the processing and analysis of video information in tasks of detecting and recognizing emergencies and suspicious behavior.


Author(s):  
Kevin Bellofatto ◽  
Beat Moeckli ◽  
Charles-Henri Wassmer ◽  
Margaux Laurent ◽  
Graziano Oldani ◽  
...  

Purpose of Review: β cell replacement via whole pancreas or islet transplantation has greatly evolved as a cure for type 1 diabetes. Both strategies are, however, still affected by several limitations. Pancreas bioengineering holds the potential to overcome these hurdles by aiming to repair and regenerate the β cell compartment. In this review, we detail the state of the art and recent progress in the bioengineering field applied to diabetes research. Recent Findings: The primary target of pancreatic bioengineering is to manufacture a construct supporting insulin activity in vivo. Scaffold-based techniques, 3D bioprinting, macro-devices, insulin-secreting organoids, and pancreas-on-chip represent the most promising technologies for pancreatic bioengineering. Summary: Several factors affect the clinical application of these technologies, and the studies reported so far are encouraging but need to be optimized. Nevertheless, pancreas bioengineering is evolving very quickly, and its combination with developments in stem cell research can only accelerate this trend.


2021 ◽  
Vol 297 ◽  
pp. 126645
Author(s):  
Gajanan Sampatrao Ghodake ◽  
Surendra Krushna Shinde ◽  
Avinash Ashok Kadam ◽  
Rijuta Ganesh Saratale ◽  
Ganesh Dattatraya Saratale ◽  
...  
