Exploiting crosstalk to speed up on-chip buses

Multiprocessor transport surveillance video system based on "system on chip" technology

MORSKIE INTELLEKTUAL`NYE TEHNOLOGII ◽

10.37220/mit.2020.49.3.023 ◽

2020 ◽

Author(s):

Ш.С. Фахми ◽

Н.В. Шаталова ◽

В.В. Вислогузов ◽

Е.В. Костикова

Keyword(s):

System On Chip ◽

Reference Points ◽

Surveillance Video ◽

Video Information ◽

Chip Technology ◽

Intelligent Video Surveillance System ◽

Speed Up ◽

Algorithmic Analysis ◽

Computational Procedures ◽

On Chip

This paper proposes a mathematical apparatus and an architecture for a multiprocessor transport system on a chip (MPTSoC). An intelligent video surveillance system is implemented in software and hardware using "system on chip" technology and a hardware accelerator for the well-known method of forming reference vectors. The architecture includes complex functional blocks for analyzing video information, based on parallel algorithms for finding image reference points, and a set of elementary processors that perform the complex computational procedures of the analysis algorithms. It is built with design tools based on a reconfigurable system on chip, which make it possible to estimate the amount of hardware resources required. The proposed MPTSoC architecture speeds up the processing and analysis of video information when detecting and recognizing emergencies and suspicious behavior.

How to Speed-Up Fault-Tolerant Clock Generation in VLSI Systems-on-Chip via Pipelining

2010 European Dependable Computing Conference ◽

10.1109/edcc.2010.35 ◽

2010 ◽

Cited By ~ 5

Author(s):

Matthias Függer ◽

Andreas Dielacher ◽

Ulrich Schmid

Keyword(s):

Fault Tolerant ◽

Clock Generation ◽

Systems On Chip ◽

Speed Up ◽

On Chip

Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3466823 ◽

2022 ◽

Vol 15 (2) ◽

pp. 1-33

Author(s):

Mikhail Asiatici ◽

Paolo Ienne

Keyword(s):

Large Scale ◽

Sparse Matrix ◽

Memory Systems ◽

Graph Analytics ◽

Matrix Vector Multiplication ◽

Area Reduction ◽

Cache Line ◽

Speed Up ◽

Memory Accesses ◽

On Chip

Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs due to their short, irregular memory accesses, which result in low cache hit rates. Nonblocking caches reduce the bandwidth required by misses by requesting each cache line only once, even when there are multiple misses corresponding to it. However, such a reuse mechanism is traditionally implemented using an associative lookup. This limits the number of misses that are considered for reuse to a few tens, at most. In this article, we present an efficient pipeline that can process and store thousands of outstanding misses in cuckoo hash tables in on-chip SRAM with minimal stalls. This brings the same bandwidth advantage as a larger cache for a fraction of the area budget, because outstanding misses do not need a data array, which can significantly speed up irregular memory-bound latency-insensitive applications. In addition, we extend nonblocking caches to generate variable-length bursts to memory, which increases the bandwidth delivered by DRAMs and their controllers. The resulting miss-optimized memory system provides up to 25% speedup with 24× area reduction on 15 large sparse matrix-vector multiplication benchmarks evaluated on an embedded and a datacenter FPGA system.
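
The miss-coalescing idea in this abstract (request each cache line once and let later misses to the same line wait for it) can be illustrated with a minimal software sketch. This is not the authors' pipeline: a Python dictionary stands in for the cuckoo hash tables in on-chip SRAM, and the 64-byte line size, the function name `coalesce_misses`, and the sample addresses are assumptions chosen for illustration only.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache-line size; not stated in the abstract


def coalesce_misses(addresses, line_bytes=LINE_BYTES):
    """Group missing addresses by cache line so each line is requested once.

    Returns (memory_requests, waiters): the ordered list of line addresses
    sent to memory, and a map from line address to the misses waiting on it.
    """
    waiters = defaultdict(list)   # outstanding misses, keyed by line address
    memory_requests = []          # lines actually requested from memory
    for addr in addresses:
        line = addr - (addr % line_bytes)
        if line not in waiters:
            memory_requests.append(line)  # first miss on this line goes to memory
        waiters[line].append(addr)        # later misses only wait for the line
    return memory_requests, dict(waiters)


# Hypothetical irregular access stream (byte addresses).
misses = [0, 8, 64, 72, 16, 128, 70]
requests, pending = coalesce_misses(misses)
print(len(misses), "misses ->", len(requests), "memory requests")
```

In this toy stream, seven misses collapse into three line requests. The more outstanding misses the structure can track, the more such reuse it can find, which is the motivation the abstract gives for storing thousands of misses rather than a few tens.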

A Script-Based Cycle-True Verification Framework to Speed-Up Hardware and Software Co-Design of System-on-Chip exploiting RISC-V Architecture

2021 16th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS) ◽

10.1109/dtis53253.2021.9505139 ◽

2021 ◽

Author(s):

Luca Zulberti ◽

Pietro Nannipieri ◽

Luca Fanucci

Keyword(s):

System On Chip ◽

Speed Up ◽

On Chip

Implementation of FFT on General-Purpose Architectures for FPGA

Computer Engineering ◽

10.4018/978-1-61350-456-7.ch310 ◽

2012 ◽

pp. 658-676

Author(s):

Fabio Garzia ◽

Roberto Airoldi ◽

Jari Nurmi

Keyword(s):

General Purpose ◽

Reference Architecture ◽

Processor Core ◽

General Purpose Processor ◽

Programmable Architecture ◽

Reconfigurable Array ◽

Field Programmable ◽

Speed Up ◽

On Chip ◽

High Level

This paper describes two general-purpose architectures targeted to Field Programmable Gate Array (FPGA) implementation. The first architecture is based on the coupling of a coarse-grain reconfigurable array with a general-purpose processor core. The second architecture is a homogeneous multi-processor system-on-chip (MP-SoC). Both architectures have been mapped onto two different Altera FPGA devices, a StratixII and a StratixIV. Although mapping onto the StratixIV results in higher operating frequencies, the capabilities of the device are not fully exploited. The implementation of an FFT on the two platforms shows a considerable speed-up in comparison with a single-processor reference architecture. The speed-up is higher in the reconfigurable solution, but the MP-SoC provides an easier programming interface that is completely based on the C language. The authors’ approach proves that implementing a programmable architecture on FPGA and then programming it using a high-level software language is a viable alternative to designing a dedicated hardware block with a hardware description language (HDL) and mapping it on FPGA.

FPGA-Based Processor Acceleration for Image Processing Applications

Journal of Imaging ◽

10.3390/jimaging5010016 ◽

2019 ◽

Vol 5 (1) ◽

pp. 16 ◽

Cited By ~ 5

Author(s):

Fahad Siddiqui ◽

Sam Amiri ◽

Umar Minhas ◽

Tiantai Deng ◽

Roger Woods ◽

...

Keyword(s):

Image Processing ◽

Software Systems ◽

Traffic Sign Recognition ◽

Traffic Sign ◽

Power Efficient ◽

Sign Recognition ◽

Filter Operation ◽

Speed Up ◽

On Chip ◽

Embedded Image

FPGA-based embedded image processing systems offer considerable computing resources but present programming challenges when compared to software systems. The paper describes an approach based on an FPGA-based soft processor called Image Processing Processor (IPPro), which can operate at up to 337 MHz on a high-end Xilinx FPGA family, and gives details of the dataflow-based programming environment. The approach is demonstrated for a k-means clustering operation and a traffic sign recognition application, both of which have been prototyped on an Avnet Zedboard that has a Xilinx Zynq-7000 system-on-chip (SoC). A number of parallel dataflow mapping options were explored, giving a speed-up of 8 times for the k-means clustering using 16 IPPro cores, and a speed-up of 9.6 times for the morphology filter operation of the traffic sign recognition using 16 IPPro cores, compared to their equivalent ARM-based software implementations. We show that for k-means clustering, the 16 IPPro cores implementation is 57, 28 and 1.7 times more power efficient (fps/W) than the ARM Cortex-A7 CPU, NVIDIA GeForce GTX980 GPU and ARM Mali-T628 embedded GPU, respectively.

An Efficient FPGA Implementation of Richardson-Lucy Deconvolution Algorithm for Hyperspectral Images

Electronics ◽

10.3390/electronics10040504 ◽

2021 ◽

Vol 10 (4) ◽

pp. 504

Author(s):

Karine Avagian ◽

Milica Orlandić

Keyword(s):

State Of The Art ◽

Hyperspectral Images ◽

Image Size ◽

Spectral Bands ◽

Deconvolution Algorithm ◽

Spread Function ◽

Field Programmable ◽

Speed Up ◽

On Chip ◽

The Individual

This paper proposes an implementation of a Richardson-Lucy (RL) deconvolution method to reduce the spatial degradation in hyperspectral images during the image acquisition process. The degradation, modeled by convolution with a point spread function (PSF), is reduced by applying both standard and accelerated RL deconvolution algorithms on the individual images in spectral bands. Boundary conditions are introduced to maintain a constant image size without distorting the estimated image boundaries. The RL deconvolution algorithm is implemented on a field-programmable gate array (FPGA)-based Xilinx Zynq-7020 System-on-Chip (SoC). The proposed architecture is parameterized with respect to the image size and configurable with respect to the algorithm variant, the number of iterations, and the kernel size by setting the dedicated configuration registers. Speed-ups by factors of 61 and 21 are reported compared to software-only and FPGA-based state-of-the-art implementations, respectively.
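
For readers unfamiliar with the method, the standard Richardson-Lucy iteration multiplies the current estimate by a correction term: the observed data divided by the re-blurred estimate, convolved with the flipped PSF. The sketch below is a one-dimensional, pure-Python illustration under stated assumptions. It uses plain zero padding rather than the boundary conditions described in the paper, covers only the standard (not the accelerated) variant, and the helper names, test signal, and PSF values are hypothetical.

```python
def convolve_same(signal, kernel):
    """'Same'-size 1D convolution with zero padding (odd-length kernel)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + half - k          # index of the signal sample weighted by kernel[k]
            if 0 <= j < len(signal):
                acc += signal[j] * w
        out.append(acc)
    return out


def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Standard Richardson-Lucy update: u <- u * (((d / (u * p)) * p_flipped))."""
    psf_flipped = psf[::-1]
    # Flat, positive starting estimate with the same mean as the data.
    estimate = [sum(observed) / len(observed)] * len(observed)
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [d / (b + eps) for d, b in zip(observed, blurred)]  # eps avoids 0/0
        correction = convolve_same(ratio, psf_flipped)
        estimate = [u * c for u, c in zip(estimate, correction)]
    return estimate


# Hypothetical example: a single bright sample blurred by a small PSF.
truth = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
psf = [0.25, 0.5, 0.25]
observed = convolve_same(truth, psf)
restored = richardson_lucy(observed, psf)
print([round(v, 3) for v in restored])
```

Each iteration is dominated by two convolutions with a fixed kernel, a regular and repetitive workload of the kind that is typically mapped onto FPGA logic.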

Automated processing of NGS data from raw sequencing files to ready-to-use information tables for genome modeling

Genomics and Computational Biology ◽

10.18547/gcb.2018.vol4.iss2.e100042 ◽

2018 ◽

Vol 4 (2) ◽

pp. 100042

Author(s):

Robert Deelen ◽

Martin Wieland ◽

Susanne Gerber ◽

David Fournier

Keyword(s):

Regulation Of Gene Expression ◽

Sequencing Data ◽

Processing Power ◽

Automated Processing ◽

Speed Up ◽

Data Files ◽

On Chip ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data

Epigenetic features such as histone and DNA modifications are important mechanisms for the regulation of gene expression and for cell and tissue development. As a result, extensive efforts are currently undertaken using next-generation sequencing (NGS) to generate vast amounts of data regarding the epigenetic regulation of genomes. Several tools and frameworks for the processing of these NGS data have been developed in the last decade. Nevertheless, each user still bears the challenge of integrating all these tasks to perform the analysis. This procedure is not only tedious but also resource-intensive due to the putatively large processing power involved. To automate, standardize and speed up the handling of NGS data, with a focus on ChIP-seq data, we present a user-friendly pipeline that automatically processes a list of sequencing data files and returns a ready-to-use purified table for subsequent modelling or analysis attempts.

Implementation of FFT on General-Purpose Architectures for FPGA

Innovations in Embedded and Real-Time Systems Engineering for Communication ◽

10.4018/978-1-4666-0912-9.ch009 ◽

2012 ◽

pp. 156-175

Author(s):

Fabio Garzia ◽

Roberto Airoldi ◽

Jari Nurmi

Keyword(s):

General Purpose ◽

Reference Architecture ◽

Processor Core ◽

General Purpose Processor ◽

Programmable Architecture ◽

Reconfigurable Array ◽

Field Programmable ◽

Speed Up ◽

On Chip ◽

High Level

This paper describes two general-purpose architectures targeted to Field Programmable Gate Array (FPGA) implementation. The first architecture is based on the coupling of a coarse-grain reconfigurable array with a general-purpose processor core. The second architecture is a homogeneous multi-processor system-on-chip (MP-SoC). Both architectures have been mapped onto two different Altera FPGA devices, a StratixII and a StratixIV. Although mapping onto the StratixIV results in higher operating frequencies, the capabilities of the device are not fully exploited. The implementation of an FFT on the two platforms shows a considerable speed-up in comparison with a single-processor reference architecture. The speed-up is higher in the reconfigurable solution, but the MP-SoC provides an easier programming interface that is completely based on the C language. The authors’ approach proves that implementing a programmable architecture on FPGA and then programming it using a high-level software language is a viable alternative to designing a dedicated hardware block with a hardware description language (HDL) and mapping it on FPGA.

Hardware/Software Co-Design of Fractal Features Based Fall Detection System

Sensors ◽

10.3390/s20082322 ◽

2020 ◽

Vol 20 (8) ◽

pp. 2322

Author(s):

Ahsen Tahir ◽

Gordon Morison ◽

Dawn A. Skelton ◽

Ryan M. Gibson

Keyword(s):

Embedded System ◽

Detection System ◽

Fall Detection ◽

Detection Accuracy ◽

Linear Discriminant ◽

Detection Systems ◽

Performance Per Watt ◽

Speed Up ◽

On Chip

Falls are a leading cause of death in older adults and result in high levels of mortality, morbidity and immobility. Fall Detection Systems (FDS) are imperative for timely medical aid and have been known to reduce death rate by 80%. We propose a novel wearable sensor FDS which exploits fractal dynamics of fall accelerometer signals. Fractal dynamics can be used as an irregularity measure of signals and our work shows that it is a key discriminant for classification of falls from other activities of life. We design, implement and evaluate a hardware feature accelerator for computation of fractal features through multi-level wavelet transform on a reconfigurable embedded System on Chip, Zynq device for evaluating wearable accelerometer sensors. The proposed FDS utilises a hardware/software co-design approach with a hardware accelerator for fractal features and a software implementation of Linear Discriminant Analysis on an embedded ARM core for high accuracy and energy efficiency. The proposed system achieves 99.38% fall detection accuracy, 7.3× speed-up and 6.53× improvement in power consumption, compared to the software-only execution, with an overall performance per Watt advantage of 47.6×, while consuming low reconfigurable resources at 28.67%.
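
The three figures quoted at the end of this abstract appear to be related by simple arithmetic. Assuming that performance is the inverse of execution time and that the 6.53× figure is the ratio of baseline power to accelerated power (an interpretation, not something the abstract states), the performance-per-watt gain is the product of the two factors. The variable names below are illustrative.

```python
speedup = 7.3       # execution-time speed-up, from the abstract
power_gain = 6.53   # power-consumption improvement, from the abstract

# perf/W = (1 / time) / power, so the gains multiply under the stated assumption.
perf_per_watt_gain = speedup * power_gain
print(f"{perf_per_watt_gain:.2f}x")  # prints 47.67x
```

The product, about 47.67, is consistent with the reported 47.6× advantage if that figure was truncated rather than rounded.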
