An Automatic Design Flow for Data Parallel and Pipelined Signal Processing Applications on Embedded Multiprocessor with NoC: Application to Cryptography

International Journal of Reconfigurable Computing ◽

10.1155/2009/631490 ◽

2009 ◽

Vol 2009 ◽

pp. 1-14 ◽

Cited By ~ 5

Author(s):

Xinyu Li ◽

Omar Hammami

Keyword(s):

Signal Processing ◽

Embedded System ◽

High Performance ◽

Chip Multiprocessors ◽

Parallel Implementation ◽

Data Encryption ◽

Design Flow ◽

Automatic Design ◽

Single Chip ◽

Data Parallel

Embedded system design is increasingly based on single-chip multiprocessors because of high performance and flexibility requirements. Embedded multiprocessors on FPGA provide additional flexibility by allowing customization through the addition of hardware accelerators when a parallel software implementation does not deliver the expected performance, while the overall multiprocessor architecture is kept for additional applications. This provides a transition to software-only parallel implementation while avoiding pure hardware implementation. An automatic design flow is proposed that is well suited to data-flow signal processing exhibiting both pipelined and data-parallel modes of execution. Fork-join model-based software parallelization is explored to find the best parallelization configuration. A coprocessor generated by C-based synthesis is added to improve performance at the cost of more hardware resources. The Triple Data Encryption Standard (TDES) cryptographic algorithm on a 48-PE single-chip distributed-memory multiprocessor is selected as an application example of the flow.
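
The fork-join, data-parallel pattern named in this abstract can be sketched roughly as follows. This is an illustration only, not the authors' design flow: `toy_stage` is a hypothetical XOR stand-in for a DES pass (it is not real DES), the blocks are assumed to be independent of one another, and Python threads stand in for the processing elements.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8  # TDES works on 64-bit (8-byte) blocks


def toy_stage(block: bytes, key: int) -> bytes:
    """Hypothetical stand-in for one DES pass: XOR each byte with a key byte."""
    return bytes(b ^ key for b in block)


def toy_tdes_block(block: bytes, keys: tuple) -> bytes:
    """Chain three passes, mirroring the three-stage structure of TDES."""
    k1, k2, k3 = keys
    return toy_stage(toy_stage(toy_stage(block, k1), k2), k3)


def split_blocks(data: bytes) -> list:
    """Cut the input into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def fork_join_encrypt(data: bytes, keys: tuple, workers: int) -> bytes:
    """Fork: hand contiguous groups of blocks to workers. Join: concatenate in order."""
    blocks = split_blocks(data)
    if not blocks:
        return b""
    per_worker = -(-len(blocks) // workers)  # ceiling division
    groups = [blocks[i:i + per_worker] for i in range(0, len(blocks), per_worker)]

    def work(group):
        return [toy_tdes_block(b, keys) for b in group]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, groups))  # map returns results in input order
    return b"".join(b for part in parts for b in part)
```

Varying `workers` here is a small-scale analogue of exploring parallelization configurations; in this sketch the three passes run back to back inside one worker rather than as a pipeline spread across processing elements.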

Discrete- vs. Continuous-Time Nonlinear Signal Processing: Attractors, Transitions and Parallel Implementation Issues

1993 American Control Conference ◽

10.23919/acc.1993.4793116 ◽

1993 ◽

Cited By ~ 2

Author(s):

R. Rico-Martínez ◽

I. G. Kevrekidis ◽

M. C. Kube ◽

J. L. Hudson

Keyword(s):

Signal Processing ◽

Continuous Time ◽

Parallel Implementation ◽

Nonlinear Signal Processing ◽

Nonlinear Signal

Matlab and Parallel Computing

Image Processing & Communications ◽

10.2478/v10248-012-0048-5 ◽

2012 ◽

Vol 17 (4) ◽

pp. 207-216 ◽

Cited By ~ 5

Author(s):

Magdalena Szymczyk ◽

Piotr Szymczyk

Keyword(s):

Image Processing ◽

Signal Processing ◽

Parallel Computing ◽

Distributed Computing ◽

Control Systems ◽

High Performance ◽

Parallel Applications ◽

Process Simulations ◽

Key Features ◽

Financial Process

MATLAB is a technical computing language used in a variety of fields, such as control systems, image and signal processing, visualization, and financial process simulations, in an easy-to-use environment. MATLAB offers "toolboxes", which are specialized libraries for a variety of scientific domains, and a simplified interface to high-performance libraries (LAPACK, BLAS, and FFTW). MATLAB is now enriched by the possibility of parallel computing with the Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™. In this article we present some of the key features of MATLAB parallel applications, focusing on the use of GPU processors for image processing.

Structure, Models and Algorithms for Signal Processing of a High-Performance Multichannel Measuring System

Физические основы приборостроения ◽

10.25210/jfop-1702-076079 ◽

2017 ◽

Vol 6 (2) ◽

pp. 76-79

Author(s):

A.A. Baryshnikov ◽

V.I. Kuzmin ◽

D.L. Tytik ◽

...

Keyword(s):

Signal Processing ◽

High Performance ◽

Measuring System ◽

Multichannel Measuring

DSPSR: Digital Signal Processing Software for Pulsar Astronomy

Publications of the Astronomical Society of Australia ◽

10.1071/as10021 ◽

2011 ◽

Vol 28 (1) ◽

pp. 1-14 ◽

Cited By ~ 172

Author(s):

W. van Straten ◽

M. Bailes

Keyword(s):

Signal Processing ◽

Digital Signal Processing ◽

Graphics Processing Units ◽

High Performance ◽

Digital Signal ◽

General Purpose ◽

Design Decisions ◽

Extensive Range ◽

Processing Software ◽

Graphics Processing

dspsr is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After over a decade of research and development, dspsr is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures.

High-performance, high-capacity single-chip microcomputers

Proceedings of the June 7-10, 1982, National Computer Conference (AFIPS '82) ◽

10.1145/1500774.1500783 ◽

1982 ◽

Author(s):

ED Peatrowsky

Keyword(s):

High Performance ◽

High Capacity ◽

Single Chip

A secure data parallel processing based embedded system for internet of things computer vision using field programmable gate array devices

International Journal of Circuit Theory and Applications ◽

10.1002/cta.2964 ◽

2021 ◽

Author(s):

Kashif Naseer Qureshi ◽

Sundus Qayyum ◽

Muhammad Najam Ul Islam ◽

Gwanggil Jeon

Keyword(s):

Computer Vision ◽

Parallel Processing ◽

Internet Of Things ◽

Embedded System ◽

Field Programmable Gate Array ◽

Data Parallel ◽

Secure Data ◽

Field Programmable ◽

Gate Array

High-Level Parallel Ant Colony Optimization with Algorithmic Skeletons

International Journal of Parallel Programming ◽

10.1007/s10766-021-00714-1 ◽

2021 ◽

Author(s):

Breno A. de Melo Menezes ◽

Nina Herrmann ◽

Herbert Kuchen ◽

Fernando Buarque de Lima Neto

Keyword(s):

Ant Colony Optimization ◽

High Performance ◽

Optimization Problems ◽

Programming Model ◽

Parallel Implementation ◽

Ant Colony ◽

Algorithmic Skeletons ◽

Low Level ◽

Programming Patterns ◽

High Level

Parallel implementations of swarm intelligence algorithms such as ant colony optimization (ACO) have been widely used to shorten execution time when solving complex optimization problems. When targeting a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of Algorithmic Skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g. map, fold and zip) that are later converted to efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain-specific language Musket can cope with the development of a parallel implementation of ACO and how that compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with the parallelization aspects, Musket generates high-performance code with execution times similar to those of low-level implementations.
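
The map, fold and zip patterns mentioned in this abstract can be illustrated with a small sequential sketch. This is plain Python, not Musket syntax: the helper names `skel_map`, `skel_zip` and `skel_fold` and the pheromone-update example are assumptions made for illustration, and a skeleton framework would generate parallel code for the same patterns.

```python
from functools import reduce


def skel_map(f, xs):
    """map skeleton: apply f to every element independently."""
    return [f(x) for x in xs]


def skel_zip(f, xs, ys):
    """zip skeleton: combine two collections element by element."""
    return [f(x, y) for x, y in zip(xs, ys)]


def skel_fold(f, init, xs):
    """fold skeleton: reduce a collection to one value with a binary operator."""
    return reduce(f, xs, init)


def update_pheromone(pheromone, deposits, rho):
    """Hypothetical ACO-style update: evaporate (map), then add deposits (zip)."""
    evaporated = skel_map(lambda p: (1.0 - rho) * p, pheromone)
    return skel_zip(lambda p, d: p + d, evaporated, deposits)


def total_pheromone(pheromone):
    """Sum all pheromone values (fold)."""
    return skel_fold(lambda acc, p: acc + p, 0.0, pheromone)
```

Because each skeleton fixes the shape of the computation and leaves only the element-wise function to the programmer, the same program text can be mapped onto threads or GPU kernels without changes to the algorithm.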

A Modified KNN Algorithm for High-Performance Computing on FPGA of Real-Time m-QAM Demodulators

Electronics ◽

10.3390/electronics10050627 ◽

2021 ◽

Vol 10 (5) ◽

pp. 627

Author(s):

David Marquez-Viloria ◽

Luis Castano-Londono ◽

Neil Guerrero-Gonzalez

Keyword(s):

Real Time ◽

High Performance ◽

Interference Mitigation ◽

Parallel Implementation ◽

Computational Time ◽

Successful Implementation ◽

Interchannel Interference ◽

The Difference ◽

High Level ◽

Performance Computing

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on m-QAM demodulators using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of the 8-connected clusters used for image processing. Real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with the results obtained in previous work using the same data from the same experimental setup but with offline DSP in MATLAB. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN shows a reduction in operations of between 43 and 75 percent, depending on the symbol's position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the KNN algorithm.
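
One possible reading of the 8-connected idea in this abstract is sketched below. It is an assumption, not the authors' FPGA design: the 16-QAM levels, the bucketing of training symbols by nearest constellation cell, and all function names are hypothetical. The received symbol is compared only against training symbols in its own cell and the up to eight adjacent cells, rather than against the whole training set.

```python
from collections import Counter

LEVELS = (-3, -1, 1, 3)  # assumed 16-QAM amplitude levels on each axis


def cell_of(z: complex) -> tuple:
    """Index of the nearest constellation point as (in-phase, quadrature)."""
    def idx(v):
        return min(range(4), key=lambda i: abs(v - LEVELS[i]))
    return idx(z.real), idx(z.imag)


def neighbourhood(cell: tuple) -> set:
    """The cell itself plus its 8-connected neighbours, clipped to the 4x4 grid."""
    ci, cq = cell
    return {(i, q)
            for i in range(ci - 1, ci + 2)
            for q in range(cq - 1, cq + 2)
            if 0 <= i < 4 and 0 <= q < 4}


def bucket_training(training):
    """Group labelled training symbols by constellation cell, done once up front."""
    buckets = {}
    for symbol, label in training:
        buckets.setdefault(cell_of(symbol), []).append((symbol, label))
    return buckets


def knn_restricted(z: complex, buckets: dict, k: int = 3):
    """Majority vote among the k nearest training symbols in neighbouring cells."""
    candidates = [(abs(z - symbol), label)
                  for cell in neighbourhood(cell_of(z))
                  for symbol, label in buckets.get(cell, [])]
    candidates.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in candidates[:k])
    return votes.most_common(1)[0][0] if votes else None
```

Corner cells have only three neighbours and edge cells five, while interior cells have eight, so the number of comparisons in this sketch depends on where the symbol falls in the constellation, which is at least consistent with the position-dependent savings the abstract reports.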

High-Performance Parallel Implementation of Genetic Algorithm on FPGA

Circuits Systems and Signal Processing ◽

10.1007/s00034-019-01037-w ◽

2019 ◽

Vol 38 (9) ◽

pp. 4014-4039 ◽

Cited By ~ 7

Author(s):

Matheus F. Torquato ◽

Marcelo A. C. Fernandes

Keyword(s):

Genetic Algorithm ◽

High Performance ◽

Parallel Implementation

Advanced and Simplified Signal Processing System for VTR and Its High Performance LSI'S

IEEE Transactions on Consumer Electronics ◽

10.1109/tce.1978.267050 ◽

1978 ◽

Vol CE-24 (3) ◽

pp. 458-467 ◽

Cited By ~ 4

Author(s):

Akira Shibata ◽

Toshi Itoh ◽

Isao Nakagawa

Keyword(s):

Signal Processing ◽

High Performance ◽

Processing System ◽

Signal Processing System
