A data-parallel implementation of O(N) hierarchical N-body methods

A data parallel implementation of the TRFD program from the Perfect benchmarks

Massively Parallel Processing Applications and Development ◽

10.1016/b978-0-444-81784-6.50046-4 ◽

1994 ◽

pp. 355-362 ◽

Cited By ~ 2

Author(s):

David J. Lilja ◽

Jonathan Schmitt

Keyword(s):

Parallel Implementation ◽

Data Parallel


An Evaluation of Multiple Feed-Forward Networks on GPUs

International Journal of Neural Systems ◽

10.1142/s0129065711002638 ◽

2011 ◽

Vol 21 (01) ◽

pp. 31-47 ◽

Cited By ~ 14

Author(s):

Noel Lopes ◽

Bernardete Ribeiro

Keyword(s):

Graphics Processing Unit ◽

Parallel Implementation ◽

Low Cost ◽

Back Propagation ◽

General Purpose ◽

Training System ◽

Graphics Hardware ◽

Processing Unit ◽

Data Parallel ◽

Graphics Processing

The Graphics Processing Unit (GPU), originally designed for rendering graphics and difficult to program for other tasks, has since evolved into a device suitable for general-purpose computation. As a result, graphics hardware has become progressively more attractive, yielding unprecedented performance at a relatively low cost. It is thus an ideal candidate for accelerating a wide variety of data parallel tasks in many fields, including Machine Learning (ML). As problems become more demanding, parallel implementations of learning algorithms are crucial for practical application. In particular, implementing Neural Networks (NNs) on GPUs can significantly reduce the long training times of the learning process. In this paper we present a GPU parallel implementation of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) algorithms, and describe the GPU kernels needed for this task. The results obtained on well-known benchmarks show faster training times and improved performance compared with the implementation on traditional hardware, due to maximized floating-point throughput and memory bandwidth. Moreover, a preliminary GPU-based Autonomous Training System (ATS) is developed, which aims to automatically find high-quality NN-based solutions for a given problem.


A Data Parallel Implementation Scheme of Geometric Operations

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.519-520.719 ◽

2014 ◽

Vol 519-520 ◽

pp. 719-723

Author(s):

Guang Wang

Keyword(s):

Image Processing ◽

Real Time ◽

Digital Image Processing ◽

Digital Image ◽

Parallel Implementation ◽

Computation Complexity ◽

Data Parallel ◽

Implementation Scheme ◽

Time Requirements

A data parallel implementation of geometric operations is proposed, and its conclusions are proved. The computational complexity of the data parallel implementation scheme presented in this paper is shown to be O(M+N). The scheme can be used to improve the efficiency of geometric operations and can easily meet the real-time requirements of digital image processing.


A Data-Parallel Implementation of Hierarchical N-Body Methods

The International Journal of Supercomputer Applications and High Performance Computing ◽

10.1177/109434209601000101 ◽

1996 ◽

Vol 10 (1) ◽

pp. 3-40 ◽

Cited By ~ 3

Author(s):

Yu Hu ◽

S. Lennart Johnsson

Keyword(s):

Parallel Implementation ◽

Data Parallel


Data parallel implementation of extensible sparse functional arrays

Lecture Notes in Computer Science - PARLE '93 Parallel Architectures and Languages Europe ◽

10.1007/3-540-56891-3_6 ◽

1993 ◽

pp. 68-79 ◽

Cited By ~ 2

Author(s):

John T. O'Donnell

Keyword(s):

Parallel Implementation ◽

Data Parallel


Data-parallel implementation of reconfigurable digital predistortion on a mobile GPU

2015 49th Asilomar Conference on Signals, Systems and Computers ◽

10.1109/acssc.2015.7421110 ◽

2015 ◽

Cited By ~ 2

Author(s):

Amanullah Ghazi ◽

Jani Boutellier ◽

Lauri Anttila ◽

Markku Juntti ◽

Mikko Valkama

Keyword(s):

Parallel Implementation ◽

Digital Predistortion ◽

Data Parallel ◽

Mobile GPU


A data parallel implementation of an edge point chaining: towards a new principle of edge linking

Proceedings of 3rd IEEE International Conference on Image Processing ◽

10.1109/icip.1996.560622 ◽

2002 ◽

Author(s):

P. Bonnin ◽

B. Hoeltzener-Douarin ◽

E. Pissaloux

Keyword(s):

Parallel Implementation ◽

Edge Point ◽

Data Parallel ◽

Edge Linking


Data parallel implementation methods of a stiff chemical non-equilibrium flow solver: parallelization, vectorization and I/O

Proceedings Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region ◽

10.1109/hpc.2000.843599 ◽

2000 ◽

Author(s):

Shijun Diao ◽

T. Fujiwara

Keyword(s):

Parallel Implementation ◽

Equilibrium Flow ◽

Flow Solver ◽

Data Parallel ◽

Non Equilibrium


An Efficient Parallel Implementation of CPU Scheduling Algorithms Using Data Parallel Algorithms

International Conference on Advanced Computing Networking and Informatics - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-13-2673-8_45 ◽

2018 ◽

pp. 429-438

Author(s):

Suvigya Agrawal ◽

Aishwarya Yadav ◽

Disha Parwani ◽

Veena Mayya

Keyword(s):

Parallel Algorithms ◽

Parallel Implementation ◽

Scheduling Algorithms ◽

CPU Scheduling ◽

Data Parallel ◽

Using Data


An Automatic Design Flow for Data Parallel and Pipelined Signal Processing Applications on Embedded Multiprocessor with NoC: Application to Cryptography

International Journal of Reconfigurable Computing ◽

10.1155/2009/631490 ◽

2009 ◽

Vol 2009 ◽

pp. 1-14 ◽

Cited By ~ 5

Author(s):

Xinyu Li ◽

Omar Hammami

Keyword(s):

Signal Processing ◽

Embedded System ◽

High Performance ◽

Chip Multiprocessors ◽

Parallel Implementation ◽

Data Encryption ◽

Design Flow ◽

Automatic Design ◽

Single Chip ◽

Data Parallel

Embedded system design is increasingly based on single-chip multiprocessors because of high performance and flexibility requirements. Embedded multiprocessors on FPGA provide additional flexibility by allowing customization through the addition of hardware accelerators on the FPGA when a parallel software implementation does not provide the expected performance, while the overall multiprocessor architecture is still kept for additional applications. This provides a transition to a software-only parallel implementation while avoiding a pure hardware implementation. An automatic design flow is proposed that is well suited to data flow signal processing applications exhibiting both pipelined and data parallel modes of execution. Fork-Join model-based software parallelization is explored to find the best parallelization configuration. A coprocessor produced by C-based synthesis is added to improve performance at the cost of more hardware resources. The Triple Data Encryption Standard (TDES) cryptographic algorithm on a 48-PE single-chip distributed memory multiprocessor is selected as an application example of the flow.
