Overlapping Communication and Computation with OpenMP and MPI

2001 ◽  
Vol 9 (2-3) ◽  
pp. 73-81 ◽  
Author(s):  
Timothy H. Kaiser ◽  
Scott B. Baden

Machines composed of a distributed collection of shared-memory or SMP nodes are becoming common for parallel computing. On many such machines, OpenMP can be combined with MPI; the motivations for combining the two are discussed. While OpenMP is typically used to exploit loop-level parallelism, it can also be used to enable coarse-grain parallelism, potentially leading to less overhead. We show how coarse-grain OpenMP parallelism can also facilitate overlapping MPI communication and computation for stencil-based grid programs, such as a program performing Gauss-Seidel iteration with red-black ordering. Spatial subdivision, or domain decomposition, is used to assign a portion of the grid to each thread. One thread is assigned a null calculation region, leaving it free to perform communication. Example calculations were run on an IBM SP using both the Kuck & Associates and IBM compilers.
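The red-black ordering mentioned in the abstract is what makes the coarse-grain decomposition safe: points of one colour depend only on points of the other colour, so all updates within a colour sweep are independent and can be split among threads. A minimal sequential sketch in Python for the 1D Laplace equation, purely to fix ideas (the paper itself uses OpenMP threads over grid strips in C/Fortran, with one thread reserved for MPI communication):

```python
def red_black_gauss_seidel(u, iters):
    """Red-black Gauss-Seidel for the 1D Laplace equation u'' = 0.

    Red points (odd indices) depend only on black points (even indices)
    and vice versa, so every update inside one colour sweep is
    independent -- the property that lets the sweep be distributed
    across OpenMP threads as coarse-grain strips of the grid.
    Boundary values u[0] and u[-1] are held fixed.
    """
    n = len(u)
    for _ in range(iters):
        for start in (1, 2):  # start=1: red (odd) sweep; start=2: black (even) sweep
            for i in range(start, n - 1, 2):
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u
```

With boundary conditions u[0] = 0 and u[-1] = 1, the iteration converges to the linear ramp, e.g. the midpoint of a 9-point grid tends to 0.5.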

Author(s):  
Yasuhito Takahashi ◽  
Koji Fujiwara ◽  
Takeshi Iwashita ◽  
Hiroshi Nakashima

Purpose – This paper aims to propose a parallel-in-space-time finite-element method (FEM) for transient motor starting analyses. Although the domain decomposition method (DDM) is suitable for solving large-scale problems, and parallel-in-time (PinT) integration methods such as Parareal and the time-domain parallel FEM (TDPFEM) are effective for problems with a large number of time steps, their parallel performance saturates as the number of processes increases. To overcome this difficulty, a hybrid approach combining the DDM with PinT integration is investigated in a highly parallel computing environment.

Design/methodology/approach – First, the parallel performances of the DDM, Parareal and TDPFEM were compared, because the scalability of these methods in highly parallel computation has not been discussed in depth. Then, the combination of the DDM and Parareal was investigated as a parallel-in-space-time FEM. The effectiveness of the developed method was demonstrated in transient starting analyses of induction motors.

Findings – Combining Parareal with the DDM can improve parallel performance in cases where the parallel performance of the DDM, TDPFEM or Parareal alone saturates in highly parallel computation. When the number of unknowns is large and the number of available processes is limited, the DDM alone is the most effective from the standpoint of computational cost.

Originality/value – This paper newly develops the parallel-in-space-time FEM and demonstrates its effectiveness in nonlinear magnetoquasistatic field analyses of electric machines. This finding is important because it clarifies a new direction for parallel computing techniques and shows great potential for further development.
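To fix ideas about the PinT component of the hybrid method, the Parareal correction can be sketched for a scalar ODE. This is a generic textbook Parareal with explicit-Euler propagators for u' = -u, not the authors' FEM implementation; all function names and step counts here are illustrative:

```python
def fine(u, t0, t1, m=100):
    """Fine propagator F: m small explicit-Euler steps for u' = -u."""
    dt = (t1 - t0) / m
    for _ in range(m):
        u += -u * dt
    return u

def coarse(u, t0, t1):
    """Coarse propagator G: a single explicit-Euler step."""
    return u + -u * (t1 - t0)

def parareal(u0, T, N, K):
    """Parareal over N time slices with K correction iterations.

    Iteration: U[n+1] <- G(U[n], new) + F(U[n], old) - G(U[n], old).
    The N fine solves per iteration are independent, which is where
    the time-parallelism (MPI processes in the paper) comes from.
    """
    ts = [T * n / N for n in range(N + 1)]
    # Initial guess from the coarse propagator alone (sequential).
    U = [u0]
    for n in range(N):
        U.append(coarse(U[n], ts[n], ts[n + 1]))
    for _ in range(K):
        # Independent fine solves -- the parallelisable part.
        F = [fine(U[n], ts[n], ts[n + 1]) for n in range(N)]
        Unew = [u0]
        for n in range(N):
            # Coarse prediction corrected by the fine/coarse defect.
            Unew.append(coarse(Unew[n], ts[n], ts[n + 1])
                        + F[n] - coarse(U[n], ts[n], ts[n + 1]))
        U = Unew
    return U
```

After K = N iterations, Parareal reproduces the sequential fine solution exactly (up to round-off), which is a standard sanity check for any implementation.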


2005 ◽  
Vol 18 (2) ◽  
pp. 219-224
Author(s):  
Emina Milovanovic ◽  
Natalija Stojanovic

Because many universities lack the funds to purchase expensive parallel computers, cost-effective alternatives are needed to teach students about parallel processing. Free software is available to support the three major paradigms of parallel computing. Parallaxis is a sophisticated SIMD simulator that runs on a variety of platforms. The jBACI shared-memory simulator supports the MIMD model of computing with a common shared memory. PVM and MPI allow students to treat a network of workstations as a message-passing MIMD multicomputer with distributed memory. Each of these software tools can be used in a variety of courses to give students experience with parallel algorithms.


Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 342
Author(s):  
Alessandro Varsi ◽  
Simon Maskell ◽  
Paul G. Spirakis

Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications grows, and parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log2 N)^2) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log2 N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves O(log2 N) time complexity. We also present empirical results indicating that our novel approach outperforms the O((log2 N)^2) approach.
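For readers unfamiliar with the step in question: redistribution takes the copy counts produced by resampling and duplicates each particle accordingly, so the new population again has N particles. A trivial sequential version, purely for illustration (the paper's contribution is performing the equivalent in O(log2 N) time on distributed memory; the function and variable names here are ours):

```python
def redistribute(particles, copies):
    """Sequential redistribution after resampling.

    particle[i] appears copies[i] times in the output, with
    sum(copies) == len(particles), so the population size is preserved.
    This O(N) loop is the bottleneck the paper parallelises down to
    O(log2 N) time on distributed-memory architectures.
    """
    out = []
    for x, c in zip(particles, copies):
        out.extend([x] * c)
    return out
```

For example, copy counts [2, 0, 3, 1] over four particles yield a new population of four entries in which the second particle has been discarded and the third duplicated.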


2001 ◽  
Vol 13 (8-9) ◽  
pp. 663-680 ◽  
Author(s):  
José Oliver ◽  
Jordi Guitart ◽  
Eduard Ayguadé ◽  
Nacho Navarro ◽  
Jordi Torres
Author(s):  
Ying Yi ◽  
Wei Han ◽  
Adam Major ◽  
Ahmet T. Erdogan ◽  
Tughrul Arslan
