Overlapping Communication and Computation with OpenMP and MPI

2001 ◽  
Vol 9 (2-3) ◽  
pp. 73-81 ◽  
Author(s):  
Timothy H. Kaiser ◽  
Scott B. Baden

Machines composed of a distributed collection of shared-memory or SMP nodes are becoming common for parallel computing. On many such machines, OpenMP can be combined with MPI; the motivations for combining the two are discussed. While OpenMP is typically used to exploit loop-level parallelism, it can also be used to enable coarse-grain parallelism, potentially leading to less overhead. We show how coarse-grain OpenMP parallelism can also facilitate overlapping MPI communication and computation for stencil-based grid programs, such as a program performing Gauss-Seidel iteration with red-black ordering. Spatial subdivision, or domain decomposition, is used to assign a portion of the grid to each thread. One thread is assigned a null calculation region, leaving it free to perform communication. Example calculations were run on an IBM SP using both the Kuck & Associates and IBM compilers.
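The red-black ordering mentioned in the abstract is what makes the coarse-grain decomposition safe: points of one colour depend only on points of the other colour, so all updates within a colour sweep are independent and can be split among threads. A minimal sequential sketch in Python for the 1D Laplace equation, purely to fix ideas (the paper itself uses OpenMP threads over grid strips in C/Fortran, with one thread reserved for MPI communication):

```python
def red_black_gauss_seidel(u, iters):
    """Red-black Gauss-Seidel for the 1D Laplace equation u'' = 0.

    Red points (odd indices) depend only on black points (even indices)
    and vice versa, so every update inside one colour sweep is
    independent -- the property that lets the sweep be distributed
    across OpenMP threads as coarse-grain strips of the grid.
    Boundary values u[0] and u[-1] are held fixed.
    """
    n = len(u)
    for _ in range(iters):
        for start in (1, 2):  # start=1: red (odd) sweep; start=2: black (even) sweep
            for i in range(start, n - 1, 2):
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u
```

With boundary conditions u[0] = 0 and u[-1] = 1, the iteration converges to the linear ramp, e.g. the midpoint of a 9-point grid tends to 0.5.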

Author(s):  
Yasuhito Takahashi ◽  
Koji Fujiwara ◽  
Takeshi Iwashita ◽  
Hiroshi Nakashima

Purpose – This paper aims to propose a parallel-in-space-time finite-element method (FEM) for transient motor starting analyses. Although the domain decomposition method (DDM) is suitable for solving large-scale problems, and parallel-in-time (PinT) integration methods such as Parareal and the time-domain parallel FEM (TDPFEM) are effective for problems with a large number of time steps, their parallel performance saturates as the number of processes increases. To overcome this difficulty, a hybrid approach combining the DDM with PinT integration is investigated in a highly parallel computing environment.

Design/methodology/approach – First, the parallel performances of the DDM, Parareal and TDPFEM were compared, because the scalability of these methods in highly parallel computation has not been discussed in depth. Then, the combination of the DDM and Parareal was investigated as a parallel-in-space-time FEM. The effectiveness of the developed method was demonstrated in transient starting analyses of induction motors.

Findings – Combining Parareal with the DDM can improve parallel performance in cases where the parallel performance of the DDM, TDPFEM or Parareal alone saturates in highly parallel computation. When the number of unknowns is large and the number of available processes is limited, the DDM alone is the most effective from the standpoint of computational cost.

Originality/value – This paper newly develops the parallel-in-space-time FEM and demonstrates its effectiveness in nonlinear magnetoquasistatic field analyses of electric machines. This finding is important because it clarifies a new direction for parallel computing techniques and shows great potential for further development.
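To fix ideas about the PinT component of the hybrid method, the Parareal correction can be sketched for a scalar ODE. This is a generic textbook Parareal with explicit-Euler propagators for u' = -u, not the authors' FEM implementation; all function names and step counts here are illustrative:

```python
def fine(u, t0, t1, m=100):
    """Fine propagator F: m small explicit-Euler steps for u' = -u."""
    dt = (t1 - t0) / m
    for _ in range(m):
        u += -u * dt
    return u

def coarse(u, t0, t1):
    """Coarse propagator G: a single explicit-Euler step."""
    return u + -u * (t1 - t0)

def parareal(u0, T, N, K):
    """Parareal over N time slices with K correction iterations.

    Iteration: U[n+1] <- G(U[n], new) + F(U[n], old) - G(U[n], old).
    The N fine solves per iteration are independent, which is where
    the time-parallelism (MPI processes in the paper) comes from.
    """
    ts = [T * n / N for n in range(N + 1)]
    # Initial guess from the coarse propagator alone (sequential).
    U = [u0]
    for n in range(N):
        U.append(coarse(U[n], ts[n], ts[n + 1]))
    for _ in range(K):
        # Independent fine solves -- the parallelisable part.
        F = [fine(U[n], ts[n], ts[n + 1]) for n in range(N)]
        Unew = [u0]
        for n in range(N):
            # Coarse prediction corrected by the fine/coarse defect.
            Unew.append(coarse(Unew[n], ts[n], ts[n + 1])
                        + F[n] - coarse(U[n], ts[n], ts[n + 1]))
        U = Unew
    return U
```

After K = N iterations, Parareal reproduces the sequential fine solution exactly (up to round-off), which is a standard sanity check for any implementation.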


2005 ◽  
Vol 18 (2) ◽  
pp. 219-224
Author(s):  
Emina Milovanovic ◽  
Natalija Stojanovic

Because many universities lack the funds to purchase expensive parallel computers, cost-effective alternatives are needed to teach students about parallel processing. Free software is available to support the three major paradigms of parallel computing. Parallaxis is a sophisticated SIMD simulator that runs on a variety of platforms. The jBACI shared-memory simulator supports the MIMD model of computing with a common shared memory. PVM and MPI allow students to treat a network of workstations as a message-passing MIMD multicomputer with distributed memory. Each of these software tools can be used in a variety of courses to give students experience with parallel algorithms.


Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 342
Author(s):  
Alessandro Varsi ◽  
Simon Maskell ◽  
Paul G. Spirakis

Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications grows, and parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log2 N)^2) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log2 N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves O(log2 N) time complexity. We also present empirical results indicating that our novel approach outperforms the O((log2 N)^2) approach.
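For readers unfamiliar with the step in question: redistribution takes the copy counts produced by resampling and duplicates each particle accordingly, so the new population again has N particles. A trivial sequential version, purely for illustration (the paper's contribution is performing the equivalent in O(log2 N) time on distributed memory; the function and variable names here are ours):

```python
def redistribute(particles, copies):
    """Sequential redistribution after resampling.

    particle[i] appears copies[i] times in the output, with
    sum(copies) == len(particles), so the population size is preserved.
    This O(N) loop is the bottleneck the paper parallelises down to
    O(log2 N) time on distributed-memory architectures.
    """
    out = []
    for x, c in zip(particles, copies):
        out.extend([x] * c)
    return out
```

For example, copy counts [2, 0, 3, 1] over four particles yield a new population of four entries in which the second particle has been discarded and the third duplicated.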


2001 ◽  
Vol 13 (8-9) ◽  
pp. 663-680 ◽  
Author(s):  
José Oliver ◽  
Jordi Guitart ◽  
Eduard Ayguadé ◽  
Nacho Navarro ◽  
Jordi Torres
Author(s):  
Ying Yi ◽  
Wei Han ◽  
Adam Major ◽  
Ahmet T. Erdogan ◽  
Tughrul Arslan
