Granular Dynamics Simulation on Multiple GPUs Using Domain Decomposition

Author(s):  
Hammad Mazhar ◽  
Andrew Seidl ◽  
Rebecca Shotwell ◽  
Marco B. Quadrelli ◽  
Dan Negrut ◽  
...  

This paper describes the software infrastructure needed to enable massive multi-body simulation using multiple GPUs. Utilizing a domain decomposition approach, a large system made up of billions of bodies can be split into self-contained subdomains which are then transferred to different GPUs and solved in parallel. Parallelism is enabled on multiple levels, first on the CPU through OpenMP and secondly on the GPU through NVIDIA CUDA (Compute Unified Device Architecture). This heterogeneous software infrastructure can be extended to networks of computers using MPI (Message Passing Interface) as each subdomain is self-contained. This paper will discuss the implementation of the spatial subdivision algorithm used for subdomain creation along with the algorithms used for collision detection and constraint solution.
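
The spatial subdivision step described above can be sketched in a few lines. This is a hypothetical illustration (not the paper's CUDA implementation, which handles collision detection and constraint solution as well): each body is binned by position into a uniform grid of subdomains, and each resulting bucket is a self-contained unit that could be dispatched to its own GPU. All names and the uniform-grid assumption are illustrative.

```python
def subdomain_of(pos, origin, cell, dims):
    """Map a 3D position to a flat subdomain index on a dims[0] x dims[1] x dims[2] grid."""
    idx = [max(0, min(int((pos[k] - origin[k]) / cell[k]), dims[k] - 1))
           for k in range(3)]                      # clamp bodies to the domain
    return (idx[2] * dims[1] + idx[1]) * dims[0] + idx[0]

def decompose(bodies, origin, cell, dims):
    """Group body indices into self-contained subdomains by spatial binning."""
    buckets = {}
    for i, pos in enumerate(bodies):
        buckets.setdefault(subdomain_of(pos, origin, cell, dims), []).append(i)
    return buckets
```

In the multi-GPU setting each bucket would carry its bodies (plus any shared-boundary state) to a device; here the sketch stops at the partitioning itself.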

Author(s):  
Yong Zhao ◽  
Chin Hoe Tai

The development and validation of a parallel unstructured non-nested multigrid method for simulation of unsteady incompressible viscous flow is presented. The Navier-Stokes solver is based on the artificial compressibility method (ACM) [10] and a higher-order characteristics-based finite-volume scheme [8] on unstructured multigrids. Unsteady flow is calculated with an implicit dual time stepping scheme. The parallelization of the solver is achieved by a multigrid domain decomposition approach (MG-DD), using the Single Program Multiple Data (SPMD) programming paradigm and Message-Passing Interface (MPI) for communication of data. The parallel codes using single grids and multigrids are used to simulate steady and unsteady incompressible viscous flows over a circular cylinder for validation and performance evaluation purposes. Speedups and parallel efficiencies obtained by both the parallel single-grid and multigrid solvers are reasonably good for both test cases, using up to 32 processors on the SGI Origin 2000. A maximum speedup of 12 could be achieved on 16 processors for the unsteady flow. The parallel results obtained agree well with those of serial solvers and with numerical solutions obtained by other researchers, as well as experimental measurements.
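
The implicit dual time stepping mentioned in the abstract can be sketched generically: each physical time step is converged by an inner pseudo-time iteration that drives the unsteady residual to zero. The residual and pseudo-time update below are placeholders, not the paper's characteristics-based finite-volume scheme or its artificial compressibility formulation.

```python
def dual_time_step(q, dt, pseudo_step, residual, tol=1e-10, max_inner=100):
    """Advance state q by one physical step dt, converging in pseudo-time."""
    q_old = q
    for _ in range(max_inner):
        r = residual(q, q_old, dt)      # unsteady residual, incl. the dq/dt term
        if abs(r) < tol:
            break
        q = pseudo_step(q, r)           # pseudo-time update driving r toward 0
    return q
```

For example, for the implicit backward-Euler step of dq/dt = -q, the residual is (q - q_old)/dt + q and a relaxation update q - tau*r converges the inner iteration to q_old/(1 + dt).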


2020 ◽  
Author(s):  
Stiw Herrera ◽  
Weber Ribeiro ◽  
Thiago Teixeira ◽  
André Carneiro ◽  
Frederico Cabral ◽  
...  

Oil and gas simulations require high-performance computing techniques to cope with their large memory footprint and the high computational cost of the underlying numerical methods. A domain decomposition technique was applied to a three-dimensional oil reservoir, where MPI (Message Passing Interface) allowed the creation of one-, two-, and three-dimensional topologies, so that each subdivision of the reservoir could be solved in its own MPI process. A performance study of these domain decomposition strategies was carried out on 20 computational nodes of the SDumont supercomputer, using the Cascade Lake architecture.
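
The block partitioning behind such one-, two-, and three-dimensional topologies can be sketched without MPI itself. The logic below mirrors what `MPI_Dims_create` / `MPI_Cart_coords` provide: for each axis, a rank's coordinate determines which slab of cells it owns, with remainder cells spread over the first ranks. Grid sizes and rank counts are illustrative.

```python
def local_extent(n, p, coord):
    """(start, size) of the cells owned along one axis by process `coord` of `p`, for n cells."""
    base, rem = divmod(n, p)
    size = base + (1 if coord < rem else 0)   # first `rem` ranks take one extra cell
    start = coord * base + min(coord, rem)
    return start, size

def subdomain(grid, dims, coords):
    """Per-axis (start, size) of one rank's block of the global grid."""
    return [local_extent(n, p, c) for n, p, c in zip(grid, dims, coords)]
```

A 1D topology is the special case `dims = (p, 1, 1)`; the same helper covers the 2D and 3D decompositions of the reservoir.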


2013 ◽  
Vol 30 (7) ◽  
pp. 1382-1397 ◽  
Author(s):  
Yunheng Wang ◽  
Youngsun Jung ◽  
Timothy A. Supinie ◽  
Ming Xue

Abstract A hybrid parallel scheme for the ensemble square root filter (EnSRF) suitable for parallel assimilation of multiscale observations, including those from dense observational networks such as those of radar, is developed based on the domain decomposition strategy. The scheme handles internode communication through a message passing interface (MPI) and the communication within shared-memory nodes via Open Multiprocessing (OpenMP) threads. It also supports pure MPI and pure OpenMP modes. The parallel framework can accommodate high-volume remote-sensed radar (or satellite) observations as well as conventional observations that usually have larger covariance localization radii. The performance of the parallel algorithm has been tested with simulated and real radar data. The parallel program shows good scalability in pure MPI and hybrid MPI–OpenMP modes, while pure OpenMP runs exhibit limited scalability on a symmetric shared-memory system. It is found that in MPI mode, better parallel performance is achieved with domain decomposition configurations in which the leading dimension of the state variable arrays is larger, because this configuration allows for more efficient memory access. Given a fixed amount of computing resources, the hybrid parallel mode is preferred to pure MPI mode on supercomputers with nodes containing shared-memory cores. The overall performance is also affected by factors such as the cache size, memory bandwidth, and the networking topology. Tests with a real data case with a large number of radars confirm that the parallel data assimilation can be done on a multicore supercomputer with a significant speedup compared to the serial data assimilation algorithm.
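
The abstract's finding about the leading dimension can be illustrated with a toy decomposition chooser. Assuming state arrays whose first dimension is contiguous in memory, the heuristic below enumerates 2D rank grids for a fixed process count and picks the one whose local tiles keep the longest leading dimension; the grid and rank counts are illustrative, not the EnSRF configuration.

```python
def candidate_decomps(nprocs):
    """All 2D rank grids (px, py) with px * py == nprocs."""
    return [(px, nprocs // px) for px in range(1, nprocs + 1) if nprocs % px == 0]

def best_decomp(nx, ny, nprocs):
    """Prefer the rank grid whose local tile has the largest leading dimension nx // px."""
    return max(candidate_decomps(nprocs), key=lambda d: nx // d[0])
```

Unsurprisingly the heuristic cuts along the non-leading dimension whenever it can, which matches the abstract's observation that such configurations allow more efficient memory access.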


2014 ◽  
Vol 16 (3) ◽  
pp. 599-611 ◽  
Author(s):  
George M. Petrov ◽  
Jack Davis

Abstract The implicit 2D3V particle-in-cell (PIC) code developed to study the interaction of ultrashort pulse lasers with matter [G. M. Petrov and J. Davis, Computer Phys. Comm. 179, 868 (2008); Phys. Plasmas 18, 073102 (2011)] has been parallelized using MPI (Message Passing Interface). The parallelization strategy is optimized for a small number of computer cores, up to about 64. Details on the algorithm implementation are given with emphasis on code optimization by overlapping computations with communications. Performance evaluation for 1D domain decomposition has been made on a small Linux cluster with 64 computer cores for two typical regimes of PIC operation: “particle dominated”, for which the bulk of the computation time is spent on pushing particles, and “field dominated”, for which computing the fields is prevalent. For a small number of computer cores, less than 32, the MPI implementation offers a significant numerical speed-up. In the “particle dominated” regime it is close to the maximum theoretical one, while in the “field dominated” regime it is about 75–80% of the maximum speed-up. For a number of cores exceeding 32, performance degradation takes place as a result of the adopted 1D domain decomposition. The code parallelization will allow future implementation of atomic physics and extension to three dimensions.
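
The overlap of computation with communication that the abstract emphasizes follows a standard pattern, sketched schematically below: post the ghost-cell exchange first, advance the interior cells (which need no ghost data) while the exchange is in flight, then finish the exchange and update the boundary cells. Communication is mocked here; real code would use `MPI_Isend`/`MPI_Irecv` plus `MPI_Wait`, and the in-place update is a stand-in for the actual field solve.

```python
def step(field, exchange_start, exchange_finish, update):
    """One 1D-decomposed step with one ghost cell per side, overlapping comms."""
    g = 1
    req = exchange_start(field)                  # nonblocking halo exchange
    for i in range(2 * g, len(field) - 2 * g):
        field[i] = update(field, i)              # interior: no ghost data needed
    exchange_finish(req, field)                  # wait for halos to arrive
    for i in (g, len(field) - g - 1):
        field[i] = update(field, i)              # boundary cells use fresh halos
    return field
```

The payoff is exactly the regime dependence the abstract reports: the more time spent in the overlapped interior work (particle pushing), the better the communication cost is hidden.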


2012 ◽  
Vol 49 (6) ◽  
pp. 709-723 ◽  
Author(s):  
V. Visseq ◽  
A. Martin ◽  
D. Iceta ◽  
E. Azema ◽  
D. Dureisseix ◽  
...  

2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Carlos Delgado ◽  
Josefa Gómez ◽  
Abdelhamid Tayebi ◽  
Iván González ◽  
Felipe Cátedra

We present an efficient method for the analysis of different objects that may contain a complex feeding system and a reflector structure. The approach is based on a domain decomposition technique that divides the geometry into several parts to minimize the vast computational resources required when applying a full wave method. This technique is also parallelized by using the Message Passing Interface to minimize the memory and time requirements of the simulation. A reflectarray analysis serves as an example of the proposed approach.
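
The divide-and-combine idea can be sketched abstractly: geometry parts (feed chain, reflector surface, and so on) are distributed over MPI ranks, analyzed independently, and their far-field contributions superposed, ignoring here the inter-domain coupling that the full method accounts for. The rank-assignment policy and the part solvers are illustrative placeholders.

```python
def assign_parts(n_parts, n_ranks):
    """Round-robin mapping of geometry subdomains to MPI ranks."""
    return {r: list(range(r, n_parts, n_ranks)) for r in range(n_ranks)}

def total_far_field(part_fields, angle):
    """Superpose the far-field contributions of the independently solved parts."""
    return sum(f(angle) for f in part_fields)
```
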


2013 ◽  
Vol 53 (1) ◽  
pp. 147-158 ◽  
Author(s):  
JunYoung Kwak ◽  
TaeYoung Chun ◽  
SangJoon Shin ◽  
Olivier A. Bauchau

2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye ◽  
Juniarto Samsudin ◽  
Yongqing Zhu

Background: Genotype imputation as a service enables researchers to estimate genotypes from haplotyped data without performing whole genome sequencing. However, genotype imputation is computation intensive, so it remains a challenge to satisfy the high performance requirements of genome-wide association studies (GWAS).
Objective: In this paper, we propose a high performance computing solution for genotype imputation on supercomputers to enhance its execution performance.
Method: We design and implement a multi-level parallelization that includes job-level, process-level and thread-level parallelization, enabled by job scheduling management, the Message Passing Interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation, and data concatenation. This multi-level design lets us exploit multi-machine, multi-core architectures to improve the performance of genotype imputation.
Results: Experimental results show that the proposed method outperforms a Hadoop-based implementation of genotype imputation. Further experiments on supercomputers show that it significantly shortens execution time.
Conclusion: The proposed multi-level parallelization, when deployed as an imputation service, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.
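
The chunk-partition step in the method can be sketched as follows: a chromosome region is cut into fixed-size windows with flanking overlap, so that separate MPI processes can impute chunks independently before the results are concatenated. The window and overlap sizes are illustrative, not those of the deployed service.

```python
def make_chunks(start, end, chunk_size, overlap):
    """(lo, hi) windows covering [start, end), each padded by `overlap` positions."""
    chunks = []
    pos = start
    while pos < end:
        chunks.append((max(start, pos - overlap),
                       min(end, pos + chunk_size + overlap)))
        pos += chunk_size
    return chunks
```

During concatenation only each chunk's core region (without the padding) would be kept, so that every position is reported exactly once.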


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2284
Author(s):  
Krzysztof Przystupa ◽  
Mykola Beshley ◽  
Olena Hordiichuk-Bublivska ◽  
Marian Kyryk ◽  
Halyna Beshley ◽  
...  

Analyzing large volumes of user data to determine preferences and, on that basis, recommend new products is an important problem: depending on the correctness and timeliness of the recommendations, significant profits or losses can result. This analysis of companies' service-user data is carried out by dedicated recommendation systems. However, with a large number of users the data to be processed become very large, which complicates the operation of such systems. For efficient data analysis in commercial systems, the Singular Value Decomposition (SVD) method can perform intelligent analysis of the information, and for large amounts of processed information we propose to use distributed systems. This approach reduces the time needed to process the data and deliver recommendations to users. For the experimental study, we implemented the distributed SVD method using Message Passing Interface, Hadoop and Spark technologies, and observed reduced data processing times with the distributed systems compared to non-distributed ones.
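
The core SVD computation being distributed here can be sketched in its single-process form: factor the user-item rating matrix and rebuild a low-rank approximation whose entries serve as predicted preferences. The distributed MPI/Hadoop/Spark variants in the article partition this computation across machines; the function name and rank choice below are illustrative.

```python
import numpy as np

def predict_ratings(R, k):
    """Rank-k SVD reconstruction of the rating matrix R as predicted preferences."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # Keep the k strongest latent factors; the product scales U's columns by s.
    return U[:, :k] * s[:k] @ Vt[:k, :]
```

For a matrix that is already (close to) rank k, the reconstruction reproduces the known ratings, while its entries at unobserved positions act as the recommendations.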


1996 ◽  
Vol 22 (6) ◽  
pp. 789-828 ◽  
Author(s):  
William Gropp ◽  
Ewing Lusk ◽  
Nathan Doss ◽  
Anthony Skjellum
