PERFORMANCE ANALYSIS OF MESSAGE PASSING INTERFACE COLLECTIVE COMMUNICATION ON INTEL XEON QUAD-CORE GIGABIT ETHERNET AND INFINIBAND CLUSTERS

2013 ◽  
Vol 9 (4) ◽  
pp. 455-462
Author(s):  
Ismail
Author(s):  
John M Dennis ◽  
Brian Dobbins ◽  
Christopher Kerr ◽  
Youngsung Kim

The arrival of next-generation computing platforms offers a tremendous opportunity to advance the state of the art in global atmospheric dynamical models. We detail our incremental approach to exploiting this emerging technology by enhancing concurrency within the High-Order Method Modeling Environment (HOMME) atmospheric dynamical model developed at the National Center for Atmospheric Research (NCAR). The study focused on improving the performance of HOMME, a Fortran 90 code with a hybrid (MPI/OpenMP) programming model. The article describes the changes made to the use of the Message Passing Interface (MPI) and OpenMP, as well as single-core optimizations, to achieve significant improvements in concurrency and overall code performance. For our optimization studies, we used the “Cori” system with an Intel Xeon Phi Knights Landing processor deployed at the National Energy Research Scientific Computing Center and the “Cheyenne” system with an Intel Xeon Broadwell processor installed at NCAR. The results from the studies, using “workhorse” configurations run at NCAR, show that these changes have a transformative impact on the computational performance of HOMME. Our improvements show that we can effectively increase potential concurrency by efficiently threading the vertical dimension, and the single-core optimizations yield a factor-of-two overall improvement in the computational performance of the code. Most notably, our incremental approach allows for high-impact changes without disrupting existing scientific productivity in the HOMME community.
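The vertical-dimension threading described above can be illustrated with a small Python sketch (the kernel and array shapes here are hypothetical stand-ins, not HOMME's Fortran/OpenMP code): because each vertical level can be computed independently, a thread pool may process levels concurrently, and the threaded result must match the serial one.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def tendency(level_slice):
    # Hypothetical stand-in for a per-level dynamics computation.
    return np.sin(level_slice) * 2.0

def compute_serial(field):
    # Loop over the vertical (level) dimension one level at a time.
    return np.stack([tendency(field[k]) for k in range(field.shape[0])])

def compute_threaded(field, workers=4):
    # Thread over the vertical dimension; levels are independent,
    # so no locking is needed and results arrive in order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.stack(list(pool.map(tendency, field)))

field = np.random.default_rng(0).standard_normal((8, 16, 16))
assert np.allclose(compute_serial(field), compute_threaded(field))
```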


Author(s):  
George W. Leaver ◽  
Martin J. Turner ◽  
James S. Perrin ◽  
Paul M. Mummery ◽  
Philip J. Withers

Remote scientific visualization, where rendering services are provided by larger-scale systems than are available on the desktop, is becoming increasingly important as dataset sizes grow beyond the capabilities of desktop workstations. Uptake of such services relies on access to suitable visualization applications and the ability to view the resulting visualization in a convenient form. We consider five rules from the e-Science community for meeting these goals while porting a commercial visualization package to a large-scale system. The application uses the Message Passing Interface (MPI) to distribute data among data-processing and rendering processes. The use of MPI in such an interactive application is not compatible with restrictions imposed by the Cray system being considered. We present details, and a performance analysis, of a new MPI proxy method that allows the application to run within the Cray environment yet still supports the MPI communication the application requires. Example use cases from materials science are considered.
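The proxy idea, interposing a relay between the interactive client and the restricted batch environment so the rendering processes never hold a direct client connection, can be sketched in-process with Python queues. This is only a conceptual analogue of the pattern, not the paper's Cray MPI proxy.

```python
import queue
import threading

def proxy(inbox, outbox, stop):
    # Relay client commands toward the batch-side processes; the
    # render side sees only the proxy, never the client connection.
    while not stop.is_set() or not inbox.empty():
        try:
            msg = inbox.get(timeout=0.05)
        except queue.Empty:
            continue
        outbox.put(("forwarded", msg))

client_to_proxy, proxy_to_job = queue.Queue(), queue.Queue()
stop = threading.Event()
worker = threading.Thread(target=proxy, args=(client_to_proxy, proxy_to_job, stop))
worker.start()

# Hypothetical interactive commands from the visualization front end.
for cmd in ("load volume.raw", "render isosurface"):
    client_to_proxy.put(cmd)

received = [proxy_to_job.get() for _ in range(2)]  # blocks until relayed
stop.set()
worker.join()
assert received == [("forwarded", "load volume.raw"),
                    ("forwarded", "render isosurface")]
```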


Author(s):  
Daniel C. Doolan ◽  
Sabin Tabirca ◽  
Laurence T. Yang

The Message Passing Interface (MPI) was published as a standard in 1992. Since then, many implementations have been developed; the MPICH library is one of the best-known and freely available ones. These libraries simplify parallel computing on clusters and parallel machines by providing the developer with an easy-to-use set of functions for point-to-point and global communications. The details of how the actual communication takes place are hidden from programmers, allowing them to focus on the domain-specific problem at hand. Communication between nodes on such systems is carried out via high-speed cabled interconnects (Gigabit Ethernet and upwards). Mobile computing, especially the mobile phone, is now a ubiquitous technology. Mobile devices have no facility for connections over traditional high-speed cabling, so wireless communication mechanisms must be used to achieve inter-device communication. The majority of medium- to high-end phones are Bluetooth-enabled as standard, allowing wireless communication to take place. The Mobile Message Passing Interface (MMPI) provides the developer with an intuitive set of functions for communication between nodes (mobile phones) across a Bluetooth network. This chapter looks at the MMPI library and how it may be used for parallel computing on mobile phones (smartphones).
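A minimal in-process sketch can convey the shape of the point-to-point and global operations such a library exposes (the class and method names here are illustrative MPI-style conventions, not MMPI's actual Bluetooth API):

```python
import queue

class TinyMPI:
    """Toy in-process analogue of MPI-style communication.

    Each rank gets a mailbox; real MMPI carries these messages over
    Bluetooth links between phones rather than in-memory queues."""

    def __init__(self, size):
        self.size = size
        self._mailbox = [queue.Queue() for _ in range(size)]

    def send(self, data, dest):
        # Point-to-point send: deliver data to one destination rank.
        self._mailbox[dest].put(data)

    def recv(self, rank):
        # Point-to-point receive: take the next message for this rank.
        return self._mailbox[rank].get()

    def bcast(self, data, root):
        # Global communication: root sends the same data to all others.
        for r in range(self.size):
            if r != root:
                self._mailbox[r].put(data)

comm = TinyMPI(3)
comm.send("hello", dest=1)
assert comm.recv(rank=1) == "hello"
comm.bcast("sync", root=0)
assert [comm.recv(r) for r in (1, 2)] == ["sync", "sync"]
```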


2019 ◽  
Vol 12 (4) ◽  
pp. 1423-1441 ◽  
Author(s):  
Luca Bertagna ◽  
Michael Deakin ◽  
Oksana Guba ◽  
Daniel Sunderland ◽  
Andrew M. Bradley ◽  
...  

Abstract. We present an architecture-portable and performant implementation of the atmospheric dynamical core (High-Order Methods Modeling Environment, HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPUs (e.g., Intel Xeon), many-core CPUs (e.g., Intel Xeon Phi Knights Landing), and Nvidia V100 GPUs.
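The idea behind Kokkos-style parallel execution constructs can be caricatured in Python: the computation is written against an abstract reduction construct, leaving the backend free to decide how iterations are scheduled. The sketch below is an illustration of that separation under our own names, not the Kokkos API; the backend here is serial, whereas a real one would partition the index range across threads or GPU blocks.

```python
def parallel_reduce(n, body, init=0.0):
    # Stand-in for a parallel-reduce construct: serial here, but the
    # body is written so iterations could be combined in any order.
    acc = init
    for i in range(n):
        acc = body(i, acc)
    return acc

# A dot product expressed against the construct rather than a raw loop.
x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
dot = parallel_reduce(len(x), lambda i, acc: acc + x[i] * y[i])
assert dot == 32.0  # 1*4 + 2*5 + 3*6
```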


2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye ◽  
Juniarto Samsudin ◽  
Yongqing Zhu

Background: Genotype imputation as a service enables researchers to estimate genotypes from haplotyped data without performing whole-genome sequencing. However, genotype imputation is computation-intensive, and it remains a challenge to satisfy the high-performance requirements of genome-wide association studies (GWAS).
Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to enhance its execution performance.
Method: We design and implement multi-level parallelization comprising job-level, process-level, and thread-level parallelization, enabled by job-scheduling management, the Message Passing Interface (MPI), and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation, and data concatenation. This multi-level design lets us exploit the multi-machine/multi-core architecture to improve the performance of genotype imputation.
Results: Experimental results show that our proposed method outperforms a Hadoop-based implementation of genotype imputation. Moreover, experiments on supercomputers show that the proposed method significantly shortens execution time, improving the performance of genotype imputation.
Conclusion: The proposed multi-level parallelization, deployed as imputation as a service, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.
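The chunk-partition, parallel-execute, and concatenate steps can be sketched at the thread level in Python (the job and process levels, handled by the scheduler and MPI, are omitted; `impute_chunk` is a hypothetical stand-in for running the imputation tool on one window of markers):

```python
from concurrent.futures import ThreadPoolExecutor

def partition(markers, chunk_size):
    # Chunk partition: split a list of marker positions into windows.
    return [markers[i:i + chunk_size]
            for i in range(0, len(markers), chunk_size)]

def impute_chunk(chunk):
    # Hypothetical stand-in for imputing one chunk independently.
    return [m * 2 for m in chunk]

markers = list(range(10))
chunks = partition(markers, 4)            # e.g., windows of 4 markers
with ThreadPoolExecutor(max_workers=3) as pool:  # thread-level parallelism
    results = list(pool.map(impute_chunk, chunks))
imputed = [v for chunk in results for v in chunk]  # data concatenation
assert imputed == [m * 2 for m in markers]
```

Because `pool.map` preserves input order, the concatenated output matches what a single serial pass would produce.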


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2284
Author(s):  
Krzysztof Przystupa ◽  
Mykola Beshley ◽  
Olena Hordiichuk-Bublivska ◽  
Marian Kyryk ◽  
Halyna Beshley ◽  
...  

Analyzing large amounts of user data to determine preferences and, on that basis, recommend new products is an important problem. Depending on the correctness and timeliness of the recommendations, significant profits or losses can result. The task of analyzing data on the users of a company's services is carried out by dedicated recommendation systems. However, with a large number of users, the data to be processed become very big, which complicates the work of recommendation systems. For efficient data analysis in commercial systems, the Singular Value Decomposition (SVD) method can perform intelligent analysis of the information. For large amounts of processed information, we propose using distributed systems. This approach reduces the time needed to process data and deliver recommendations to users. For the experimental study, we implemented the distributed SVD method using Message Passing Interface, Hadoop, and Spark technologies, and the results show reduced data-processing time when using distributed systems compared to non-distributed ones.
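The SVD step at the core of such recommendations can be sketched with NumPy on a toy rating matrix (the matrix and the rank-2 truncation are illustrative; the paper's contribution is distributing this computation, which is not shown here):

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 = unrated.
R = np.array([[5., 4., 0., 0.],
              [4., 5., 1., 0.],
              [1., 0., 5., 4.],
              [0., 1., 4., 5.]])

# Full SVD, then keep only the k strongest latent factors.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Recommend, for user 0, the unrated item with the highest predicted score.
unrated = np.where(R[0] == 0)[0]
best = int(unrated[np.argmax(R_hat[0, unrated])])
assert R_hat.shape == R.shape and best in (2, 3)
```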


1996 ◽  
Vol 22 (6) ◽  
pp. 789-828 ◽  
Author(s):  
William Gropp ◽  
Ewing Lusk ◽  
Nathan Doss ◽  
Anthony Skjellum

2003 ◽  
Vol 15 (35) ◽  
pp. 341-369 ◽  
Author(s):  
Arnold Nelisse ◽  
Jason Maassen ◽  
Thilo Kielmann ◽  
Henri E. Bal

2013 ◽  
Vol 718-720 ◽  
pp. 1645-1650
Author(s):  
Gen Yin Cheng ◽  
Sheng Chen Yu ◽  
Zhi Yong Wei ◽  
Shao Jie Chen ◽  
You Cheng

The commonly used commercial simulation packages SYSNOISE and ANSYS run on a single machine (they cannot run directly on a parallel machine) when the finite element and boundary element methods are used to simulate muffler performance, and because of the large amount of numerical computation it can take more than ten days, sometimes even twenty, to work out an exact solution. To reduce the cost of numerical simulation, we built a high-performance parallel machine from 32 commodity computers and transformed the finite element and boundary element simulation software into a program that runs under the MPI (Message Passing Interface) parallel environment. The data obtained from the simulation experiments demonstrate that the numerical results are good, and the computing speed of the high-performance parallel machine is 25-30 times that of a single microcomputer.
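As a quick sanity check on the reported numbers, a 25-30x speedup on 32 machines corresponds to a parallel efficiency of roughly 0.78-0.94 (using the standard definition of efficiency; the figures below come from the abstract):

```python
def parallel_efficiency(speedup, workers):
    # Parallel efficiency = speedup / number of workers (ideal = 1.0).
    return speedup / workers

# The cluster described has 32 machines with a reported 25-30x speedup.
low = parallel_efficiency(25, 32)
high = parallel_efficiency(30, 32)
assert 0.78 < low < high < 0.94
```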

