PERFORMANCE ANALYSIS OF MESSAGE PASSING INTERFACE COLLECTIVE COMMUNICATION ON INTEL XEON QUAD-CORE GIGABIT ETHERNET AND INFINIBAND CLUSTERS

2013 ◽  
Vol 9 (4) ◽  
pp. 455-462
Author(s):  
Ismail
Author(s):  
John M Dennis ◽  
Brian Dobbins ◽  
Christopher Kerr ◽  
Youngsung Kim

The arrival of next-generation computing platforms offers a tremendous opportunity to advance the state of the art in global atmospheric dynamical models. We detail our incremental approach to exploiting this emerging technology by enhancing concurrency within the High-Order Method Modeling Environment (HOMME) atmospheric dynamical model developed at the National Center for Atmospheric Research (NCAR). The study focused on improving the performance of HOMME, a Fortran 90 code with a hybrid (MPI/OpenMP) programming model. The article describes the changes made to the use of the Message Passing Interface (MPI) and OpenMP, as well as single-core optimizations, to achieve significant improvements in concurrency and overall code performance. For our optimization studies, we used the “Cori” system with an Intel Xeon Phi Knights Landing processor deployed at the National Energy Research Scientific Computing Center and the “Cheyenne” system with an Intel Xeon Broadwell processor installed at NCAR. The results from the studies, using “workhorse” configurations run at NCAR, show that these changes have a transformative impact on the computational performance of HOMME. Our improvements show that we can effectively increase potential concurrency by efficiently threading the vertical dimension, and the single-core optimizations yield a factor-of-two overall improvement in the computational performance of the code. Most notably, our incremental approach allows for high-impact changes without disrupting existing scientific productivity in the HOMME community.
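The vertical-dimension threading described above can be illustrated with a small Python sketch (the kernel and array shapes here are hypothetical stand-ins, not HOMME's Fortran/OpenMP code): because each vertical level can be computed independently, a thread pool may process levels concurrently, and the threaded result must match the serial one.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def tendency(level_slice):
    # Hypothetical stand-in for a per-level dynamics computation.
    return np.sin(level_slice) * 2.0

def compute_serial(field):
    # Loop over the vertical (level) dimension one level at a time.
    return np.stack([tendency(field[k]) for k in range(field.shape[0])])

def compute_threaded(field, workers=4):
    # Thread over the vertical dimension; levels are independent,
    # so no locking is needed and results arrive in order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.stack(list(pool.map(tendency, field)))

field = np.random.default_rng(0).standard_normal((8, 16, 16))
assert np.allclose(compute_serial(field), compute_threaded(field))
```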


Author(s):  
George W. Leaver ◽  
Martin J. Turner ◽  
James S. Perrin ◽  
Paul M. Mummery ◽  
Philip J. Withers

Remote scientific visualization, where rendering services are provided by larger-scale systems than are available on the desktop, is becoming increasingly important as dataset sizes grow beyond the capabilities of desktop workstations. Uptake of such services relies on access to suitable visualization applications and the ability to view the resulting visualization in a convenient form. We consider five rules from the e-Science community for meeting these goals while porting a commercial visualization package to a large-scale system. The application uses the Message Passing Interface (MPI) to distribute data among data-processing and rendering processes. The use of MPI in such an interactive application is not compatible with restrictions imposed by the Cray system being considered. We present details, and a performance analysis, of a new MPI proxy method that allows the application to run within the Cray environment yet still supports the MPI communication the application requires. Example use cases from materials science are considered.
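The proxy idea, interposing a relay between the interactive client and the restricted batch environment so the rendering processes never hold a direct client connection, can be sketched in-process with Python queues. This is only a conceptual analogue of the pattern, not the paper's Cray MPI proxy.

```python
import queue
import threading

def proxy(inbox, outbox, stop):
    # Relay client commands toward the batch-side processes; the
    # render side sees only the proxy, never the client connection.
    while not stop.is_set() or not inbox.empty():
        try:
            msg = inbox.get(timeout=0.05)
        except queue.Empty:
            continue
        outbox.put(("forwarded", msg))

client_to_proxy, proxy_to_job = queue.Queue(), queue.Queue()
stop = threading.Event()
worker = threading.Thread(target=proxy, args=(client_to_proxy, proxy_to_job, stop))
worker.start()

# Hypothetical interactive commands from the visualization front end.
for cmd in ("load volume.raw", "render isosurface"):
    client_to_proxy.put(cmd)

received = [proxy_to_job.get() for _ in range(2)]  # blocks until relayed
stop.set()
worker.join()
assert received == [("forwarded", "load volume.raw"),
                    ("forwarded", "render isosurface")]
```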


Author(s):  
Daniel C. Doolan ◽  
Sabin Tabirca ◽  
Laurence T. Yang

The Message Passing Interface (MPI) was published as a standard in 1992. Since then, many implementations have been developed; the MPICH library is one of the best-known and freely available ones. These libraries simplify parallel computing on clusters and parallel machines by providing the developer with an easy-to-use set of functions for point-to-point and global communications. The details of how the actual communication takes place are hidden from programmers, allowing them to focus on the domain-specific problem at hand. Communication between nodes on such systems is carried out via high-speed cabled interconnects (Gigabit Ethernet and upwards). Mobile computing, especially the mobile phone, is now a ubiquitous technology. Mobile devices have no facility for connections over traditional high-speed cabling, so wireless communication mechanisms must be used to achieve inter-device communication. The majority of medium- to high-end phones are Bluetooth-enabled as standard, allowing wireless communication to take place. The Mobile Message Passing Interface (MMPI) provides the developer with an intuitive set of functions for communication between nodes (mobile phones) across a Bluetooth network. This chapter looks at the MMPI library and how it may be used for parallel computing on mobile phones (smartphones).
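A minimal in-process sketch can convey the shape of the point-to-point and global operations such a library exposes (the class and method names here are illustrative MPI-style conventions, not MMPI's actual Bluetooth API):

```python
import queue

class TinyMPI:
    """Toy in-process analogue of MPI-style communication.

    Each rank gets a mailbox; real MMPI carries these messages over
    Bluetooth links between phones rather than in-memory queues."""

    def __init__(self, size):
        self.size = size
        self._mailbox = [queue.Queue() for _ in range(size)]

    def send(self, data, dest):
        # Point-to-point send: deliver data to one destination rank.
        self._mailbox[dest].put(data)

    def recv(self, rank):
        # Point-to-point receive: take the next message for this rank.
        return self._mailbox[rank].get()

    def bcast(self, data, root):
        # Global communication: root sends the same data to all others.
        for r in range(self.size):
            if r != root:
                self._mailbox[r].put(data)

comm = TinyMPI(3)
comm.send("hello", dest=1)
assert comm.recv(rank=1) == "hello"
comm.bcast("sync", root=0)
assert [comm.recv(r) for r in (1, 2)] == ["sync", "sync"]
```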


2019 ◽  
Vol 12 (4) ◽  
pp. 1423-1441 ◽  
Author(s):  
Luca Bertagna ◽  
Michael Deakin ◽  
Oksana Guba ◽  
Daniel Sunderland ◽  
Andrew M. Bradley ◽  
...  

Abstract. We present an architecture-portable and performant implementation of the atmospheric dynamical core (High-Order Methods Modeling Environment, HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPUs (e.g., Intel Xeon), many-core CPUs (e.g., Intel Xeon Phi Knights Landing), and Nvidia V100 GPUs.
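The idea behind Kokkos-style parallel execution constructs can be caricatured in Python: the computation is written against an abstract reduction construct, leaving the backend free to decide how iterations are scheduled. The sketch below is an illustration of that separation under our own names, not the Kokkos API; the backend here is serial, whereas a real one would partition the index range across threads or GPU blocks.

```python
def parallel_reduce(n, body, init=0.0):
    # Stand-in for a parallel-reduce construct: serial here, but the
    # body is written so iterations could be combined in any order.
    acc = init
    for i in range(n):
        acc = body(i, acc)
    return acc

# A dot product expressed against the construct rather than a raw loop.
x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
dot = parallel_reduce(len(x), lambda i, acc: acc + x[i] * y[i])
assert dot == 32.0  # 1*4 + 2*5 + 3*6
```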


2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye ◽  
Juniarto Samsudin ◽  
Yongqing Zhu

Background: Genotype imputation as a service enables researchers to estimate genotypes from haplotyped data without performing whole-genome sequencing. However, genotype imputation is computation-intensive, and it remains a challenge to satisfy the high-performance requirements of genome-wide association studies (GWAS).
Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to enhance its execution performance.
Method: We design and implement multi-level parallelization comprising job-level, process-level, and thread-level parallelization, enabled by job-scheduling management, the Message Passing Interface (MPI), and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation, and data concatenation. This multi-level design lets us exploit the multi-machine/multi-core architecture to improve the performance of genotype imputation.
Results: Experimental results show that our proposed method outperforms a Hadoop-based implementation of genotype imputation. Moreover, experiments on supercomputers show that the proposed method significantly shortens execution time, improving the performance of genotype imputation.
Conclusion: The proposed multi-level parallelization, deployed as imputation as a service, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.
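The chunk-partition, parallel-execute, and concatenate steps can be sketched at the thread level in Python (the job and process levels, handled by the scheduler and MPI, are omitted; `impute_chunk` is a hypothetical stand-in for running the imputation tool on one window of markers):

```python
from concurrent.futures import ThreadPoolExecutor

def partition(markers, chunk_size):
    # Chunk partition: split a list of marker positions into windows.
    return [markers[i:i + chunk_size]
            for i in range(0, len(markers), chunk_size)]

def impute_chunk(chunk):
    # Hypothetical stand-in for imputing one chunk independently.
    return [m * 2 for m in chunk]

markers = list(range(10))
chunks = partition(markers, 4)            # e.g., windows of 4 markers
with ThreadPoolExecutor(max_workers=3) as pool:  # thread-level parallelism
    results = list(pool.map(impute_chunk, chunks))
imputed = [v for chunk in results for v in chunk]  # data concatenation
assert imputed == [m * 2 for m in markers]
```

Because `pool.map` preserves input order, the concatenated output matches what a single serial pass would produce.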


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2284
Author(s):  
Krzysztof Przystupa ◽  
Mykola Beshley ◽  
Olena Hordiichuk-Bublivska ◽  
Marian Kyryk ◽  
Halyna Beshley ◽  
...  

Analyzing large amounts of user data to determine preferences and, on that basis, recommend new products is an important problem. Depending on the correctness and timeliness of the recommendations, significant profits or losses can result. The task of analyzing data on the users of a company's services is carried out by dedicated recommendation systems. However, with a large number of users, the data to be processed become very big, which complicates the work of recommendation systems. For efficient data analysis in commercial systems, the Singular Value Decomposition (SVD) method can perform intelligent analysis of the information. For large amounts of processed information, we propose using distributed systems. This approach reduces the time needed to process data and deliver recommendations to users. For the experimental study, we implemented the distributed SVD method using Message Passing Interface, Hadoop, and Spark technologies, and the results show reduced data-processing time when using distributed systems compared to non-distributed ones.
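The SVD step at the core of such recommendations can be sketched with NumPy on a toy rating matrix (the matrix and the rank-2 truncation are illustrative; the paper's contribution is distributing this computation, which is not shown here):

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 = unrated.
R = np.array([[5., 4., 0., 0.],
              [4., 5., 1., 0.],
              [1., 0., 5., 4.],
              [0., 1., 4., 5.]])

# Full SVD, then keep only the k strongest latent factors.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Recommend, for user 0, the unrated item with the highest predicted score.
unrated = np.where(R[0] == 0)[0]
best = int(unrated[np.argmax(R_hat[0, unrated])])
assert R_hat.shape == R.shape and best in (2, 3)
```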


1996 ◽  
Vol 22 (6) ◽  
pp. 789-828 ◽  
Author(s):  
William Gropp ◽  
Ewing Lusk ◽  
Nathan Doss ◽  
Anthony Skjellum

2003 ◽  
Vol 15 (35) ◽  
pp. 341-369 ◽  
Author(s):  
Arnold Nelisse ◽  
Jason Maassen ◽  
Thilo Kielmann ◽  
Henri E. Bal

2013 ◽  
Vol 718-720 ◽  
pp. 1645-1650
Author(s):  
Gen Yin Cheng ◽  
Sheng Chen Yu ◽  
Zhi Yong Wei ◽  
Shao Jie Chen ◽  
You Cheng

The commonly used commercial simulation packages SYSNOISE and ANSYS run on a single machine (they cannot run directly on a parallel machine) when the finite element and boundary element methods are used to simulate muffler performance, and because of the large amount of numerical computation it can take more than ten days, sometimes even twenty, to work out an exact solution. To reduce the cost of numerical simulation, we built a high-performance parallel machine from 32 commodity computers and transformed the finite element and boundary element simulation software into a program that runs under the MPI (Message Passing Interface) parallel environment. The data obtained from the simulation experiments demonstrate that the numerical results are good, and the computing speed of the high-performance parallel machine is 25-30 times that of a single microcomputer.
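As a quick sanity check on the reported numbers, a 25-30x speedup on 32 machines corresponds to a parallel efficiency of roughly 0.78-0.94 (using the standard definition of efficiency; the figures below come from the abstract):

```python
def parallel_efficiency(speedup, workers):
    # Parallel efficiency = speedup / number of workers (ideal = 1.0).
    return speedup / workers

# The cluster described has 32 machines with a reported 25-30x speedup.
low = parallel_efficiency(25, 32)
high = parallel_efficiency(30, 32)
assert 0.78 < low < high < 0.94
```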

