Open | SpeedShop: An Open Source Infrastructure for Parallel Performance Analysis

2008 ◽  
Vol 16 (2-3) ◽  
pp. 105-121 ◽  
Author(s):  
Martin Schulz ◽  
Jim Galarowicz ◽  
Don Maghrak ◽  
William Hachfeld ◽  
David Montoya ◽  
...  

Over the last decades, a large number of performance tools have been developed to analyze and optimize high-performance applications. Their acceptance by end users, however, has been slow: each tool alone is often limited in scope and comes with widely varying interfaces and workflow constraints, requiring different changes to the often complex build and execution infrastructure of the target application. We started the Open | SpeedShop project about 3 years ago to overcome these limitations and provide efficient, easy-to-apply, and integrated performance analysis for parallel systems. Open | SpeedShop has two faces: it provides an interoperable tool set covering the most common analysis steps, as well as a comprehensive plugin infrastructure for building new tools. In both cases, the tools can be deployed to large-scale parallel applications using DPCL/Dyninst for distributed binary instrumentation. Further, all tools developed within or on top of Open | SpeedShop are accessible through multiple, fully equivalent interfaces, including an easy-to-use GUI and an interactive command line interface, reducing the usage threshold for these tools.

Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
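The core selection step described above can be illustrated with a short sketch (not the authors' code): given model predictions of (runtime, energy) for candidate configurations, keep only the Pareto-optimal ones. All configuration names and numbers below are hypothetical.

```python
# Pareto-front selection over predicted (runtime, energy) pairs.
# A config is dominated if another is no worse in both metrics and
# strictly better in at least one.

def pareto_front(configs):
    """Return names of configs not dominated by any other config."""
    front = []
    for name, runtime, energy in configs:
        dominated = any(
            r <= runtime and e <= energy and (r < runtime or e < energy)
            for n, r, e in configs if n != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical predictions: (config, predicted runtime in s, predicted energy in kJ)
predictions = [
    ("cores=32,freq=2.6GHz", 100.0, 50.0),  # fastest, highest energy
    ("cores=32,freq=2.0GHz", 110.0, 34.0),  # ~10% slower, ~30% less energy
    ("cores=16,freq=2.0GHz", 180.0, 36.0),  # dominated by the previous one
]
print(pareto_front(predictions))
```

A user would then pick a point on this front, e.g. trading ~10% runtime for a large energy saving, much as the paper's AMG results suggest.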


Author(s):  
Felix Wolf ◽  
Brian J. N. Wylie ◽  
Erika Ábrahám ◽  
Daniel Becker ◽  
Wolfgang Frings ◽  
...  

1999 ◽  
Vol 09 (02) ◽  
pp. 243-252 ◽  
Author(s):  
O. LARSSON ◽  
M. FEIG ◽  
L. JOHNSSON

We demonstrate good metacomputing efficiency and portability for three typical large-scale parallel applications: one molecular dynamics code and two electromagnetics codes. The codes were developed for distributed-memory parallel platforms using Fortran77 or Fortran90 with MPI. The performance measurements were made on a testbed of two IBM SPs connected through the vBNS. No changes to the application codes were required for correct execution on the testbed using the Globus Toolkit for the required metacomputing services. However, we observe that for good performance, it may be necessary for MPI codes to make use of overlapped computation and communication. For such MPI codes, a communications library designed for hierarchical or clustered communication can yield very good metacomputing efficiencies when high-performance networks, such as the vBNS or Abilene, are used for platform connectivity. We demonstrate this by inserting a thin layer between the MPI application and the MPI libraries, providing some clustering of communications between platforms.
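The "thin layer" idea can be sketched in pure Python (no MPI): messages between ranks on different platforms are routed through one gateway rank per platform, so cross-platform (wide-area) traffic can be bundled while intra-platform traffic stays on the fast local network. The cluster layout and gateway choice below are hypothetical.

```python
# Hierarchical routing sketch: local messages go direct; remote messages
# hop through the source and destination platforms' gateway ranks, so the
# only wide-area hop is gateway-to-gateway.

def route(src, dst, cluster_of, gateways):
    """Return the hop sequence a message takes from rank src to rank dst."""
    if cluster_of[src] == cluster_of[dst]:
        return [src, dst]                  # local: direct fast-network hop
    hops = [src]
    g_src = gateways[cluster_of[src]]
    g_dst = gateways[cluster_of[dst]]
    if src != g_src:
        hops.append(g_src)                 # local hop to own gateway
    hops.append(g_dst)                     # single wide-area hop
    if dst != g_dst:
        hops.append(dst)                   # local hop to destination
    return hops

# Two hypothetical 4-rank platforms; ranks 0 and 4 act as gateways.
cluster_of = {r: r // 4 for r in range(8)}
gateways = {0: 0, 1: 4}
print(route(1, 6, cluster_of, gateways))   # [1, 0, 4, 6]: only 0->4 crosses the WAN
```

Bundling many small cross-platform messages into one gateway-to-gateway transfer is what makes overlap of computation and communication pay off on a wide-area link.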


2006 ◽  
Vol 16 (03) ◽  
pp. 323-334
Author(s):  
IGOR ROZMAN ◽  
MARJAN ŠTERK ◽  
ROMAN TROBEC

High-performance parallel computers provide the computational rates necessary for computer simulations and intensive computing applications. An important component of a parallel computer program is the MPI software library, which implements communication within parallel applications. Several MPI implementations exist; the most widely used among them are LAM/MPI and MPICH. This paper presents the results of four basic synthetic tests and two real simulations in the LAM/MPI and MPICH environments. Tests were made on a computer cluster composed of 17 dual-processor nodes connected by a toroidal mesh. Results show that on the investigated cluster, LAM outperformed MPICH, especially in bidirectional ring communication, and that appropriate tuning of communication parameters contributes significantly to the final parallel performance.
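The bidirectional ring pattern used in such synthetic tests can be sketched as follows (illustrative only, not the benchmark code from the paper): each of P ranks passes a token to both neighbours every step, so after P steps every token has visited every rank in each direction.

```python
# Bidirectional ring exchange sketch: clockwise tokens shift right,
# counter-clockwise tokens shift left; both wrap around modulo P.

def ring_step(tokens_cw, tokens_ccw):
    """Perform one exchange step in each direction around the ring."""
    p = len(tokens_cw)
    return ([tokens_cw[(i - 1) % p] for i in range(p)],
            [tokens_ccw[(i + 1) % p] for i in range(p)])

p = 4
cw, ccw = list(range(p)), list(range(p))
for _ in range(p):
    cw, ccw = ring_step(cw, ccw)
print(cw, ccw)   # after P steps both directions return to the start
```

In the real benchmark each "token" is a message buffer, and the metric of interest is how the MPI implementation sustains the two simultaneous neighbour exchanges per step.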


Gigabyte ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Ben Duggan ◽  
John Metzcar ◽  
Paul Macklin

Modern agent-based models (ABM) and other simulation models require evaluation and testing of many different parameters. Managing that testing for large scale parameter sweeps (grid searches), as well as storing simulation data, requires multiple, potentially customizable steps that may vary across simulations. Furthermore, parameter testing, processing, and analysis are slowed if simulation and processing jobs cannot be shared across teammates or computational resources. While high-performance computing (HPC) has become increasingly available, models can often be tested faster with the use of multiple computers and HPC resources. To address these issues, we created the Distributed Automated Parameter Testing (DAPT) Python package. By hosting parameters in an online (and often free) “database”, multiple individuals can run parameter sets simultaneously in a distributed fashion, enabling ad hoc crowdsourcing of computational power. Combining this with a flexible, scriptable tool set, teams can evaluate models and assess their underlying hypotheses quickly. Here, we describe DAPT and provide an example demonstrating its use.
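The distributed-claiming idea behind DAPT can be shown with a minimal sketch (this is not the DAPT API): parameter sets live in a shared table, and each worker claims the next "unstarted" entry, marking it "in progress" so teammates on other machines do not repeat it. A real deployment would back the table with an online spreadsheet or database; here it is a plain in-memory list, and all field names are hypothetical.

```python
# Shared parameter table: workers claim rows one at a time. In DAPT-style
# use, many machines poll the same hosted table, crowdsourcing the sweep.

def claim_next(table, worker):
    """Claim the first unstarted parameter set for this worker, or None."""
    for row in table:
        if row["status"] == "unstarted":
            row["status"] = "in progress"
            row["worker"] = worker
            return row
    return None                      # nothing left to run

table = [
    {"id": "p1", "params": {"rate": 0.1}, "status": "unstarted"},
    {"id": "p2", "params": {"rate": 0.2}, "status": "unstarted"},
]
job = claim_next(table, worker="alice")
print(job["id"], table[0]["status"])   # p1 in progress
```

With a hosted backend, the claim must be atomic (e.g. a conditional update) so two workers cannot grab the same row; the in-memory version above glosses over that.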


TEKNOKOM ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 30-36
Author(s):  
Budi Santoso ◽  
Asrul Sani ◽  
T. Husain ◽  
Nedi Hendri

Data exchange has increasingly moved toward centralized communication, in which data is hosted on a server and accessed by clients, as in many organizations. As an institution engaged in education, the organization studied here has built centralized data communication on its intranet. Intranet communication, however, is vulnerable to wiretapping; a VPN can address this. L2TP and IPsec VPNs differ in performance, particularly in the level of security they provide. In this study, the network performance of L2TP and IPsec VPNs was analyzed for an SMB server running on an Ubuntu server, with a Mikrotik router handling the VPN configuration. The VPN was designed by configuring a Mikrotik RB 450G router and an SMB server set up via the command line on Ubuntu 18.04. For the security analysis, hacking methods were used to attempt to obtain VPN server login data, and sniffing methods to capture SMB server login data and SMB traffic. For the performance analysis, the parameters of delay, throughput, and packet loss were measured using Wireshark, software that captures and inspects each packet of data on an interface. The research objective is to design a VPN based on L2TP and IPsec and to determine the resulting performance after implementation. The result is that the VPN can connect the head office to branch one and branch two, carrying local connections over a public network. The Ubuntu server also ran well, supporting the VPN process properly.
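The three performance parameters named above can be computed from per-packet records such as one might export from a Wireshark capture. The sketch below is illustrative (field names and numbers are hypothetical, not the study's data).

```python
# Compute average delay, throughput, and packet loss from delivered-packet
# records (send timestamp, receive timestamp, size in bytes) plus the total
# number of packets sent.

def qos_metrics(packets, sent_count):
    """Return (avg_delay_s, throughput_bps, loss_percent)."""
    delays = [recv - send for send, recv, _ in packets]
    avg_delay = sum(delays) / len(delays)
    duration = max(r for _, r, _ in packets) - min(s for s, _, _ in packets)
    throughput_bps = 8 * sum(size for _, _, size in packets) / duration
    loss_pct = 100.0 * (sent_count - len(packets)) / sent_count
    return avg_delay, throughput_bps, loss_pct

# 4 packets sent, 3 delivered (timestamps in seconds, sizes in bytes):
pkts = [(0.00, 0.02, 1500), (0.10, 0.13, 1500), (0.20, 0.22, 1500)]
delay, tput, loss = qos_metrics(pkts, sent_count=4)
print(round(delay, 3), round(tput), loss)   # 0.023 163636 25.0
```

Comparing these three numbers for the L2TP and IPsec tunnels (and for the unencrypted baseline) is exactly the kind of analysis the study performs.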


2012 ◽  
Vol 2012 ◽  
pp. 1-18 ◽  
Author(s):  
Xiaocheng Liu ◽  
Bin Chen ◽  
Xiaogang Qiu ◽  
Ying Cai ◽  
Kedi Huang

An increasing number of high-performance computing parallel applications leverage the power of the cloud for parallel processing. How to schedule parallel applications to improve the quality of service is key to successfully hosting parallel applications in the cloud. The large scale of the cloud makes parallel job scheduling more complicated, as even the simple parallel job scheduling problem is NP-complete. In this paper, we propose a parallel job scheduling algorithm named MEASY. MEASY adopts migration and consolidation to enhance the popular EASY scheduling algorithm. Our extensive experiments on well-known workloads show that our algorithm takes very good care of the quality of service. For two common parallel job scheduling objectives, our algorithm produces up to a 41.1% and an average 23.1% improvement in average response time, and up to an 82.9% and an average 69.3% improvement in average slowdown. Our algorithm is robust even under inaccurate CPU usage estimation and high migration cost. Our approach involves only trivial modifications to EASY and requires no additional techniques; it is practical and effective in the cloud environment.
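For context, the EASY baseline that MEASY extends can be sketched as follows. This is a simplified illustration (the migration and consolidation additions are omitted, and the reservation time is approximated by the first running job's end time); jobs are (name, nodes, runtime) tuples with hypothetical values.

```python
# Simplified EASY backfilling: the queue head gets a reservation at the
# earliest time enough nodes free up; later jobs may start out of order
# ("backfill") only if they fit now and finish before that reservation.

def easy_schedule(queue, free_nodes, now, running_end_times):
    """Return names of jobs to start now under (simplified) EASY."""
    started = []
    queue = list(queue)
    # Start jobs strictly in order while they fit.
    while queue and queue[0][1] <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job[1]
        started.append(job[0])
    if not queue:
        return started
    # Head job doesn't fit: give it a reservation (approximated here by the
    # earliest completion among running jobs).
    queue.pop(0)
    reserve_at = sorted(running_end_times)[0] if running_end_times else now
    # Backfill later jobs that fit now and end by the reservation.
    for name, nodes, runtime in queue:
        if nodes <= free_nodes and now + runtime <= reserve_at:
            free_nodes -= nodes
            started.append(name)
    return started

# 4 free nodes, one running job ending at t=10:
jobs = [("A", 2, 8), ("B", 6, 4), ("C", 2, 5), ("D", 4, 20)]
print(easy_schedule(jobs, free_nodes=4, now=0, running_end_times=[10]))
# ['A', 'C'] -- B waits for its reservation; C backfills; D would overrun it
```

MEASY's contribution is to additionally migrate and consolidate running jobs so that more such backfill opportunities appear.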


Author(s):  
Ahmad Awwad ◽  
Jehad Al-Sadi ◽  
Bassam Haddad ◽  
Ahmad Kayed

Recent studies have revealed that the Optical Transpose Interconnection Systems (OTIS) are promising candidates for future high-performance parallel computers. This paper presents and evaluates a general method for algorithm development on the OTIS-Arrangement network (OTIS-AN) as an example of OTIS network. The proposed method can be used and customized for any other OTIS network. Furthermore, it allows efficient mapping of a wide class of algorithms into the OTIS-AN. This method is based on grids and pipelines as popular structures that support a vast body of parallel applications including linear algebra, divide-and-conquer types of algorithms, sorting, and FFT computation. This study confirms the viability of the OTIS-AN as an attractive alternative for large-scale parallel architectures.
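The defining feature of an OTIS network, which the mapping method above builds on, is its optical "transpose" link: node (g, p) of group g and processor p connects optically to node (p, g), while electronic links follow the factor network (left abstract here). A minimal sketch, with a hypothetical network size:

```python
# OTIS transpose rule: the optical neighbour of (group, processor) is
# (processor, group). Applying it twice returns to the starting node.

def optical_neighbor(node):
    """Follow the OTIS optical transpose link."""
    g, p = node
    return (p, g)

# Check the involution property on a hypothetical 5x5 OTIS network.
n = 5
for g in range(n):
    for p in range(n):
        assert optical_neighbor(optical_neighbor((g, p))) == (g, p)
print(optical_neighbor((2, 3)))   # (3, 2)
```

Grid- and pipeline-style algorithms map onto this structure by treating each group as a local (electronic) subgrid and using the transpose links for the inter-group steps.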

