Open | SpeedShop: An Open Source Infrastructure for Parallel Performance Analysis

2008 ◽  
Vol 16 (2-3) ◽  
pp. 105-121 ◽  
Author(s):  
Martin Schulz ◽  
Jim Galarowicz ◽  
Don Maghrak ◽  
William Hachfeld ◽  
David Montoya ◽  
...  

Over the last decades, a large number of performance tools have been developed to analyze and optimize high-performance applications. Their acceptance by end users, however, has been slow: each tool alone is often limited in scope and comes with widely varying interfaces and workflow constraints, requiring different changes to the often complex build and execution infrastructure of the target application. We started the Open | SpeedShop project about 3 years ago to overcome these limitations and provide efficient, easy-to-apply, and integrated performance analysis for parallel systems. Open | SpeedShop has two faces: it provides an interoperable tool set covering the most common analysis steps, as well as a comprehensive plugin infrastructure for building new tools. In both cases, the tools can be deployed to large-scale parallel applications using DPCL/Dyninst for distributed binary instrumentation. Further, all tools developed within or on top of Open | SpeedShop are accessible through multiple, fully equivalent interfaces, including an easy-to-use GUI and an interactive command line interface, reducing the usage threshold for these tools.

Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
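The core selection step described above can be illustrated with a short sketch (not the authors' code): given model predictions of (runtime, energy) for candidate configurations, keep only the Pareto-optimal ones. All configuration names and numbers below are hypothetical.

```python
# Pareto-front selection over predicted (runtime, energy) pairs.
# A config is dominated if another is no worse in both metrics and
# strictly better in at least one.

def pareto_front(configs):
    """Return names of configs not dominated by any other config."""
    front = []
    for name, runtime, energy in configs:
        dominated = any(
            r <= runtime and e <= energy and (r < runtime or e < energy)
            for n, r, e in configs if n != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical predictions: (config, predicted runtime in s, predicted energy in kJ)
predictions = [
    ("cores=32,freq=2.6GHz", 100.0, 50.0),  # fastest, highest energy
    ("cores=32,freq=2.0GHz", 110.0, 34.0),  # ~10% slower, ~30% less energy
    ("cores=16,freq=2.0GHz", 180.0, 36.0),  # dominated by the previous one
]
print(pareto_front(predictions))
```

A user would then pick a point on this front, e.g. trading ~10% runtime for a large energy saving, much as the paper's AMG results suggest.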


Author(s):  
Felix Wolf ◽  
Brian J. N. Wylie ◽  
Erika Ábrahám ◽  
Daniel Becker ◽  
Wolfgang Frings ◽  
...  

1999 ◽  
Vol 09 (02) ◽  
pp. 243-252 ◽  
Author(s):  
O. LARSSON ◽  
M. FEIG ◽  
L. JOHNSSON

We demonstrate good metacomputing efficiency and portability for three typical large-scale parallel applications: one molecular dynamics code and two electromagnetics codes. The codes were developed for distributed-memory parallel platforms using Fortran77 or Fortran90 with MPI. The performance measurements were made on a testbed of two IBM SPs connected through the vBNS. No changes to the application codes were required for correct execution on the testbed using the Globus Toolkit for the required metacomputing services. However, we observe that for good performance, it may be necessary for MPI codes to make use of overlapped computation and communication. For such MPI codes, a communications library designed for hierarchical or clustered communication can yield very good metacomputing efficiencies when high-performance networks, such as the vBNS or Abilene, are used for platform connectivity. We demonstrate this by inserting a thin layer between the MPI application and the MPI libraries, providing some clustering of communications between platforms.
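The "thin layer" idea can be sketched in pure Python (no MPI): messages between ranks on different platforms are routed through one gateway rank per platform, so cross-platform (wide-area) traffic can be bundled while intra-platform traffic stays on the fast local network. The cluster layout and gateway choice below are hypothetical.

```python
# Hierarchical routing sketch: local messages go direct; remote messages
# hop through the source and destination platforms' gateway ranks, so the
# only wide-area hop is gateway-to-gateway.

def route(src, dst, cluster_of, gateways):
    """Return the hop sequence a message takes from rank src to rank dst."""
    if cluster_of[src] == cluster_of[dst]:
        return [src, dst]                  # local: direct fast-network hop
    hops = [src]
    g_src = gateways[cluster_of[src]]
    g_dst = gateways[cluster_of[dst]]
    if src != g_src:
        hops.append(g_src)                 # local hop to own gateway
    hops.append(g_dst)                     # single wide-area hop
    if dst != g_dst:
        hops.append(dst)                   # local hop to destination
    return hops

# Two hypothetical 4-rank platforms; ranks 0 and 4 act as gateways.
cluster_of = {r: r // 4 for r in range(8)}
gateways = {0: 0, 1: 4}
print(route(1, 6, cluster_of, gateways))   # [1, 0, 4, 6]: only 0->4 crosses the WAN
```

Bundling many small cross-platform messages into one gateway-to-gateway transfer is what makes overlap of computation and communication pay off on a wide-area link.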


2006 ◽  
Vol 16 (03) ◽  
pp. 323-334
Author(s):  
IGOR ROZMAN ◽  
MARJAN ŠTERK ◽  
ROMAN TROBEC

High-performance parallel computers provide the computational rates necessary for computer simulations and intensive computing applications. An important component of a parallel computer program is the MPI software library, which implements communication within parallel applications. Several MPI implementations exist; the most widely used among them are LAM/MPI and MPICH. This paper presents the results of four basic synthetic tests and two real simulations in the LAM/MPI and MPICH environments. Tests were made on a computer cluster composed of 17 dual-processor nodes connected by a toroidal mesh. Results show that on the investigated cluster, LAM outperformed MPICH, especially in bidirectional ring communication, and that appropriate tuning of communication parameters contributes significantly to the final parallel performance.
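The bidirectional ring pattern used in such synthetic tests can be sketched as follows (illustrative only, not the benchmark code from the paper): each of P ranks passes a token to both neighbours every step, so after P steps every token has visited every rank in each direction.

```python
# Bidirectional ring exchange sketch: clockwise tokens shift right,
# counter-clockwise tokens shift left; both wrap around modulo P.

def ring_step(tokens_cw, tokens_ccw):
    """Perform one exchange step in each direction around the ring."""
    p = len(tokens_cw)
    return ([tokens_cw[(i - 1) % p] for i in range(p)],
            [tokens_ccw[(i + 1) % p] for i in range(p)])

p = 4
cw, ccw = list(range(p)), list(range(p))
for _ in range(p):
    cw, ccw = ring_step(cw, ccw)
print(cw, ccw)   # after P steps both directions return to the start
```

In the real benchmark each "token" is a message buffer, and the metric of interest is how the MPI implementation sustains the two simultaneous neighbour exchanges per step.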


Gigabyte ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Ben Duggan ◽  
John Metzcar ◽  
Paul Macklin

Modern agent-based models (ABM) and other simulation models require evaluation and testing of many different parameters. Managing that testing for large scale parameter sweeps (grid searches), as well as storing simulation data, requires multiple, potentially customizable steps that may vary across simulations. Furthermore, parameter testing, processing, and analysis are slowed if simulation and processing jobs cannot be shared across teammates or computational resources. While high-performance computing (HPC) has become increasingly available, models can often be tested faster with the use of multiple computers and HPC resources. To address these issues, we created the Distributed Automated Parameter Testing (DAPT) Python package. By hosting parameters in an online (and often free) “database”, multiple individuals can run parameter sets simultaneously in a distributed fashion, enabling ad hoc crowdsourcing of computational power. Combining this with a flexible, scriptable tool set, teams can evaluate models and assess their underlying hypotheses quickly. Here, we describe DAPT and provide an example demonstrating its use.
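The distributed-claiming idea behind DAPT can be shown with a minimal sketch (this is not the DAPT API): parameter sets live in a shared table, and each worker claims the next "unstarted" entry, marking it "in progress" so teammates on other machines do not repeat it. A real deployment would back the table with an online spreadsheet or database; here it is a plain in-memory list, and all field names are hypothetical.

```python
# Shared parameter table: workers claim rows one at a time. In DAPT-style
# use, many machines poll the same hosted table, crowdsourcing the sweep.

def claim_next(table, worker):
    """Claim the first unstarted parameter set for this worker, or None."""
    for row in table:
        if row["status"] == "unstarted":
            row["status"] = "in progress"
            row["worker"] = worker
            return row
    return None                      # nothing left to run

table = [
    {"id": "p1", "params": {"rate": 0.1}, "status": "unstarted"},
    {"id": "p2", "params": {"rate": 0.2}, "status": "unstarted"},
]
job = claim_next(table, worker="alice")
print(job["id"], table[0]["status"])   # p1 in progress
```

With a hosted backend, the claim must be atomic (e.g. a conditional update) so two workers cannot grab the same row; the in-memory version above glosses over that.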


TEKNOKOM ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 30-36
Author(s):  
Budi Santoso ◽  
Asrul Sani ◽  
T. Husain ◽  
Nedi Hendri

Data exchange has increasingly moved toward centralized communication, in which data is hosted on a server and accessed by clients, as in many organizations. As an institution engaged in education, the organization studied here has built centralized data communication on its intranet. Intranet communication, however, is vulnerable to wiretapping; a VPN can address this. L2TP and IPsec VPNs differ in performance, particularly in the level of security they provide. In this study, the network performance of L2TP and IPsec VPNs was analyzed for an SMB server running on an Ubuntu server, with a Mikrotik router handling the VPN configuration. The VPN was designed by configuring a Mikrotik RB 450G router and an SMB server set up via the command line on Ubuntu 18.04. For the security analysis, hacking methods were used to attempt to obtain VPN server login data, and sniffing methods to capture SMB server login data and SMB traffic. For the performance analysis, the parameters of delay, throughput, and packet loss were measured using Wireshark, software that captures and inspects each packet of data on an interface. The research objective is to design a VPN based on L2TP and IPsec and to determine the resulting performance after implementation. The result is that the VPN can connect the head office to branch one and branch two, carrying local connections over a public network. The Ubuntu server also ran well, supporting the VPN process properly.
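The three performance parameters named above can be computed from per-packet records such as one might export from a Wireshark capture. The sketch below is illustrative (field names and numbers are hypothetical, not the study's data).

```python
# Compute average delay, throughput, and packet loss from delivered-packet
# records (send timestamp, receive timestamp, size in bytes) plus the total
# number of packets sent.

def qos_metrics(packets, sent_count):
    """Return (avg_delay_s, throughput_bps, loss_percent)."""
    delays = [recv - send for send, recv, _ in packets]
    avg_delay = sum(delays) / len(delays)
    duration = max(r for _, r, _ in packets) - min(s for s, _, _ in packets)
    throughput_bps = 8 * sum(size for _, _, size in packets) / duration
    loss_pct = 100.0 * (sent_count - len(packets)) / sent_count
    return avg_delay, throughput_bps, loss_pct

# 4 packets sent, 3 delivered (timestamps in seconds, sizes in bytes):
pkts = [(0.00, 0.02, 1500), (0.10, 0.13, 1500), (0.20, 0.22, 1500)]
delay, tput, loss = qos_metrics(pkts, sent_count=4)
print(round(delay, 3), round(tput), loss)   # 0.023 163636 25.0
```

Comparing these three numbers for the L2TP and IPsec tunnels (and for the unencrypted baseline) is exactly the kind of analysis the study performs.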


2012 ◽  
Vol 2012 ◽  
pp. 1-18 ◽  
Author(s):  
Xiaocheng Liu ◽  
Bin Chen ◽  
Xiaogang Qiu ◽  
Ying Cai ◽  
Kedi Huang

An increasing number of high-performance computing parallel applications leverage the power of the cloud for parallel processing. How to schedule parallel applications to improve the quality of service is key to successfully hosting parallel applications in the cloud. The large scale of the cloud makes parallel job scheduling more complicated, as even the simple parallel job scheduling problem is NP-complete. In this paper, we propose a parallel job scheduling algorithm named MEASY. MEASY adopts migration and consolidation to enhance the popular EASY scheduling algorithm. Our extensive experiments on well-known workloads show that our algorithm takes very good care of the quality of service. For two common parallel job scheduling objectives, our algorithm produces up to a 41.1% and an average 23.1% improvement in average response time, and up to an 82.9% and an average 69.3% improvement in average slowdown. Our algorithm is robust even under inaccurate CPU usage estimation and high migration cost. Our approach involves only trivial modifications to EASY and requires no additional techniques; it is practical and effective in the cloud environment.
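For context, the EASY baseline that MEASY extends can be sketched as follows. This is a simplified illustration (the migration and consolidation additions are omitted, and the reservation time is approximated by the first running job's end time); jobs are (name, nodes, runtime) tuples with hypothetical values.

```python
# Simplified EASY backfilling: the queue head gets a reservation at the
# earliest time enough nodes free up; later jobs may start out of order
# ("backfill") only if they fit now and finish before that reservation.

def easy_schedule(queue, free_nodes, now, running_end_times):
    """Return names of jobs to start now under (simplified) EASY."""
    started = []
    queue = list(queue)
    # Start jobs strictly in order while they fit.
    while queue and queue[0][1] <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job[1]
        started.append(job[0])
    if not queue:
        return started
    # Head job doesn't fit: give it a reservation (approximated here by the
    # earliest completion among running jobs).
    queue.pop(0)
    reserve_at = sorted(running_end_times)[0] if running_end_times else now
    # Backfill later jobs that fit now and end by the reservation.
    for name, nodes, runtime in queue:
        if nodes <= free_nodes and now + runtime <= reserve_at:
            free_nodes -= nodes
            started.append(name)
    return started

# 4 free nodes, one running job ending at t=10:
jobs = [("A", 2, 8), ("B", 6, 4), ("C", 2, 5), ("D", 4, 20)]
print(easy_schedule(jobs, free_nodes=4, now=0, running_end_times=[10]))
# ['A', 'C'] -- B waits for its reservation; C backfills; D would overrun it
```

MEASY's contribution is to additionally migrate and consolidate running jobs so that more such backfill opportunities appear.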


Author(s):  
Ahmad Awwad ◽  
Jehad Al-Sadi ◽  
Bassam Haddad ◽  
Ahmad Kayed

Recent studies have revealed that the Optical Transpose Interconnection Systems (OTIS) are promising candidates for future high-performance parallel computers. This paper presents and evaluates a general method for algorithm development on the OTIS-Arrangement network (OTIS-AN) as an example of OTIS network. The proposed method can be used and customized for any other OTIS network. Furthermore, it allows efficient mapping of a wide class of algorithms into the OTIS-AN. This method is based on grids and pipelines as popular structures that support a vast body of parallel applications including linear algebra, divide-and-conquer types of algorithms, sorting, and FFT computation. This study confirms the viability of the OTIS-AN as an attractive alternative for large-scale parallel architectures.
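The defining feature of an OTIS network, which the mapping method above builds on, is its optical "transpose" link: node (g, p) of group g and processor p connects optically to node (p, g), while electronic links follow the factor network (left abstract here). A minimal sketch, with a hypothetical network size:

```python
# OTIS transpose rule: the optical neighbour of (group, processor) is
# (processor, group). Applying it twice returns to the starting node.

def optical_neighbor(node):
    """Follow the OTIS optical transpose link."""
    g, p = node
    return (p, g)

# Check the involution property on a hypothetical 5x5 OTIS network.
n = 5
for g in range(n):
    for p in range(n):
        assert optical_neighbor(optical_neighbor((g, p))) == (g, p)
print(optical_neighbor((2, 3)))   # (3, 2)
```

Grid- and pipeline-style algorithms map onto this structure by treating each group as a local (electronic) subgrid and using the transpose links for the inter-group steps.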

