MPVisualizer: A general tool to debug message passing parallel applications

Author(s):  
Ana Paula Cláudio ◽  
João Duarte Cunha ◽  
Maria Beatriz Carmo


1999 ◽  
Vol 103 (1027) ◽  
pp. 443-447 ◽  
Author(s):  
W. McMillan ◽  
M. Woodgate ◽  
B. E. Richards ◽  
B. J. Gribben ◽  
K. J. Badcock ◽  
...  

Motivated by a lack of sufficient local and national computing facilities for computational fluid dynamics simulations, the Affordable Systems Computing Unit (ASCU) was established to investigate low-cost alternatives. The options considered have all involved cluster computing, a term which refers to the grouping of a number of components into a managed system capable of running both serial and parallel applications. The present work aims to demonstrate the utility of commodity processors for dedicated batch processing. The performance of the cluster has proved to be extremely cost effective, enabling large three-dimensional flow simulations on a computer costing less than £25k sterling at current market prices. The experience gained on this system in terms of single-node performance, message passing and parallel performance will be discussed. In particular, comparisons with the performance of other systems will be made. Several medium- to large-scale CFD simulations performed using the new cluster will be presented to demonstrate the potential of commodity-processor-based parallel computers for aerodynamic simulation.


2005 ◽  
Vol 15 (03) ◽  
pp. 305-320 ◽  
Author(s):  
CHRISTOPH A. HERRMANN

This paper demonstrates how parallel programs with message passing can be generated from abstract specifications embedded in the functional language MetaOCaml. The functional style permits the design of highly parameterized parallel programs, so-called skeletons. Programmers inexperienced in parallelism can use such skeletons to generate parallel applications simply and safely. Since MetaOCaml also has efficient imperative features and an MPI interface, the entire program can be written in one language, without the need for a language interface that restricts the set of data objects which can be exchanged. The semantics of abstract specifications is expressed by an interpreter written in MetaOCaml. A cost model is defined by abstract interpretation of the specification. Partial evaluation of the interpreter with a specification, a feature which MetaOCaml provides, yields a parallel program. The partial evaluation process takes place on each MPI process directly before the execution of the application program, exploiting knowledge of the number of processes, the current process identifier and the communication structure. Our example is the specification of a divide-and-conquer skeleton which is used to compute the multiplication of multi-digit numbers using Karatsuba's algorithm.
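The skeleton idea above can be illustrated independently of MetaOCaml. The following is a minimal sequential sketch in Python (not the paper's implementation): a generic divide-and-conquer skeleton whose four parameter functions fully determine the algorithm, instantiated here with Karatsuba's three-multiplication split.

```python
def divide_and_conquer(is_trivial, solve, divide, combine, problem):
    """Generic divide-and-conquer skeleton: the algorithm is fixed
    entirely by the four parameter functions."""
    if is_trivial(problem):
        return solve(problem)
    subresults = [divide_and_conquer(is_trivial, solve, divide, combine, p)
                  for p in divide(problem)]
    return combine(problem, subresults)

def karatsuba(x, y):
    """Instantiate the skeleton with Karatsuba multiplication."""
    def is_trivial(p):
        a, b = p
        return a < 10 or b < 10

    def solve(p):
        a, b = p
        return a * b

    def split_base(p):
        a, b = p
        return 10 ** (max(len(str(a)), len(str(b))) // 2)

    def divide(p):
        a, b = p
        base = split_base(p)
        a_hi, a_lo = divmod(a, base)
        b_hi, b_lo = divmod(b, base)
        # Three recursive products instead of four.
        return [(a_hi, b_hi), (a_lo, b_lo), (a_hi + a_lo, b_hi + b_lo)]

    def combine(p, results):
        base = split_base(p)
        hi, lo, mid = results
        return hi * base * base + (mid - hi - lo) * base + lo

    return divide_and_conquer(is_trivial, solve, divide, combine, (x, y))
```

In the paper's setting the skeleton's recursive calls are what get mapped onto MPI processes; here the recursion simply runs sequentially.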


Author(s):  
Esthela Gallardo ◽  
Jérôme Vienne ◽  
Leonardo Fialho ◽  
Patricia Teller ◽  
James Browne

MPI_T, the MPI Tool Information Interface, was introduced in the MPI 3.0 standard with the aim of enabling the development of more effective tools to support the Message Passing Interface (MPI), a standardized and portable message-passing system that is widely used in parallel programs. Most MPI optimization tools do not yet employ MPI_T and only describe the interactions between an application and an MPI library, thus requiring that users have expert knowledge to translate this information into optimizations. In contrast, MPI Advisor, a recently developed, easy-to-use methodology and tool for MPI performance optimization, pioneered the use of information provided by MPI_T to characterize the communication behaviors of an application and identify an MPI configuration that may enhance application performance. In addition to enabling the recommendation of performance optimizations, MPI_T has the potential to enable automatic runtime application of these optimizations. Optimization of MPI configurations is important because: (1) the vast majority of parallel applications executed on high-performance computing clusters use MPI for communication among processes, (2) most users execute their programs using the cluster’s default MPI configuration, and (3) while default configurations may give adequate performance, it is well known that optimizing the MPI runtime environment can significantly improve application performance, in particular, when the way in which the application is executed and/or the application’s input changes. This paper provides an overview of MPI_T, describes how it can be used to develop more effective MPI optimization tools, and demonstrates its use within an extended version of MPI Advisor. In doing the latter, it presents several MPI configuration choices that can significantly impact performance, shows how information collected at runtime with MPI_T and PMPI can be used to enhance performance, and presents MPI Advisor case studies of these configuration optimizations with performance gains of up to 40%.
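The real MPI_T interface is a C API (control and performance variables queried from the MPI library), but the advisor methodology itself is easy to sketch. Below is a hypothetical, simplified advisor rule in Python; the counter, threshold value, and rounding policy are illustrative assumptions, not MPI Advisor's actual logic: given the message sizes observed at runtime, it suggests raising the eager/rendezvous switchover when most messages would otherwise pay the rendezvous handshake.

```python
def recommend_eager_threshold(message_sizes, current_threshold=12288):
    """Hypothetical advisor rule (assumed, not MPI Advisor's real logic):
    if more than half of the observed point-to-point messages exceed the
    eager/rendezvous threshold, recommend raising it so those messages
    avoid the extra rendezvous round trip."""
    if not message_sizes:
        return current_threshold
    above = sum(1 for s in message_sizes if s > current_threshold)
    if above / len(message_sizes) > 0.5:
        # Round the largest such message up to the next 4 KiB boundary.
        target = max(s for s in message_sizes if s > current_threshold)
        return ((target + 4095) // 4096) * 4096
    return current_threshold
```

In a real tool the `message_sizes` histogram would come from MPI_T performance variables or PMPI interception, and the recommendation would map to an MPI-library-specific environment variable.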


2003 ◽  
Vol 13 (01) ◽  
pp. 53-64 ◽  
Author(s):  
ERIC GAMESS

In this paper, we address the goal of executing Java parallel applications in a group of nodes of a Beowulf cluster transparently chosen by a metacomputing system oriented to efficient execution of Java bytecode, with support for scientific computing. To this end, we extend the Java virtual machine by providing a message passing interface and quick access to distributed high performance resources. Also, we introduce the execution of parallel linear algebra methods for large objects from sequential Java applications by invoking SPLAM, our parallel linear algebra package.


2012 ◽  
Vol 15 (3) ◽  
Author(s):  
Diego Montezanti ◽  
Fernando Emmanuel Frati ◽  
Dolores Rexachs ◽  
Emilio Luque ◽  
Marcelo Naiouf ◽  
...  

Improvements in the performance of current processors are achieved by increasing the integration scale, which brings a growing vulnerability to transient faults; these faults have an increasing impact on multicore clusters running large scientific parallel applications. The need to enhance the reliability of these systems, coupled with the high cost of rerunning an application from the beginning, motivates software fault-tolerance strategies tailored to the target systems. This paper introduces SMCV, a fully distributed technique that provides fault detection for message-passing parallel applications by validating the contents of messages before they are sent, preventing the transmission of errors to other processes and leveraging the intrinsic hardware redundancy of the multicore. SMCV achieves wide robustness against transient faults with reduced overhead, striking a trade-off between moderate detection latency and low additional workload.
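The detection principle can be sketched without an MPI runtime. The following Python fragment illustrates the general idea under stated assumptions (the replication and comparison scheme here is a simplification, not SMCV's actual protocol): the computation producing an outgoing message is run twice, and the two results are compared before the message is allowed to leave the process, so a transient fault that corrupts one replica is caught instead of propagating.

```python
import hashlib
import pickle

def validated_payload(compute, *args):
    """Simplified sketch of send-side validation: execute the
    message-producing computation twice (on a cluster, the replica
    would run on a spare sibling core) and compare digests of the two
    results before the payload is handed to the send routine."""
    first = compute(*args)
    second = compute(*args)  # redundant replica of the computation

    def digest(value):
        return hashlib.sha256(pickle.dumps(value)).hexdigest()

    if digest(first) != digest(second):
        # A transient fault flipped state in one replica: contain it
        # here rather than transmitting a corrupted message.
        raise RuntimeError("transient fault detected: message contents differ")
    return first
```

Only after validation would the payload be passed to, e.g., `MPI_Send`; the digest comparison keeps the check cheap relative to resending full message copies.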


Author(s):  
Dawn Nelson ◽  
Scott Spetka

The need to increase the performance of real-time systems is growing along with system complexity. High-performance computers (HPCs) with real-time scheduling support can be used to control and improve the performance of real-time engineering applications. The latency that develops when parallel programs finish at dissimilar times is referred to as jitter. Jitter and latency can develop due to interference by other processes, interrupt handlers, or the Linux operating system. Experiments that used the Real Time Application Interface (RTAI) in conjunction with the Message Passing Interface (MPI) to implement parallel applications reduced or eliminated jitter for experimental codes that have characteristics typical of engineering applications. The experimental HPC test bed is a Linux cluster of nine 3.4 GHz Intel Pentium IV computers, connected by 100 Mb Ethernet through a switch. Each Linux system has 1 GB main memory and runs Linux release 2.6.23 patched with RTAI 3.6.
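The jitter metric described above is simple to compute once per-process completion times are recorded. A minimal Python sketch (an assumed bookkeeping helper, not code from the experiments, which used RTAI/MPI in C):

```python
def per_round_jitter(finish_times):
    """finish_times[r][p] is the wall-clock time at which process p
    finished round r of a parallel computation. Jitter for a round is
    the spread between the slowest and fastest finisher; a perfectly
    synchronized system shows zero jitter."""
    return [max(round_times) - min(round_times) for round_times in finish_times]
```

In the experiments, reducing this spread is exactly what the RTAI real-time scheduling support buys: the slowest rank no longer drifts behind due to operating-system interference.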


2001 ◽  
Vol 11 (01) ◽  
pp. 41-56 ◽  
Author(s):  
MARCO DANELUTTO

Beowulf-class clusters are attracting increasing interest as low-cost parallel architectures. They deliver reasonable performance at a very reasonable cost compared to classical MPP machines. Parallel applications are usually developed on clusters using MPI/PVM message passing or HPF programming environments. Here we discuss new implementation strategies to support structured, skeleton-based parallel programming environments on clusters. Adopting structured parallel programming models greatly reduces the time spent developing new parallel applications on clusters, and our implementation techniques based on macro data flow allow very efficient parallel applications to be developed. We discuss experiments that demonstrate the full feasibility of the approach.
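The macro data flow execution model mentioned above can be sketched with a small sequential interpreter in Python (an illustration of the general idea, not the paper's implementation): each node is a coarse-grained "macro" instruction that fires as soon as all of its input tokens are available; on a cluster, the set of fireable instructions would be shipped to remote workers instead of executed locally.

```python
def run_macro_dataflow(graph, inputs):
    """graph maps a node name to (function, list of input names);
    inputs maps initial token names to values. Repeatedly fire every
    node whose inputs are all present until the graph is exhausted."""
    values = dict(inputs)        # token store
    pending = dict(graph)        # instructions not yet fired
    while pending:
        ready = [n for n, (fn, deps) in pending.items()
                 if all(d in values for d in deps)]
        if not ready:
            raise ValueError("deadlock: no instruction is fireable")
        for n in ready:          # on a cluster, this set runs in parallel
            fn, deps = pending.pop(n)
            values[n] = fn(*[values[d] for d in deps])
    return values
```

For example, a skeleton like a two-way map followed by a reduce compiles to a graph whose independent nodes (`sq1`, `sq2` below) are fireable together, which is where the cluster parallelism comes from:

```python
graph = {
    "sq1": (lambda x: x * x, ["a"]),
    "sq2": (lambda x: x * x, ["b"]),
    "sum": (lambda u, v: u + v, ["sq1", "sq2"]),
}
result = run_macro_dataflow(graph, {"a": 3, "b": 4})["sum"]  # 9 + 16 = 25
```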

