MPVisualizer: A general tool to debug message passing parallel applications

Author(s):  
Ana Paula Cláudio ◽  
João Duarte Cunha ◽  
Maria Beatriz Carmo


1999 ◽  
Vol 103 (1027) ◽  
pp. 443-447 ◽  
Author(s):  
W. McMillan ◽  
M. Woodgate ◽  
B. E. Richards ◽  
B. J. Gribben ◽  
K. J. Badcock ◽  
...  

Motivated by a lack of sufficient local and national computing facilities for computational fluid dynamics simulations, the Affordable Systems Computing Unit (ASCU) was established to investigate low-cost alternatives. The options considered have all involved cluster computing, a term which refers to the grouping of a number of components into a managed system capable of running both serial and parallel applications. The present work aims to demonstrate the utility of commodity processors for dedicated batch processing. The performance of the cluster has proved to be extremely cost effective, enabling large three-dimensional flow simulations on a computer costing less than £25k sterling at current market prices. The experience gained on this system in terms of single-node performance, message passing and parallel performance will be discussed. In particular, comparisons with the performance of other systems will be made. Several medium- to large-scale CFD simulations performed using the new cluster will be presented to demonstrate the potential of commodity-processor-based parallel computers for aerodynamic simulation.


2005 ◽  
Vol 15 (03) ◽  
pp. 305-320 ◽  
Author(s):  
CHRISTOPH A. HERRMANN

This paper demonstrates how parallel programs with message passing can be generated from abstract specifications embedded in the functional language MetaOCaml. The functional style permits the design of highly parameterized parallel programs, so-called skeletons. Programmers inexperienced in parallelism can use such skeletons to generate parallel applications simply and safely. Since MetaOCaml also has efficient imperative features and an MPI interface, the entire program can be written in one language, without the need for a language interface that restricts the set of data objects which can be exchanged. The semantics of abstract specifications is expressed by an interpreter written in MetaOCaml. A cost model is defined by abstract interpretation of the specification. Partial evaluation of the interpreter with a specification, a feature which MetaOCaml provides, yields a parallel program. The partial evaluation process takes place on each MPI process directly before the execution of the application program, exploiting knowledge of the number of processes, the current process identifier and the communication structure. Our example is the specification of a divide-and-conquer skeleton which is used to compute the multiplication of multi-digit numbers using Karatsuba's algorithm.
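The skeleton idea above can be illustrated independently of MetaOCaml. The following is a minimal sequential sketch in Python (not the paper's implementation): a generic divide-and-conquer skeleton whose four parameter functions fully determine the algorithm, instantiated here with Karatsuba's three-multiplication split.

```python
def divide_and_conquer(is_trivial, solve, divide, combine, problem):
    """Generic divide-and-conquer skeleton: the algorithm is fixed
    entirely by the four parameter functions."""
    if is_trivial(problem):
        return solve(problem)
    subresults = [divide_and_conquer(is_trivial, solve, divide, combine, p)
                  for p in divide(problem)]
    return combine(problem, subresults)

def karatsuba(x, y):
    """Instantiate the skeleton with Karatsuba multiplication."""
    def is_trivial(p):
        a, b = p
        return a < 10 or b < 10

    def solve(p):
        a, b = p
        return a * b

    def split_base(p):
        a, b = p
        return 10 ** (max(len(str(a)), len(str(b))) // 2)

    def divide(p):
        a, b = p
        base = split_base(p)
        a_hi, a_lo = divmod(a, base)
        b_hi, b_lo = divmod(b, base)
        # Three recursive products instead of four.
        return [(a_hi, b_hi), (a_lo, b_lo), (a_hi + a_lo, b_hi + b_lo)]

    def combine(p, results):
        base = split_base(p)
        hi, lo, mid = results
        return hi * base * base + (mid - hi - lo) * base + lo

    return divide_and_conquer(is_trivial, solve, divide, combine, (x, y))
```

In the paper's setting the skeleton's recursive calls are what get mapped onto MPI processes; here the recursion simply runs sequentially.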


Author(s):  
Esthela Gallardo ◽  
Jérôme Vienne ◽  
Leonardo Fialho ◽  
Patricia Teller ◽  
James Browne

MPI_T, the MPI Tool Information Interface, was introduced in the MPI 3.0 standard with the aim of enabling the development of more effective tools to support the Message Passing Interface (MPI), a standardized and portable message-passing system that is widely used in parallel programs. Most MPI optimization tools do not yet employ MPI_T and only describe the interactions between an application and an MPI library, thus requiring that users have expert knowledge to translate this information into optimizations. In contrast, MPI Advisor, a recently developed, easy-to-use methodology and tool for MPI performance optimization, pioneered the use of information provided by MPI_T to characterize the communication behaviors of an application and identify an MPI configuration that may enhance application performance. In addition to enabling the recommendation of performance optimizations, MPI_T has the potential to enable automatic runtime application of these optimizations. Optimization of MPI configurations is important because: (1) the vast majority of parallel applications executed on high-performance computing clusters use MPI for communication among processes, (2) most users execute their programs using the cluster’s default MPI configuration, and (3) while default configurations may give adequate performance, it is well known that optimizing the MPI runtime environment can significantly improve application performance, in particular, when the way in which the application is executed and/or the application’s input changes. This paper provides an overview of MPI_T, describes how it can be used to develop more effective MPI optimization tools, and demonstrates its use within an extended version of MPI Advisor. In doing the latter, it presents several MPI configuration choices that can significantly impact performance, shows how information collected at runtime with MPI_T and PMPI can be used to enhance performance, and presents MPI Advisor case studies of these configuration optimizations with performance gains of up to 40%.
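The real MPI_T interface is a C API (control and performance variables queried from the MPI library), but the advisor methodology itself is easy to sketch. Below is a hypothetical, simplified advisor rule in Python; the counter, threshold value, and rounding policy are illustrative assumptions, not MPI Advisor's actual logic: given the message sizes observed at runtime, it suggests raising the eager/rendezvous switchover when most messages would otherwise pay the rendezvous handshake.

```python
def recommend_eager_threshold(message_sizes, current_threshold=12288):
    """Hypothetical advisor rule (assumed, not MPI Advisor's real logic):
    if more than half of the observed point-to-point messages exceed the
    eager/rendezvous threshold, recommend raising it so those messages
    avoid the extra rendezvous round trip."""
    if not message_sizes:
        return current_threshold
    above = sum(1 for s in message_sizes if s > current_threshold)
    if above / len(message_sizes) > 0.5:
        # Round the largest such message up to the next 4 KiB boundary.
        target = max(s for s in message_sizes if s > current_threshold)
        return ((target + 4095) // 4096) * 4096
    return current_threshold
```

In a real tool the `message_sizes` histogram would come from MPI_T performance variables or PMPI interception, and the recommendation would map to an MPI-library-specific environment variable.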


2003 ◽  
Vol 13 (01) ◽  
pp. 53-64 ◽  
Author(s):  
ERIC GAMESS

In this paper, we address the goal of executing Java parallel applications in a group of nodes of a Beowulf cluster transparently chosen by a metacomputing system oriented to efficient execution of Java bytecode, with support for scientific computing. To this end, we extend the Java virtual machine by providing a message passing interface and quick access to distributed high performance resources. Also, we introduce the execution of parallel linear algebra methods for large objects from sequential Java applications by invoking SPLAM, our parallel linear algebra package.


2012 ◽  
Vol 15 (3) ◽  
Author(s):  
Diego Montezanti ◽  
Fernando Emmanuel Frati ◽  
Dolores Rexachs ◽  
Emilio Luque ◽  
Marcelo Naiouf ◽  
...  

Improvements in the performance of current processors are achieved by increasing the integration scale, which brings a growing vulnerability to transient faults; these faults have an increasing impact on multicore clusters running large scientific parallel applications. The need to enhance the reliability of these systems, coupled with the high cost of rerunning an application from the beginning, motivates software fault-tolerance strategies tailored to the target systems. This paper introduces SMCV, a fully distributed technique that provides fault detection for message-passing parallel applications by validating the contents of messages before they are sent, preventing the transmission of errors to other processes and leveraging the intrinsic hardware redundancy of the multicore. SMCV achieves wide robustness against transient faults with reduced overhead, striking a trade-off between moderate detection latency and low additional workload.
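The detection principle can be sketched without an MPI runtime. The following Python fragment illustrates the general idea under stated assumptions (the replication and comparison scheme here is a simplification, not SMCV's actual protocol): the computation producing an outgoing message is run twice, and the two results are compared before the message is allowed to leave the process, so a transient fault that corrupts one replica is caught instead of propagating.

```python
import hashlib
import pickle

def validated_payload(compute, *args):
    """Simplified sketch of send-side validation: execute the
    message-producing computation twice (on a cluster, the replica
    would run on a spare sibling core) and compare digests of the two
    results before the payload is handed to the send routine."""
    first = compute(*args)
    second = compute(*args)  # redundant replica of the computation

    def digest(value):
        return hashlib.sha256(pickle.dumps(value)).hexdigest()

    if digest(first) != digest(second):
        # A transient fault flipped state in one replica: contain it
        # here rather than transmitting a corrupted message.
        raise RuntimeError("transient fault detected: message contents differ")
    return first
```

Only after validation would the payload be passed to, e.g., `MPI_Send`; the digest comparison keeps the check cheap relative to resending full message copies.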


Author(s):  
Dawn Nelson ◽  
Scott Spetka

The need to increase the performance of real-time systems is growing along with system complexity. High-performance computers (HPCs) with real-time scheduling support can be used to control and improve the performance of real-time engineering applications. The latency that develops when parallel programs finish at dissimilar times is referred to as jitter. Jitter and latency can develop due to interference by other processes, interrupt handlers, or the Linux operating system. Experiments that used the Real Time Application Interface (RTAI) in conjunction with the Message Passing Interface (MPI) to implement parallel applications reduced or eliminated jitter for experimental codes that have characteristics typical of engineering applications. The experimental HPC test bed is a Linux cluster of nine 3.4 GHz Intel Pentium IV computers, connected by 100 Mb Ethernet through a switch. Each Linux system has 1 GB main memory and runs Linux release 2.6.23 patched with RTAI 3.6.
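The jitter metric described above is simple to compute once per-process completion times are recorded. A minimal Python sketch (an assumed bookkeeping helper, not code from the experiments, which used RTAI/MPI in C):

```python
def per_round_jitter(finish_times):
    """finish_times[r][p] is the wall-clock time at which process p
    finished round r of a parallel computation. Jitter for a round is
    the spread between the slowest and fastest finisher; a perfectly
    synchronized system shows zero jitter."""
    return [max(round_times) - min(round_times) for round_times in finish_times]
```

In the experiments, reducing this spread is exactly what the RTAI real-time scheduling support buys: the slowest rank no longer drifts behind due to operating-system interference.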


2001 ◽  
Vol 11 (01) ◽  
pp. 41-56 ◽  
Author(s):  
MARCO DANELUTTO

Beowulf-class clusters are attracting increasing interest as low-cost parallel architectures. They deliver reasonable performance at a very reasonable cost compared to classical MPP machines. Parallel applications are usually developed on clusters using MPI/PVM message passing or HPF programming environments. Here we discuss new implementation strategies to support structured, skeleton-based parallel programming environments on clusters. Adopting structured parallel programming models greatly reduces the time spent developing new parallel applications on clusters, and our implementation techniques based on macro data flow allow very efficient parallel applications to be developed. We discuss experiments that demonstrate the full feasibility of the approach.
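The macro data flow execution model mentioned above can be sketched with a small sequential interpreter in Python (an illustration of the general idea, not the paper's implementation): each node is a coarse-grained "macro" instruction that fires as soon as all of its input tokens are available; on a cluster, the set of fireable instructions would be shipped to remote workers instead of executed locally.

```python
def run_macro_dataflow(graph, inputs):
    """graph maps a node name to (function, list of input names);
    inputs maps initial token names to values. Repeatedly fire every
    node whose inputs are all present until the graph is exhausted."""
    values = dict(inputs)        # token store
    pending = dict(graph)        # instructions not yet fired
    while pending:
        ready = [n for n, (fn, deps) in pending.items()
                 if all(d in values for d in deps)]
        if not ready:
            raise ValueError("deadlock: no instruction is fireable")
        for n in ready:          # on a cluster, this set runs in parallel
            fn, deps = pending.pop(n)
            values[n] = fn(*[values[d] for d in deps])
    return values
```

For example, a skeleton like a two-way map followed by a reduce compiles to a graph whose independent nodes (`sq1`, `sq2` below) are fireable together, which is where the cluster parallelism comes from:

```python
graph = {
    "sq1": (lambda x: x * x, ["a"]),
    "sq2": (lambda x: x * x, ["b"]),
    "sum": (lambda u, v: u + v, ["sq1", "sq2"]),
}
result = run_macro_dataflow(graph, {"a": 3, "b": 4})["sum"]  # 9 + 16 = 25
```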

