Real Time HPC Architecture for Engineering Applications

Author(s):  
Dawn Nelson ◽  
Scott Spetka

The need to increase the performance of real-time systems is growing along with system complexity. High-performance computers (HPCs) with real-time scheduling support can be used to control and improve the performance of real-time engineering applications. The latency that develops when parallel programs finish at dissimilar times is referred to as jitter. Jitter and latency can arise from interference by other processes, interrupt handlers, or the Linux operating system itself. Experiments that used the Real Time Application Interface (RTAI) in conjunction with the Message Passing Interface (MPI) to implement parallel applications reduced or eliminated jitter for experimental codes with characteristics typical of engineering applications. The experimental HPC test bed is a Linux cluster of nine Intel Pentium IV 3.4 GHz computers connected by switched 100 Mb Ethernet. Each system has 1 GB of main memory and runs Linux 2.6.23 patched with RTAI 3.6.

Author(s):  
Vladimir Stegailov ◽  
Ekaterina Dlinnova ◽  
Timur Ismagilov ◽  
Mikhail Khalilov ◽  
Nikolay Kondratyuk ◽  
...  

In this article, we describe the Desmos supercomputer, which consists of 32 hybrid nodes connected by a low-latency, high-bandwidth Angara interconnect with torus topology. The supercomputer is aimed at cost-effective classical molecular dynamics calculations. Desmos serves as a test bed for the Angara interconnect, which supports 3-D and 4-D torus network topologies, and verifies its ability to effectively speed up massively parallel Message Passing Interface (MPI)-based applications. We characterize the Angara interconnect with typical MPI benchmarks. Desmos benchmark results for GROMACS, LAMMPS, VASP, and CP2K are compared with data for other high-performance computing (HPC) systems. We also present job scheduling statistics from several months of Desmos deployment.


Author(s):  
Esthela Gallardo ◽  
Jérôme Vienne ◽  
Leonardo Fialho ◽  
Patricia Teller ◽  
James Browne

MPI_T, the MPI Tool Information Interface, was introduced in the MPI 3.0 standard with the aim of enabling the development of more effective tools to support the Message Passing Interface (MPI), a standardized and portable message-passing system that is widely used in parallel programs. Most MPI optimization tools do not yet employ MPI_T and only describe the interactions between an application and an MPI library, thus requiring that users have expert knowledge to translate this information into optimizations. In contrast, MPI Advisor, a recently developed, easy-to-use methodology and tool for MPI performance optimization, pioneered the use of information provided by MPI_T to characterize the communication behaviors of an application and identify an MPI configuration that may enhance application performance. In addition to enabling the recommendation of performance optimizations, MPI_T has the potential to enable automatic runtime application of these optimizations. Optimization of MPI configurations is important because: (1) the vast majority of parallel applications executed on high-performance computing clusters use MPI for communication among processes, (2) most users execute their programs using the cluster’s default MPI configuration, and (3) while default configurations may give adequate performance, it is well known that optimizing the MPI runtime environment can significantly improve application performance, in particular, when the way in which the application is executed and/or the application’s input changes. This paper provides an overview of MPI_T, describes how it can be used to develop more effective MPI optimization tools, and demonstrates its use within an extended version of MPI Advisor. 
In doing the latter, it presents several MPI configuration choices that can significantly impact performance, shows how information collected at runtime with MPI_T and PMPI can be used to enhance performance, and presents MPI Advisor case studies of these configuration optimizations with performance gains of up to 40%.


2003 ◽  
Vol 13 (01) ◽  
pp. 53-64 ◽  
Author(s):  
ERIC GAMESS

In this paper, we address the goal of transparently executing Java parallel applications on a group of Beowulf-cluster nodes chosen by a metacomputing system oriented toward efficient execution of Java bytecode, with support for scientific computing. To this end, we extend the Java virtual machine with a message passing interface and quick access to distributed high-performance resources. We also introduce the execution of parallel linear algebra methods on large objects from sequential Java applications by invoking SPLAM, our parallel linear algebra package.

