A Message-Passing Hardware/Software Cosimulation Environment for Reconfigurable Computing Systems

International Journal of Reconfigurable Computing ◽

10.1155/2009/376232 ◽

2009 ◽

Vol 2009 ◽

pp. 1-9

Author(s):

Manuel Saldaña ◽

Emanuel Ramalho ◽

Paul Chow

Keyword(s):

Reconfigurable Computing ◽

Message Passing ◽

High Performance ◽

System Level ◽

Application Development ◽

Reconfigurable Computers ◽

Development Tool ◽

Verification Tools ◽

Linpack Benchmark ◽

Xilinx Fpga

High-performance reconfigurable computers (HPRCs) provide a mix of standard processors and FPGAs to collectively accelerate applications. This introduces new design challenges, such as the need for portable programming models across HPRCs and system-level verification tools. To address the need for cosimulating a complete heterogeneous application using both software and hardware in an HPRC, we have created a tool called the Message-passing Simulation Framework (MSF). We have used it to simulate and develop an interface enabling an MPI-based approach to exchange data between X86 processors and hardware engines inside FPGAs. The MSF can also be used as an application development tool that enables multiple FPGAs in simulation to exchange messages amongst themselves and with X86 processors. As an example, we simulate a LINPACK benchmark hardware core using an Intel-FSB-Xilinx-FPGA platform to quickly prototype the hardware, to test the communications, and to verify the benchmark results.


SDNOFS: Software Defined Networking with Openflow Switches & BCN-ECN with ALTQ for Congestion Avoidance

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8873.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 3710-3719

Keyword(s):

High Performance Computing ◽

Message Passing ◽

High Performance ◽

Congestion Management ◽

Congestion Avoidance ◽

Software Defined Networks ◽

Application Development ◽

Open Flow ◽

Business Requirements ◽

Performance Computing

High-performance computing cluster in a cloud environment. High-performance computing (HPC) helps scientists and researchers solve complex problems that require multiple computational capabilities. The main reason for using a message passing model is to promote application development, porting, and execution on the variety of parallel computers that can support the paradigm. Since congestion avoidance is critical for the efficient use of different applications, an efficient method for congestion management in software-defined networks based on the OpenFlow protocol is presented. This paper proposes two methods. The first avoids congestion by using Software Defined Networks (SDN) with OpenFlow switches. OpenFlow was originally defined as a communication protocol for SDN environments that allows the SDN controller to interact directly with the forwarding plane of network devices such as switches and routers, both physical and virtual (hypervisor-based), so that the network can better adapt to changing business requirements. The second uses BCN-ECN with ALTQ to enhance the quality of service and avoid congestion. Compared with the existing method, SDN OpenFlow switches and BCN-ECN with ALTQ provide 98% accuracy. Using the proposed methods is expected to improve parameters such as delay time, level of congestion, quality time, and execution time.


ADVANCED FEATURES OF NVIDIA KEPLER ARCHITECTURE AND PARALLEL COMPUTATION PLATFORM CUDA FOR DEVELOPING SCIENTIFIC COMPUTE-INTENSIVE APPLICATIONS

10.46813/2019-121-105 ◽

2019 ◽

pp. 105-108

Author(s):

V.A. Dudnik ◽

V.I. Kudryavtsev ◽

S.A. Us ◽

M.V. Shestakov

Keyword(s):

Parallel Computation ◽

Program Development ◽

High Performance ◽

Application Development ◽

Time Of Application ◽

Development Tool ◽

Wide Range ◽

Computation Algorithms

The paper describes additional features offered by the new Kepler architecture of NVIDIA graphics processors, and their use for creating high-performance programs in a wide range of scientific compute-intensive applications. Recommendations are given for their use when implementing scientific and technical computation algorithms on graphics processors. New capabilities of the parallel computation platform CUDA are also described, in particular a set of program development tool extensions for the Fortran, C and C++ languages. The extended capabilities make it possible to minimize application development time and to increase programming productivity.


SAGE: an application development tool suite for high performance computing systems

2000 IEEE Aerospace Conference. Proceedings (Cat. No.00TH8484) ◽

10.1109/aero.2000.879435 ◽

2002 ◽

Cited By ~ 1

Author(s):

M.I. Patel ◽

K.L. Jordan

Keyword(s):

High Performance Computing ◽

High Performance ◽

Application Development ◽

Computing Systems ◽

Development Tool ◽

Performance Computing


A Universal, Dynamically Adaptable and Programmable Network Router for Parallel Computers

VLSI Design ◽

10.1155/2001/50167 ◽

2001 ◽

Vol 12 (1) ◽

pp. 25-52 ◽

Cited By ~ 4

Author(s):

Taras I. Golota ◽

Sotirios G. Ziavras

Keyword(s):

Message Passing ◽

High Performance ◽

Interconnection Network ◽

Vlsi Design ◽

Parallel Computers ◽

System Level ◽

Current Configuration ◽

Data Channel ◽

Vhdl Code ◽

Or Applications

Existing message-passing parallel computers employ routers designed for a specific interconnection network and deal with fixed data channel width. There are disadvantages to this approach, because the system design and development times are significant and these routers do not permit run-time network reconfiguration. Changes in the topology of the network may be required for better performance or fault tolerance. In this paper, we introduce a class of high-performance universal (statically and dynamically adaptable) programmable routers (UPRs) for message-passing parallel computers. The universality of these routers is based on their capability to adapt at run and/or static times according to the characteristics of the systems and/or applications. More specifically, the number of bidirectional data channels, the channel size and the I/O port mappings (for the implementation of a particular topology) can change dynamically and statically. Our research focuses on system-level specification issues of the UPRs, their VLSI design and their simulation to estimate their performance. Our simulation of data transfers via UPR routers employs VHDL code in the Mentor Graphics environment. The results show that the performance of the routers depends mostly on their current configuration. Details of the simulation and synthesis are presented.


Performance Measurement and Analysis of Large-Scale Parallel Applications on Leadership Computing Systems

Scientific Programming ◽

10.1155/2008/632685 ◽

2008 ◽

Vol 16 (2-3) ◽

pp. 167-181 ◽

Cited By ~ 11

Author(s):

Brian J.N. Wylie ◽

Markus Geimer ◽

Felix Wolf

Keyword(s):

Performance Measurement ◽

Message Passing ◽

High Performance ◽

Large Scale ◽

Weather Prediction ◽

Parallel Applications ◽

Application Development ◽

Measurement And Analysis ◽

High Performance Systems ◽

Blue Gene

Developers of applications with large-scale computing requirements are currently presented with a variety of high-performance systems optimised for message-passing; however, effectively exploiting the available computing resources remains a major challenge. In addition to fundamental application scalability characteristics, application and system peculiarities often only manifest at extreme scales, requiring highly scalable performance measurement and analysis tools that are convenient to incorporate in application development and tuning activities. We present our experiences with a multigrid solver benchmark and state-of-the-art real-world applications for numerical weather prediction and computational fluid dynamics, on three quite different multi-thousand-processor supercomputer systems – Cray XT3/4, MareNostrum & Blue Gene/L – using the newly-developed SCALASCA toolset to quantify and isolate a range of significant performance issues.


Multi-level Parallelization of Genotype Imputation on Supercomputers

Current Bioinformatics ◽

10.2174/1574893615999200420071307 ◽

2020 ◽

Vol 15 ◽

Author(s):

Weiwen Zhang ◽

Long Wang ◽

Theint Theint Aye ◽

Juniarto Samsudin ◽

Yongqing Zhu

Keyword(s):

Association Study ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Genome Wide Association Study ◽

Job Scheduling ◽

Genotype Imputation ◽

Job Level ◽

Multi Level ◽

High Performance Requirement

Background: Genotype imputation as a service is developed to enable researchers to estimate genotypes on haplotyped data without performing whole genome sequencing. However, genotype imputation is computation intensive and thus it remains a challenge to satisfy the high performance requirement of genome wide association study (GWAS). Objective: In this paper, we propose a high performance computing solution for genotype imputation on supercomputers to enhance its execution performance. Method: We design and implement a multi-level parallelization that includes job level, process level and thread level parallelization, enabled by job scheduling management, message passing interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation and data concatenation. Due to the design of multi-level parallelization, we can exploit the multi-machine/multi-core architecture to improve the performance of genotype imputation. Results: Experiment results show that our proposed method can outperform the Hadoop-based implementation of genotype imputation. Moreover, we conduct the experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten the execution time, thus improving the performance for genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as an imputation as a service, will facilitate bioinformatics researchers in Singapore to conduct genotype imputation and enhance the association study.
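The chunk partition, parallel execution, and data concatenation steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: a thread pool stands in for the MPI and OpenMP levels, and `partition`, `impute_chunk`, and `run_pipeline` are hypothetical names. The "imputation" here is a toy placeholder that simply replaces missing values (`None`) with 0.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, chunk_size):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def impute_chunk(chunk):
    """Toy stand-in for imputation (an assumption): replace missing values with 0."""
    return [0 if g is None else g for g in chunk]


def run_pipeline(genotypes, chunk_size=4, workers=2):
    """Partition the input, process the chunks in parallel, then concatenate."""
    chunks = partition(genotypes, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so concatenation is safe.
        results = list(pool.map(impute_chunk, chunks))
    return [g for chunk in results for g in chunk]


if __name__ == "__main__":
    print(run_pipeline([0, None, 1, 2, None, 1, 0, None, 2]))
```

In a real deployment each chunk would typically be handled by a separate MPI process with threads inside it; the sketch only shows the ordering guarantee that makes the final concatenation straightforward.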


Reconfigurable computing architectures for high performance analysis (abstract only)

IET 3rd International Conference MEDSIP 2006. Advances in Medical, Signal and Information Processing ◽

10.1049/cp:20060401 ◽

2006 ◽

Author(s):

M. Devlin

Keyword(s):

Performance Analysis ◽

Reconfigurable Computing ◽

High Performance


A high-performance, portable implementation of the MPI message passing interface standard

Parallel Computing ◽

10.1016/0167-8191(96)00024-5 ◽

1996 ◽

Vol 22 (6) ◽

pp. 789-828 ◽

Cited By ~ 1155

Author(s):

William Gropp ◽

Ewing Lusk ◽

Nathan Doss ◽

Anthony Skjellum

Keyword(s):

Message Passing ◽

High Performance ◽

Message Passing Interface


Design and evaluation of a high performance file system for message passing parallel computers

[1991] Proceedings. The Fifth International Parallel Processing Symposium ◽

10.1109/ipps.1991.153835 ◽

2002 ◽

Cited By ~ 1

Author(s):

U. Nagaraj ◽

U.S. Shukla ◽

A. Paulraj

Keyword(s):

Message Passing ◽

High Performance ◽

File System ◽

Parallel Computers


Based on Numerical Simulation of High-Performance Parallel Machine Muffler Experimental Calibration

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.718-720.1645 ◽

2013 ◽

Vol 718-720 ◽

pp. 1645-1650

Author(s):

Gen Yin Cheng ◽

Sheng Chen Yu ◽

Zhi Yong Wei ◽

Shao Jie Chen ◽

You Cheng

Keyword(s):

Numerical Simulation ◽

Finite Element ◽

Boundary Element ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Parallel Machine ◽

Simulation Software ◽

Experimental Calibration ◽

The Cost

Commonly used commercial simulation software such as SYSNOISE and ANSYS runs on a single machine (it cannot run directly on a parallel machine) when the finite element and boundary element methods are used to simulate muffler performance. Because of the large amount of numerical computation involved, it can take more than ten days, sometimes even twenty, to work out an exact solution. To reduce the cost of numerical simulation, a high-performance parallel machine built from 32 commercial computers is used, and the finite element and boundary element simulation software is converted into a program that runs in an MPI (message passing interface) parallel environment. Data from the simulation experiments show that the numerical simulation results are good, and that the computing speed of the high-performance parallel machine is 25 to 30 times that of a microcomputer.
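As a rough reading of the reported figures, parallel efficiency can be estimated as speedup divided by the number of machines. The short Python sketch below assumes that the microcomputer baseline is comparable to one of the 32 nodes, which the abstract does not state; `parallel_efficiency` is a hypothetical helper name.

```python
def parallel_efficiency(speedup, processors):
    """Return speedup divided by processor count (1.0 means ideal scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


if __name__ == "__main__":
    # Reported speedup range of 25 to 30 on a machine built from 32 computers.
    for speedup in (25, 30):
        eff = parallel_efficiency(speedup, 32)
        print(f"speedup {speedup}x on 32 nodes: efficiency {eff:.2%}")
```

Under that assumption, the reported range corresponds to an efficiency of roughly 78% to 94%.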
