A Message-Passing Hardware/Software Cosimulation Environment for Reconfigurable Computing Systems

2009 ◽  
Vol 2009 ◽  
pp. 1-9
Author(s):  
Manuel Saldaña ◽  
Emanuel Ramalho ◽  
Paul Chow

High-performance reconfigurable computers (HPRCs) provide a mix of standard processors and FPGAs to collectively accelerate applications. This introduces new design challenges, such as the need for portable programming models across HPRCs and system-level verification tools. To address the need for cosimulating a complete heterogeneous application using both software and hardware in an HPRC, we have created a tool called the Message-passing Simulation Framework (MSF). We have used it to simulate and develop an interface enabling an MPI-based approach to exchange data between X86 processors and hardware engines inside FPGAs. The MSF can also be used as an application development tool that enables multiple FPGAs in simulation to exchange messages amongst themselves and with X86 processors. As an example, we simulate a LINPACK benchmark hardware core using an Intel-FSB-Xilinx-FPGA platform to quickly prototype the hardware, to test the communications, and to verify the benchmark results.

2020 ◽  
Vol 8 (5) ◽  
pp. 3710-3719

High-performance computing cluster in a cloud environment. High-performance computing (HPC) helps scientists and researchers to solve complex problems involving multiple computational capabilities. The main reason for using a message-passing model is to promote application development, porting, and execution on the variety of parallel computers that can support the paradigm. Since congestion avoidance is critical for the efficient use of different applications, an efficient method for congestion management in software-defined networks based on the OpenFlow protocol is presented. This paper proposes two methods. The first avoids congestion by using Software Defined Networks (SDN) with OpenFlow switches. OpenFlow was originally defined as a communication protocol in SDN environments that allows the SDN controller to interact directly with the forwarding plane of network devices such as switches and routers, both physical and virtual (hypervisor-based), so that the network can better adapt to changing business requirements. The second enhances quality of service and avoids congestion by using BCN-ECN with ALTQ. Compared with the existing method, SDN OpenFlow switches and BCN-ECN with ALTQ provide 98% accuracy. Using the proposed methods improves parameters such as delay time, level of congestion, quality time and execution time.


2019 ◽  
pp. 105-108
Author(s):  
V.A. Dudnik ◽  
V.I. Kudryavtsev ◽  
S.A. Us ◽  
M.V. Shestakov

The paper describes additional features offered by the new Kepler architecture of NVIDIA graphics processors, and their use in creating high-performance programs for a wide range of compute-intensive scientific applications. Recommendations are given for using these features when implementing scientific and technical computation algorithms on graphics processors. New capabilities of the CUDA parallel computation platform are also described, in particular a set of program development tool extensions for the Fortran, C and C++ languages. The extended capabilities make it possible to minimize application development time and to increase programming productivity.


VLSI Design ◽  
2001 ◽  
Vol 12 (1) ◽  
pp. 25-52 ◽  
Author(s):  
Taras I. Golota ◽  
Sotirios G. Ziavras

Existing message-passing parallel computers employ routers designed for a specific interconnection network and deal with fixed data channel width. There are disadvantages to this approach, because the system design and development times are significant and these routers do not permit run-time network reconfiguration. Changes in the topology of the network may be required for better performance or fault tolerance. In this paper, we introduce a class of high-performance universal (statically and dynamically adaptable) programmable routers (UPRs) for message-passing parallel computers. The universality of these routers is based on their capability to adapt at run and/or static times according to the characteristics of the systems and/or applications. More specifically, the number of bidirectional data channels, the channel size and the I/O port mappings (for the implementation of a particular topology) can change dynamically and statically. Our research focuses on system-level specification issues of the UPRs, their VLSI design and their simulation to estimate their performance. Our simulation of data transfers via UPR routers employs VHDL code in the Mentor Graphics environment. The results show that the performance of the routers depends mostly on their current configuration. Details of the simulation and synthesis are presented.


2008 ◽  
Vol 16 (2-3) ◽  
pp. 167-181 ◽  
Author(s):  
Brian J.N. Wylie ◽  
Markus Geimer ◽  
Felix Wolf

Developers of applications with large-scale computing requirements are currently presented with a variety of high-performance systems optimised for message-passing; however, effectively exploiting the available computing resources remains a major challenge. In addition to fundamental application scalability characteristics, application and system peculiarities often only manifest at extreme scales, requiring highly scalable performance measurement and analysis tools that are convenient to incorporate in application development and tuning activities. We present our experiences with a multigrid solver benchmark and state-of-the-art real-world applications for numerical weather prediction and computational fluid dynamics, on three quite different multi-thousand-processor supercomputer systems – Cray XT3/4, MareNostrum & Blue Gene/L – using the newly-developed SCALASCA toolset to quantify and isolate a range of significant performance issues.


2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye ◽  
Juniarto Samsudin ◽  
Yongqing Zhu

Background: Genotype imputation as a service is developed to enable researchers to estimate genotypes on haplotyped data without performing whole genome sequencing. However, genotype imputation is computation-intensive, and thus it remains a challenge to satisfy the high performance requirement of genome-wide association study (GWAS). Objective: In this paper, we propose a high performance computing solution for genotype imputation on supercomputers to enhance its execution performance. Method: We design and implement a multi-level parallelization that includes job-level, process-level and thread-level parallelization, enabled by job scheduling management, message passing interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation and data concatenation. Due to the design of multi-level parallelization, we can exploit the multi-machine/multi-core architecture to improve the performance of genotype imputation. Results: Experimental results show that our proposed method can outperform the Hadoop-based implementation of genotype imputation. Moreover, we conduct the experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten the execution time, thus improving the performance for genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as an imputation as a service, will facilitate bioinformatics researchers in Singapore to conduct genotype imputation and enhance the association study.
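The chunk partition, parallel execution and data concatenation steps named in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses a thread pool in place of the job scheduler, MPI and OpenMP layers, and `partition`, `impute_chunk` and `impute_all` are hypothetical names. The "imputation" is a toy stand-in that fills each missing value with the most common observed value in its chunk.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def impute_chunk(chunk):
    """Toy stand-in for imputation (an assumption, not a real algorithm):
    replace None with the most common observed value in the chunk."""
    observed = [g for g in chunk if g is not None]
    fill = Counter(observed).most_common(1)[0][0] if observed else 0
    return [fill if g is None else g for g in chunk]


def impute_all(genotypes, chunk_size=4, workers=2):
    """Partition the input, process chunks concurrently, then concatenate."""
    chunks = partition(genotypes, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks,
        # so concatenation restores the original ordering.
        results = list(pool.map(impute_chunk, chunks))
    return [g for chunk in results for g in chunk]


if __name__ == "__main__":
    data = [0, 1, None, 1, 2, 2, None, 2, None, 0]
    print(impute_all(data))
```

Because each chunk is processed independently, the same structure could in principle be spread across processes or machines; only the ordered concatenation step needs the results gathered in one place.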


1996 ◽  
Vol 22 (6) ◽  
pp. 789-828 ◽  
Author(s):  
William Gropp ◽  
Ewing Lusk ◽  
Nathan Doss ◽  
Anthony Skjellum

2013 ◽  
Vol 718-720 ◽  
pp. 1645-1650
Author(s):  
Gen Yin Cheng ◽  
Sheng Chen Yu ◽  
Zhi Yong Wei ◽  
Shao Jie Chen ◽  
You Cheng

Commonly used commercial simulation software such as SYSNOISE and ANSYS runs on a single machine (it cannot run directly on a parallel machine). When finite element and boundary element methods are used to simulate the muffler effect, the large amount of numerical computation means that an exact solution can take more than ten days, and sometimes even twenty. To reduce the cost of numerical simulation, a high-performance parallel machine was built from 32 commercial computers, and the finite element and boundary element simulation software was transformed into a program that runs in an MPI (Message Passing Interface) parallel environment. Data from the simulation experiments show that the numerical simulation results are good, and that the computing speed of the parallel machine is 25 to 30 times that of a single microcomputer.
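As a rough reading of these figures, the parallel efficiency can be estimated from the reported speedup. This assumes that all 32 computers take part in the run and that the baseline microcomputer is comparable to a single node, neither of which the abstract states. With speedup $S$ on $p$ machines, efficiency is $E = S/p$:

```latex
E = \frac{S}{p}, \qquad
E_{\min} = \frac{25}{32} \approx 0.78, \qquad
E_{\max} = \frac{30}{32} \approx 0.94
```

Under those assumptions, the cluster would be running at roughly 78% to 94% of ideal linear speedup.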

