Resource Partitioning and Application Scheduling with Module Merging on Dynamically and Partially Reconfigurable FPGAs

Zhe Wang; Qi Tang; Biao Guo; Ji-Bo Wei; Ling Wang

doi:10.3390/electronics9091461

Resource Partitioning and Application Scheduling with Module Merging on Dynamically and Partially Reconfigurable FPGAs

Electronics ◽

10.3390/electronics9091461 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1461 ◽

Cited By ~ 1

Author(s):

Zhe Wang ◽

Qi Tang ◽

Biao Guo ◽

Ji-Bo Wei ◽

Ling Wang

Keyword(s):

High Performance ◽

System Model ◽

Mixed Integer ◽

Task Execution ◽

Processing Efficiency ◽

Actual Application ◽

Solution Generation ◽

Speed Up ◽

Application Requirements ◽

Performance Computing

Dynamically partially reconfigurable (DPR) technology based on FPGA is applied extensively in the field of high-performance computing (HPC) because of its advantages in processing efficiency and power consumption. To make full use of the advantages of DPR in execution efficiency, we build a DPR system model that meets to the actual application requirements and the objective constraints. According to the consistency of reconfiguration order and dependencies, we propose two algorithms based on simulated annealing (SA). The algorithms partition FPGA resource to several regions and schedule tasks to the regions. In order to improve the performance of the algorithms, we exploit the module merging technology to improve the parallelism of task execution and design a new solution generation method to speed up the convergence speed. Experimental results show that the proposed algorithms have a lower time complexity than mixed-integer linear programming (MILP), iterative scheduler (IS) and Ant Colony Optimization (ACO). For applications with more tasks, the proposed algorithms show performance advantages in producing better partitioning and scheduling results in a shorter time.

Download Full-text

Service for parallel applications based on JINR cloud and HybriLIT resources

EPJ Web of Conferences ◽

10.1051/epjconf/201921407012 ◽

2019 ◽

Vol 214 ◽

pp. 07012 ◽

Cited By ~ 1

Author(s):

Nikita Balashov ◽

Maxim Bashashin ◽

Pavel Goncharov ◽

Ruslan Kuchumov ◽

Nikolay Kutovskiy ◽

...

Keyword(s):

High Performance ◽

Cloud Service ◽

Parallel Applications ◽

Cloud Infrastructure ◽

Modular Architecture ◽

Practical Applications ◽

Speed Up ◽

Scientific Results ◽

Computational Resources ◽

Performance Computing

Cloud computing has become a routine tool for scientists in many fields. The JINR cloud infrastructure provides JINR users with computational resources to perform various scientific calculations. In order to speed up achievements of scientific results the JINR cloud service for parallel applications has been developed. It consists of several components and implements a flexible and modular architecture which allows to utilize both more applications and various types of resources as computational backends. An example of using the Cloud&HybriLIT resources in scientific computing is the study of superconducting processes in the stacked long Josephson junctions (LJJ). The LJJ systems have undergone intensive research because of the perspective of practical applications in nano-electronics and quantum computing. In this contribution we generalize the experience in application of the Cloud&HybriLIT resources for high performance computing of physical characteristics in the LJJ system.

Download Full-text

DISSIPATIVE PARTICLE DYNAMICS: INTRODUCTION, METHODOLOGY AND COMPLEX FLUID APPLICATIONS — A REVIEW

International Journal of Applied Mechanics ◽

10.1142/s1758825109000381 ◽

2009 ◽

Vol 01 (04) ◽

pp. 737-763 ◽

Cited By ~ 100

Author(s):

E. MOEENDARBARY ◽

T. Y. NG ◽

M. ZANGENEH

Keyword(s):

High Performance ◽

Dissipative Particle Dynamics ◽

Coarse Graining ◽

Particle Dynamics ◽

Hydrodynamic Behavior ◽

Computational Speed ◽

Speed Up ◽

Complex Fluid ◽

Performance Computing ◽

Dpd Simulation

The dissipative particle dynamics (DPD) technique is a relatively new mesoscale technique which was initially developed to simulate hydrodynamic behavior in mesoscopic complex fluids. It is essentially a particle technique in which molecules are clustered into the said particles, and this coarse graining is a very important aspect of the DPD as it allows significant computational speed-up. This increased computational efficiency, coupled with the recent advent of high performance computing, has subsequently enabled researchers to numerically study a host of complex fluid applications at a refined level. In this review, we trace the developments of various important aspects of the DPD methodology since it was first proposed in the in the early 1990's. In addition, we review notable published works which employed DPD simulation for complex fluid applications.

Download Full-text

Performance and scaling behavior of bioinformatic applications in virtualization environments to create awareness for the efficient use of compute resources

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009244 ◽

2021 ◽

Vol 17 (7) ◽

pp. e1009244

Author(s):

Maximilian Hanussek ◽

Felix Bartusch ◽

Jens Krüger

Keyword(s):

Machine Learning ◽

Virtual Environments ◽

High Performance ◽

Biological Data ◽

Scaling Behavior ◽

Bare Metal ◽

Learning Framework ◽

Speed Up ◽

Clustal Omega ◽

Performance Computing

The large amount of biological data available in the current times, makes it necessary to use tools and applications based on sophisticated and efficient algorithms, developed in the area of bioinformatics. Further, access to high performance computing resources is necessary, to achieve results in reasonable time. To speed up applications and utilize available compute resources as efficient as possible, software developers make use of parallelization mechanisms, like multithreading. Many of the available tools in bioinformatics offer multithreading capabilities, but more compute power is not always helpful. In this study we investigated the behavior of well-known applications in bioinformatics, regarding their performance in the terms of scaling, different virtual environments and different datasets with our benchmarking tool suite BOOTABLE. The tool suite includes the tools BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition we added an application using the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but applied to many biological problems, especially in the context of medical images (X-ray photographs). The mentioned tools have been analyzed in two different virtual environments, a virtual machine environment based on the OpenStack cloud software and in a Docker environment. The gained performance values were compared to a bare-metal setup and among each other. The study reveals, that the used virtual environments produce an overhead in the range of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed, that some of the analyzed tools do not benefit from using larger amounts of computing resources, whereas others showed an almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users to find the best amount of resources for their analysis. Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community’s awareness of the efficient usage of computing resources.

Download Full-text

Scalability of DL_POLY on High Performance Computing Platform

South African Computer Journal ◽

10.18489/sacj.v29i3.405 ◽

2017 ◽

Vol 29 (3) ◽

Cited By ~ 1

Author(s):

Mabule Samuel Mabakane ◽

Daniel Mojalefa Moeketsi ◽

Anton Lopis

Keyword(s):

Molecular Dynamics ◽

High Performance Computing ◽

High Performance ◽

Computing Platform ◽

Speed Up ◽

Linux Cluster ◽

Large Systems ◽

Performance Computing ◽

Better Than

This paper presents a case study on the scalability of several versions of the molecular dynamics code (DL_POLY) performed on South Africa‘s Centre for High Performance Computing e1350 IBM Linux cluster, Sun system and Lengau supercomputers. Within this study different problem sizes were designed and the same chosen systems were employed in order to test the performance of DL_POLY using weak and strong scalability. It was found that the speed-up results for the small systems were better than large systems on both Ethernet and Infiniband network. However, simulations of large systems in DL_POLY performed well using Infiniband network on Lengau cluster as compared to e1350 and Sun supercomputer.

Download Full-text

A parallel workflow implementation for PEST version 13.6 in high-performance computing for WRF-Hydro version 5.0: a case study over the midwestern United States

Geoscientific Model Development ◽

10.5194/gmd-12-3523-2019 ◽

2019 ◽

Vol 12 (8) ◽

pp. 3523-3539 ◽

Cited By ~ 1

Author(s):

Jiali Wang ◽

Cheng Wang ◽

Vishwas Rao ◽

Andrew Orr ◽

Eugene Yan ◽

...

Keyword(s):

United States ◽

High Performance Computing ◽

High Performance ◽

Hydrological Cycle ◽

Midwestern United States ◽

Calibration Process ◽

Speed Up ◽

Parallel Workflow ◽

Performance Computing

Abstract. The Weather Research and Forecasting Hydrological (WRF-Hydro) system is a state-of-the-art numerical model that models the entire hydrological cycle based on physical principles. As with other hydrological models, WRF-Hydro parameterizes many physical processes. Hence, WRF-Hydro needs to be calibrated to optimize its output with respect to observations for the application region. When applied to a relatively large domain, both WRF-Hydro simulations and calibrations require intensive computing resources and are best performed on multimode, multicore high-performance computing (HPC) systems. Typically, each physics-based model requires a calibration process that works specifically with that model and is not transferrable to a different process or model. The parameter estimation tool (PEST) is a flexible and generic calibration tool that can be used in principle to calibrate any of these models. In its existing configuration, however, PEST is not designed to work on the current generation of massively parallel HPC clusters. To address this issue, we ported the parallel PEST to HPCs and adapted it to work with WRF-Hydro. The porting involved writing scripts to modify the workflow for different workload managers and job schedulers, as well as to connect the parallel PEST to WRF-Hydro. To test the operational feasibility and the computational benefits of this first-of-its-kind HPC-enabled parallel PEST, we developed a case study using a flood in the midwestern United States in 2013. Results on a problem involving the calibration of 22 parameters show that on the same computing resources used for parallel WRF-Hydro, the HPC-enabled parallel PEST can speed up the calibration process by a factor of up to 15 compared with commonly used PEST in sequential mode. The speedup factor is expected to be greater with a larger calibration problem (e.g., more parameters to be calibrated or a larger size of study area).

Download Full-text

Applying parallel programming and high performance computing to speed up data mining processing

2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS) ◽

10.1109/icis.2017.7960006 ◽

2017 ◽

Cited By ~ 2

Author(s):

Ruijian Zhang

Keyword(s):

Data Mining ◽

High Performance Computing ◽

Parallel Programming ◽

High Performance ◽

Speed Up ◽

Performance Computing

Download Full-text

parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants

GigaScience ◽

10.1093/gigascience/giaa052 ◽

2020 ◽

Vol 9 (5) ◽

Author(s):

Alessandro Petrini ◽

Marco Mesiti ◽

Max Schubach ◽

Marco Frasca ◽

Daniel Danis ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Genomic Medicine ◽

Genomic Data ◽

Coding Regions ◽

Pathogenic Variants ◽

Genome Wide ◽

Speed Up ◽

High Performance Computing Cluster ◽

Performance Computing

Abstract Background Several prediction problems in computational biology and genomic medicine are characterized by both big data as well as a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data. Results To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involhing millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version. Conclusions parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF.

Download Full-text

The influence of data size on a high-performance computing memetic algorithm in fingerprint dataset

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v10i4.2760 ◽

2021 ◽

Vol 10 (4) ◽

pp. 2110-2118

Author(s):

Priati Assiroj ◽

Harco Leslie Hendric Spits Warnars ◽

Edi Abdurachman ◽

Achmad Imam Kistijantoro ◽

Antoine Doucet

Keyword(s):

High Performance Computing ◽

Operating Systems ◽

High Performance ◽

Processing Time ◽

Memetic Algorithm ◽

Simple Linear Regression ◽

Unique Data ◽

Speed Up ◽

Performance Computing ◽

Data Variation

The fingerprint is one kind of biometric. This biometric unique data have to be processed well and secure. The problem gets more complicated as data grows. This work is conducted to process image fingerprint data with a memetic algorithm, a simple and reliable algorithm. In order to achieve the best result, we run this algorithm in a parallel environment by utilizing a multi-thread feature of the processor. We propose a high-performance computing memetic algorithm (HPCMA) to process a 7200 image fingerprint dataset which is divided into fifteen specimens based on its characteristics based on the image specification to get the detail of each image. A combination of each specimen generates a new data variation. This algorithm runs in two different operating systems, Windows 7 and Windows 10 then we measure the influence of data size on processing time, speed up, and efficiency of HPCMA with simple linear regression. The result shows data size is very influencing to processing time more than 90%, to speed up more than 30%, and to efficiency more than 19%.

Download Full-text

Application of High-Performance Computing for Calculating Reserves using the Cape Cod Method

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1032.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 54-57

Keyword(s):

High Performance Computing ◽

High Performance ◽

Cape Cod ◽

Insurance Company ◽

Significant Step ◽

Speed Up ◽

The Many ◽

Time Required ◽

Strategic Perspective ◽

Performance Computing

Calculation of reserves is one significant step in the strategic perspective for an insurance company. This is done on a periodic manner in order to have a better understanding regarding the future liabilities of the company. The time required for these calculations will grow quadratically as the size of the input increases. These computations are done several times by taking different factors in view. Thus, these computations become costly with regard to time. One of the many actuarial techniques for calculating reserves namely the Cape Cod method was used to calculate the reserves. HPC was applied for the same in order to reduce the time required to calculate a company’s reserves. We applied HPC to calculate Projected Ultimate Claims based on Cape Cod method. Method takes year wise input of reported claims, incurred claims and earned premiums and outputs Projected Ultimate claims for each year. Making use of the independence which exists in computation of output across years, we parallelized the computations across years. The parallelization strategy proposed gives the speed up to 66194X compared to the serial implementation of the same.

Download Full-text

parSMURF, a High Performance Computing tool for the genome-wide detection of pathogenic variants

10.1101/2020.03.18.994079 ◽

2020 ◽

Author(s):

Alessandro Petrini ◽

Marco Mesiti ◽

Max Schubach ◽

Marco Frasca ◽

Daniel Danis ◽

...

Keyword(s):

Machine Learning ◽

High Performance Computing ◽

High Performance ◽

Genomic Medicine ◽

Genomic Data ◽

Coding Regions ◽

Pathogenic Variants ◽

Speed Up ◽

High Performance Computing Cluster ◽

Performance Computing

AbstractSeveral prediction problems in Computational Biology and Genomic Medicine are characterized by both big data as well as a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: as a consequence the prediction of deleterious variants is a very challenging highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data. To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and significantly speed-up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in Genomic Medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a High Performance Computing cluster.Results with synthetic data and with single nucleotide variants associated with Mendelian diseases and with GWAS hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and a speed-up of 80× with respect to the sequential version.In conclusion parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and its high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data.Availability and ImplementationThe C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available from github: https://github.com/AnacletoLAB/parSMURF

Download Full-text