Integration, Development and Performance of the 500 TFLOPS Heterogeneous Cluster (Condor)

Author(s):  
Mark Barnell ◽  
Qing Wu ◽  
Richard Linderman

The Air Force Research Laboratory Information Directorate Advanced Computing Division (AFRL/RIT) High Performance Computing Affiliated Resource Center (HPC-ARC) hosts a very large-scale interactive computing cluster consisting of about 1800 nodes. Condor, the largest interactive Cell cluster in the world, integrates heterogeneous processors: IBM Cell Broadband Engine (Cell BE) multicore CPUs, NVIDIA general-purpose graphics processing units (GPGPUs) and Intel x86 server nodes, connected by a 10 Gb Ethernet star-hub network and a 20 Gb/s InfiniBand mesh, with a combined capability of 500 trillion floating-point operations per second (TFLOPS). Applications developed and running on Condor include large-scale computational intelligence models, video synthetic aperture radar (SAR) back-projection, Space Situational Awareness (SSA), video target tracking, linear algebra and others. This presentation will discuss the design and integration of the system. It will also show progress on performance optimization efforts and lessons learned about algorithm scalability on a heterogeneous architecture.

2017 ◽  
Vol 20 (4) ◽  
pp. 1151-1159 ◽  
Author(s):  
Folker Meyer ◽  
Saurabh Bagchi ◽  
Somali Chaterji ◽  
Wolfgang Gerlach ◽  
Ananth Grama ◽  
...  

Abstract As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large-volume scientific workflows, MG-RAST is the right location to perform benchmarking and to implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1–3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements to the workflow. We are currently working on versions 4.02 and 4.1, both of which incorporate significant input from the community and our partners; they will enable double barcoding, support stronger inferences with longer-read technologies, and increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard for specifying bioinformatics workflows, both to facilitate development and to enable efficient high-performance implementation of the community's data analysis tasks.


2014 ◽  
Vol 596 ◽  
pp. 276-279
Author(s):  
Xiao Hui Pan

Graph component labeling, a special case of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations of the component labeling problem are possible, and we explore their use with general-purpose graphics processing units (GPGPUs) and the CUDA GPU programming model. We discuss implementation issues and present performance results on CPUs and GPUs using CUDA, evaluating our system with real-world graphs. We show how accounting for the different architectural features of the GPU and the host CPUs enables high performance.
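The abstract does not reproduce the authors' CUDA kernels; the following is a minimal sketch, in Python/NumPy, of the data-parallel idea behind many GPU component-labeling variants: iterative minimum-label propagation, in which every edge can be processed independently within each sweep. All names are illustrative, and the NumPy sweep stands in for a one-thread-per-edge GPU kernel.

```python
import numpy as np

def label_components(num_vertices, edges, max_iters=1000):
    """Iterative min-label propagation: each vertex repeatedly adopts the
    smallest label among itself and its neighbours until no label changes.
    Each sweep is embarrassingly parallel over edges, which is why the same
    scheme maps naturally onto one-thread-per-edge GPU kernels."""
    labels = np.arange(num_vertices)      # every vertex starts in its own component
    src, dst = edges[:, 0], edges[:, 1]
    for _ in range(max_iters):
        lo = np.minimum(labels[src], labels[dst])   # smaller label across each edge
        new = labels.copy()
        np.minimum.at(new, src, lo)                 # push it to both endpoints
        np.minimum.at(new, dst, lo)
        if np.array_equal(new, labels):             # converged: no label changed
            break
        labels = new
    return labels

# toy graph with two components {0, 1, 2} and {3, 4}
edges = np.array([[0, 1], [1, 2], [3, 4]])
print(label_components(5, edges))                   # -> [0 0 0 3 3]
```

On a GPU the inner sweep becomes a kernel launch; convergence takes at most on the order of the graph diameter in sweeps, which is one of the trade-offs such data-parallel variants make against the classical sequential union-find approach.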


Author(s):  
Masaki Iwasawa ◽  
Daisuke Namekata ◽  
Keigo Nitadori ◽  
Kentaro Nomura ◽  
Long Wang ◽  
...  

Abstract We describe algorithms implemented in FDPS (Framework for Developing Particle Simulators) to make efficient use of accelerator hardware such as GPGPUs (general-purpose graphics processing units). We have developed FDPS to make it possible for researchers to develop their own high-performance parallel particle-based simulation programs without spending large amounts of time on parallelization and performance tuning. FDPS provides a high-performance implementation of parallel algorithms for particle-based simulations in a “generic” form, so that researchers can define their own particle data structure and interparticle interaction functions. FDPS compiled with user-supplied data types and interaction functions provides all the necessary functions for parallelization, and researchers can thus write their programs as though they were writing simple non-parallel code. It has previously been possible to use accelerators with FDPS by writing an interaction function that uses the accelerator. However, the efficiency was limited by the latency and bandwidth of communication between the CPU and the accelerator, and also by the mismatch between the degree of parallelism exposed by the interaction function and that offered by the hardware. We have modified the interface of the user-provided interaction functions so that accelerators are used more efficiently. We also implemented new techniques which reduce the amount of work on the CPU side and the amount of communication between the CPU and accelerators. We have measured the performance of N-body simulations run with FDPS on a system with an NVIDIA Volta GPGPU, and the achieved performance is around 27% of the theoretical peak. We have constructed a detailed performance model and found that the current implementation can achieve good performance on systems with much smaller memory and communication bandwidth. Thus, our implementation will be applicable to future generations of accelerator systems.
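FDPS itself is a C++ framework whose actual interface is not reproduced in the abstract; the sketch below, in Python with NumPy standing in for the accelerator, only illustrates the division of labour the abstract describes: the user supplies a pairwise interaction function, while a framework-side driver evaluates it in large batches so that transfers to the device are amortized. The names user_gravity_kernel and calc_force_batched are illustrative assumptions, not FDPS API names.

```python
import numpy as np

def user_gravity_kernel(xi, xj, mj, eps2=1e-6):
    """User-supplied pairwise interaction: softened gravitational acceleration
    on particles i from particles j. In an FDPS-style framework the user
    writes only this function; decomposition, parallelization and accelerator
    offload are the framework's job."""
    dx = xj[None, :, :] - xi[:, None, :]          # (Ni, Nj, 3) separations
    r2 = (dx * dx).sum(-1) + eps2                 # softened squared distances
    inv_r3 = r2 ** -1.5
    return (mj[None, :, None] * inv_r3[:, :, None] * dx).sum(axis=1)  # (Ni, 3)

def calc_force_batched(x, m, kernel, batch=1024):
    """Framework-side driver (simplified): evaluates the interaction in large
    batches, mimicking how big work units keep an accelerator busy and
    amortize CPU <-> device transfer latency."""
    acc = np.zeros_like(x)
    for s in range(0, len(x), batch):
        acc[s:s + batch] = kernel(x[s:s + batch], x, m)
    return acc

rng = np.random.default_rng(0)
x = rng.standard_normal((4096, 3))
m = np.full(4096, 1.0 / 4096)
a = calc_force_batched(x, m, user_gravity_kernel)
print(a.shape)   # (4096, 3)
```

The point of the batching is the one the abstract makes: with too little work per offload, latency and bandwidth between CPU and accelerator dominate, so the revised interface hands the accelerator larger interaction lists at a time.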


2020 ◽  
Vol 496 (1) ◽  
pp. 629-637
Author(s):  
Ce Yu ◽  
Kun Li ◽  
Shanjiang Tang ◽  
Chao Sun ◽  
Bin Ma ◽  
...  

ABSTRACT Time series data of celestial objects are commonly used to study valuable and unexpected objects such as extrasolar planets and supernovae in time domain astronomy. Due to the rapid growth of data volume, traditional manual methods are becoming infeasible for continuously analysing the accumulated observation data. To meet this demand, we designed and implemented a special tool named AstroCatR that can efficiently and flexibly reconstruct time series data from large-scale astronomical catalogues. AstroCatR can load original catalogue data from Flexible Image Transport System (FITS) files or databases, match each item to determine which object it belongs to, and finally produce time series data sets. To support high-performance parallel processing of large-scale data sets, AstroCatR uses an extract-transform-load (ETL) pre-processing module to create sky zone files and balance the workload. The matching module uses an overlapped indexing method and an in-memory reference table to improve accuracy and performance. The output of AstroCatR can be stored in CSV files or transformed into other formats as needed. At the same time, the module-based software architecture ensures the flexibility and scalability of AstroCatR. We evaluated AstroCatR with actual observation data from the three Antarctic Survey Telescopes (AST3). The experiments demonstrate that AstroCatR can efficiently and flexibly reconstruct all time series data by setting relevant parameters and configuration files. Furthermore, the tool is approximately 3× faster than methods using relational database management systems at matching massive catalogues.
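AstroCatR's actual overlapped-indexing implementation is not given in the abstract; the following is a minimal sketch, assuming a simple declination-zone index, of how zone partitioning plus look-ups in neighbouring zones keeps catalogue cross-matching cheap. The zone height, helper names and flat-sky small-angle distance are illustrative simplifications, not the tool's real parameters.

```python
import numpy as np
from collections import defaultdict

ZONE_HEIGHT = 0.5  # degrees of declination per zone (illustrative value)

def build_zone_index(ra, dec):
    """Group reference-catalogue rows into declination zones so that a detection
    only needs to be compared against objects in its own and adjacent zones."""
    index = defaultdict(list)
    for i, d in enumerate(dec):
        index[int(np.floor(d / ZONE_HEIGHT))].append(i)
    return index

def match(ra, dec, det_ra, det_dec, index, radius=1.0 / 3600.0):
    """Return the index of the nearest reference object within `radius` degrees
    of the detection, or -1 if none. A flat-sky distance is used for brevity;
    a production matcher would use proper spherical geometry."""
    zone = int(np.floor(det_dec / ZONE_HEIGHT))
    best, best_d2 = -1, radius ** 2
    for z in (zone - 1, zone, zone + 1):              # overlapped zone lookup
        for i in index.get(z, []):
            dra = (ra[i] - det_ra) * np.cos(np.radians(det_dec))
            ddec = dec[i] - det_dec
            d2 = dra * dra + ddec * ddec
            if d2 <= best_d2:
                best, best_d2 = i, d2
    return best

ra = np.array([10.0, 10.001, 250.0])
dec = np.array([-30.0, -30.0002, 45.0])
idx = build_zone_index(ra, dec)
print(match(ra, dec, 10.0001, -30.00005, idx))   # -> 0, the object at (10.0, -30.0)
```

Partitioning by zone also gives natural units of work for the ETL stage: each zone file can be matched by a separate worker, which is how the workload balancing described above becomes possible.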


2021 ◽  
Author(s):  
Allen Yen-Cheng Yu

Many large-scale online applications enable thousands of users to access their services simultaneously. However, the overall service quality of an online application usually degrades as the number of users increases because, traditionally, a centralized server architecture does not scale well. In order to provide better Quality of Service (QoS), a service architecture such as Grid computing can be used. This type of architecture offers service scalability by utilizing heterogeneous hardware resources. In this thesis, a novel design of Grid computing middleware, the Massively Multi-user Online Platform (MMOP), which integrates Peer-to-Peer (P2P) structured overlays, is proposed. The objectives of this design are to offer scalability and system design flexibility, to simplify the development of distributed applications, and to improve QoS by following specified policy rules. A Massively Multiplayer Online Game (MMOG) has been created to validate the functionality and performance of MMOP. The simulation results demonstrate that MMOP is a high-performance, scalable servicing and computing middleware.
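The abstract does not specify which structured overlay MMOP uses; the snippet below is only a generic sketch of the building block such middleware typically relies on, a consistent-hash ring that maps object keys to nodes so that load spreads across heterogeneous peers without a central directory. The class and key names are illustrative.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy structured-overlay lookup (consistent-hashing style): nodes and
    object keys are hashed onto the same ring, and a key is served by the
    first node clockwise from its hash. Adding or removing a node only
    remaps the keys between it and its predecessor, which is what lets a
    P2P overlay scale without a central directory."""

    def __init__(self, nodes=()):
        self._ring = []                            # sorted (hash, node) pairs
        for n in nodes:
            self.add_node(n)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        bisect.insort(self._ring, (self._hash(node), node))

    def lookup(self, key):
        h = self._hash(key)
        i = bisect.bisect_right(self._ring, (h, ""))
        return self._ring[i % len(self._ring)][1]  # wrap around the ring

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
for obj in ["player:42", "zone:7", "item:sword"]:
    print(obj, "->", ring.lookup(obj))
```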


2014 ◽  
Vol 5 (1) ◽  
pp. 52
Author(s):  
Waldir Vilalva Dezan

The benefits gained in design mediated by Building Information Modelling (BIM) technology are manifold; among them, early visualization, the generation of accurate 2D drawings, collaboration, verification of design intent, cost estimation and performance evaluation stand out. By adopting this modelling technology and using it to produce, communicate and analyse architectural or engineering solutions, practice is transformed. Consequently, the implementation of this new way of working in architectural design and engineering firms meets resistance and entails adoption stages in which incremental adjustments must occur to overcome difficulties and to secure the learning and gains of the new process. The architecture and engineering office COORDENADORIA DE PROJETOS (CPROJ), part of the School of Civil Engineering, Architecture and Urban Planning of the University of Campinas, continually seeks innovation and has therefore incorporated BIM into its design method. This paper presents a practical case: the first large-scale project developed with BIM, treated as a BIM pilot study at CPROJ. The pilot study was the research laboratory of the Center of Molecular and Cellular Engineering of the Boldrini Children’s Hospital. Training efforts and the appropriation of BIM prior to the pilot study, as well as the pilot study itself, are presented. The highlights and lessons learned in this process are summarized. An account of how BIM changed the office’s production, and the qualitative benefits achieved, is also presented.


2018 ◽  
Vol 66 (4) ◽  

The restorative qualities of sleep are fundamentally the basis of the individual athlete’s ability to recover and perform, and to optimally challenge and control the effects of training regimes in high-performance sport. Research consistently shows that a large percentage of the population fails to obtain the recommended 7–9 hours of sleep per night [17]. Moreover, research in recent years has found that athletes have a high prevalence of poor sleep quality [6]. Given its implications for the recovery process, sleep affects the quality of the athlete’s training and the outcome of competitions. Although an increasing number of recovery aids (such as cold baths, anti-inflammatory agents, high protein intake, etc.) are available, recent research shows the important and irreplaceable role of sleep and that no recovery method can compensate for a lack of sleep. Every facet of an athlete’s life has the capacity to either generate or drain energy, and thus contributes to the overall stress level and subsequently to the level of both recovery and performance. While traditional approaches to performance optimization focus simply on the physical stressors, this overview will highlight the benefits and basic principles of sleep and its relation to recovery and performance, and provide input on, and reflections about, what to consider when working with the development and maintenance of athletic performance.


2021 ◽  
Author(s):  
Murtadha Al-Habib ◽  
Yasser Al-Ghamdi

Abstract Extensive computing resources are required to leverage today’s advanced geoscience workflows that are used to explore and characterize giant petroleum resources. In these cases, high-performance workstations are often unable to adequately handle the scale of computing required. The workflows typically utilize complex and massive data sets, which require advanced computing resources to store, process, manage, and visualize various forms of the data throughout their lifecycles. This work describes a large-scale geoscience end-to-end interpretation platform customized to run on a cluster-based remote visualization environment. A team of computing infrastructure and geoscience workflow experts was established to collaborate on the deployment, which was broken down into separate phases. Initially, an evaluation and analysis phase was conducted to analyze computing requirements and assess potential solutions. A testing environment was then designed, implemented and benchmarked. The third phase used the test environment to determine the scale of infrastructure required for the production environment. Finally, the full-scale customized production environment was deployed for end users. During the testing phase, aspects such as connectivity, stability, interactivity, functionality, and performance were investigated using the largest available geoscience datasets. Multiple computing configurations were benchmarked until optimal performance was achieved, under applicable corporate information security guidelines. It was observed that the customized production environment was able to execute workflows that could not run on local user workstations. For example, while conducting connectivity, stability and interactivity benchmarking, the test environment was operated for extended periods to ensure stability for workflows that require multiple days to run. To estimate the scale of the required production environment, user portfolios were categorized by data type, scale and workflow. Continuous monitoring of system resources and utilization enabled continuous improvements to the final solution. The use of a fit-for-purpose, customized remote visualization solution may reduce or ultimately eliminate the need to deploy high-end workstations to all end users. Rather, a shared, scalable and reliable cluster-based solution can serve a much larger user community in a highly performant manner.

