Run-Time and Compiler Support for Programming in Adaptive Parallel Environments

1997 ◽  
Vol 6 (2) ◽  
pp. 215-227 ◽  
Author(s):  
Guy Edjlali ◽  
Gagan Agrawal ◽  
Alan Sussman ◽  
Jim Humphries ◽  
Joel Saltz

For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of High Performance Fortran (HPF)-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.
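The abstract's central operation, redistributing a block-distributed array when the processor count changes, can be illustrated with a small sketch. The function names and the 1-D block layout are assumptions for illustration, not the library's actual API.

```python
# Hypothetical sketch of 1-D block redistribution when the processor
# count changes; names and layout are illustrative, not the paper's API.

def block_bounds(n, p, rank):
    """Lower/upper (exclusive) index bounds of rank's block of n items on p procs."""
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

def redistribution_plan(n, old_p, new_p):
    """For each (old_rank, new_rank) pair, the index range old_rank must send."""
    plan = []
    for old_rank in range(old_p):
        olo, ohi = block_bounds(n, old_p, old_rank)
        for new_rank in range(new_p):
            nlo, nhi = block_bounds(n, new_p, new_rank)
            lo, hi = max(olo, nlo), min(ohi, nhi)
            if lo < hi:
                plan.append((old_rank, new_rank, lo, hi))
    return plan

# Shrinking from 4 to 3 processors for a 12-element array:
print(redistribution_plan(12, 4, 3))
```

The same bounds computation also yields the new loop bounds each processor iterates over after the change, which is the other piece of support the abstract mentions.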

2014 ◽  
Vol 596 ◽  
pp. 276-279
Author(s):  
Xiao Hui Pan

Graph component labeling, a subset of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations for the component labeling problem are possible, and we explore their use on general-purpose graphics processing units (GPGPUs) with the CUDA programming model. We discuss implementation issues and performance results on CPUs and GPUs using CUDA, and we evaluate our system with real-world graphs. We show how accounting for the different architectural features of the GPU and the host CPUs achieves high performance.
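A common data-parallel formulation of component labeling, which GPU variants like those discussed above build on, is iterative label propagation: every vertex starts with its own index as its label and repeatedly adopts the minimum label in its neighborhood until nothing changes. A sequential sketch (one "kernel launch" per while-iteration; a CUDA kernel would process the edges with one thread each):

```python
# Sequential simulation of data-parallel label propagation for
# component labeling; a GPU version runs the edge loop in parallel.

def label_components(num_vertices, edges):
    labels = list(range(num_vertices))      # each vertex starts as its own label
    changed = True
    while changed:                          # one pass per "kernel launch"
        changed = False
        new_labels = labels[:]
        for u, v in edges:                  # data-parallel over edges on a GPU
            m = min(labels[u], labels[v])
            if m < new_labels[u]:
                new_labels[u] = m; changed = True
            if m < new_labels[v]:
                new_labels[v] = m; changed = True
        labels = new_labels
    return labels

# Two components: {0, 1, 2} and {3, 4}
print(label_components(5, [(0, 1), (1, 2), (3, 4)]))  # → [0, 0, 0, 3, 3]
```

The number of passes grows with graph diameter, which is one reason the algorithmic variations and architectural tuning the abstract mentions matter in practice.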


2000 ◽  
Vol 09 (03) ◽  
pp. 343-367
Author(s):  
STEPHEN W. RYAN ◽  
ARVIND K. BANSAL

This paper describes a system to distribute and retrieve multimedia knowledge on a cluster of heterogeneous high-performance architectures distributed over the Internet. The knowledge is represented using facts and rules in an associative logic-programming model. Associative computation facilitates the distribution of facts and rules and exploits coarse-grain data-parallel computation. Associative logic programming uses a flat data model that can be easily mapped onto heterogeneous architectures. The paper describes an abstract instruction set for the distributed version of associative logic programming and the corresponding implementation. The implementation uses a message-passing library for architecture independence within a cluster, object-oriented programming for modularity and portability, and Java as a front-end interface to provide a graphical user interface, multimedia capability, and remote access via the Internet. Performance results on a cluster of IBM RS 6000 workstations are presented. The results show that distributing the data improves performance almost linearly for a small number of processors in a cluster.


1995 ◽  
Vol 04 (01n02) ◽  
pp. 33-53 ◽  
Author(s):  
ARVIND K. BANSAL

Associative computation is characterized by the seamless intertwining of search-by-content and data-parallel computation. The search-by-content paradigm is natural to scalable high-performance heterogeneous computing, since the use of tagged data avoids the need for explicit addressing mechanisms. In this paper, the author presents an algebra for associative logic programming, an associative resolution scheme, and a generic framework for an associative abstract instruction set. The model is based on the integration of data alignment and the use of two types of bags: bags of data elements, and filter bags of Boolean values that select and restrict computation on the data elements. The use of filter bags integrated with data alignment reduces computation and data-transfer overhead, and the use of tagged data reduces the overhead of preparing data before transmission. The abstract instruction set is illustrated with an example. Performance results are presented for a simulation in a homogeneous address space.
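The two-bag model described above can be sketched in a few lines. This is a minimal illustration of the idea, data-element bags paired with Boolean filter bags, not the paper's algebra or instruction set; the function names are invented for the example.

```python
# Minimal sketch of the bag model: a data-element bag plus a Boolean
# filter bag that selects which elements take part in a computation.

def make_filter(data_bag, predicate):
    """Search-by-content: build a filter bag by testing each element's content."""
    return [predicate(x) for x in data_bag]

def apply_filtered(data_bag, filter_bag, op):
    """Apply op only where the filter bag is True, leaving other slots untouched."""
    return [op(x) if keep else x for x, keep in zip(data_bag, filter_bag)]

prices = [10, 25, 3, 40]
expensive = make_filter(prices, lambda p: p > 20)       # [False, True, False, True]
print(apply_filtered(prices, expensive, lambda p: p - 5))  # → [10, 20, 3, 35]
```

Because the filter bag is positionally aligned with the data bag, restricting a computation never requires moving or re-addressing the data elements, which is the overhead reduction the abstract describes.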


1999 ◽  
Vol 09 (01) ◽  
pp. 135-146
Author(s):  
GAGAN AGRAWAL

An important component in compiling for distributed-memory machines is data partitioning. While a number of automatic analysis techniques have been proposed for this phase, none of them is applicable to irregular problems. In this paper, we present compile-time analysis for determining data partitioning for such applications. We have developed a set of cost functions for determining communication and redistribution costs in irregular codes. We first determine the appropriate distributions for a single data-parallel statement, and then use the cost functions with a greedy algorithm to compute distributions for the full program. Initial performance results on a 16-processor IBM SP-2 are also presented.
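The statement-by-statement greedy scheme the abstract outlines can be sketched as follows. The cost functions, candidate distributions, and numbers here are invented placeholders, not the paper's actual cost model.

```python
# Hedged sketch of greedy distribution selection: for each statement in
# order, pick the distribution minimizing its communication cost plus
# the cost of redistributing from the previously chosen distribution.

def choose_distributions(statements, candidates, comm_cost, redist_cost):
    chosen, prev = [], None
    for stmt in statements:
        def total(dist):
            move = redist_cost(prev, dist) if prev is not None else 0
            return comm_cost(stmt, dist) + move
        best = min(candidates, key=total)   # greedy: locally cheapest choice
        chosen.append(best)
        prev = best
    return chosen

# Toy cost table: statement s1 favors 'block', s2 favors 'cyclic'.
comm = {('s1', 'block'): 1, ('s1', 'cyclic'): 5,
        ('s2', 'block'): 6, ('s2', 'cyclic'): 2}
print(choose_distributions(
    ['s1', 's2'], ['block', 'cyclic'],
    lambda s, d: comm[(s, d)],
    lambda a, b: 0 if a == b else 3))  # → ['block', 'cyclic']
```

Note how the redistribution cost biases the choice toward keeping the previous distribution; here s2 still switches because its communication saving (4) outweighs the redistribution cost (3).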


Data Mining ◽  
2011 ◽  
pp. 106-141
Author(s):  
Massimo Coppola ◽  
Marco Vanneschi

We consider the application of parallel programming environments to develop portable and efficient high-performance data mining (DM) tools. We first assess the need for parallel and distributed DM applications by pointing out the scalability problems of some mining techniques and the need to mine large, possibly geographically distributed databases. We discuss the main issues of exploiting parallel and distributed computation for DM algorithms. A high-level programming language enhances the software engineering aspects of parallel DM, and it simplifies the problems of integration with existing sequential and parallel data management systems, thus leading to programming-efficient and high-performance implementations of applications. We describe a programming environment we have implemented that is based on the parallel skeleton model, and we examine the addition of object-like interfaces toward external libraries and system software layers. Abstractions of this kind will be included in the forthcoming programming environment ASSIST. In the main part of the chapter, as a proof of concept, we describe three well-known DM algorithms: Apriori, C4.5, and DBSCAN. For each problem, we explain the sequential algorithm and a structured parallel version, which is discussed and compared to parallel solutions found in the literature. We also discuss the potential gain in performance and expressiveness from the addition of external objects, on the basis of the experiments we have performed so far. We evaluate the approach with respect to performance results, design, and implementation considerations.
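Of the three algorithms named above, Apriori has the most obviously data-parallel core: counting candidate itemsets over a set of transactions. A sequential sketch of one level of Apriori (not the chapter's skeleton-based implementation; a parallel version would partition the transactions and sum the candidate counts globally):

```python
# One level of Apriori: generate k-itemset candidates from the
# (k-1)-frequent sets, count their support, and keep the frequent ones.

from itertools import combinations

def apriori_level(transactions, prev_frequent, k, min_support):
    items = sorted({i for s in prev_frequent for i in s})
    # Candidate pruning: every (k-1)-subset must already be frequent.
    candidates = [frozenset(c) for c in combinations(items, k)
                  if all(frozenset(s) in prev_frequent
                         for s in combinations(c, k - 1))]
    # Support counting: the step a parallel version distributes.
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {c for c, n in counts.items() if n >= min_support}

txns = [frozenset('abc'), frozenset('abd'), frozenset('ab'), frozenset('cd')]
f1 = {frozenset(i) for i in 'abcd'}     # all single items frequent in this toy set
print(apriori_level(txns, f1, 2, 2))    # only {a, b} appears in >= 2 transactions
```

The counting loop touches every transaction independently, which is exactly the shape a data-parallel skeleton (map over partitions, then a global reduction of the counts) captures.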


Symmetry ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 1939
Author(s):  
Jun Wei Chen ◽  
Xanno K. Sigalingging ◽  
Jenq-Shiou Leu ◽  
Jun-Ichi Takada

In recent years, Chinese has become one of the most popular languages globally, and the demand for automatic Chinese sentence correction has gradually increased. This research can be applied to Chinese language learning to reduce the cost of learning and the feedback time, and to help writers check for wrong words. The traditional way to do Chinese sentence correction is to check whether each word exists in a predefined dictionary; however, this kind of method cannot deal with semantic errors. As deep learning has become popular, an artificial neural network can be applied to understand a sentence's context and correct semantic errors. However, many issues remain: both the accuracy and the computation time required to correct a sentence are still lacking, so deep-learning-based Chinese sentence correction may not yet be ready for large-scale commercial applications. Our goal is to obtain a model with better accuracy and computation time. Combining a recurrent neural network with Bidirectional Encoder Representations from Transformers (BERT), a recently popular model known for its high performance but slow inference speed, we introduce a hybrid model that can be applied to Chinese sentence correction, improving both the accuracy and the inference speed. Among the results, BERT-GRU obtained the highest BLEU score in all experiments. The inference speed of the original transformer-based model can be improved by 1131% with beam search decoding in the 128-word experiment, and greedy decoding can also be improved by 452%. The longer the sequence, the larger the improvement.


2004 ◽  
Vol 449-452 ◽  
pp. 7-12
Author(s):  
James C. Williams

Product performance, including the cost of ownership, is becoming increasingly dependent on the availability of high-quality, high-performance, affordable materials of construction. Today, the requirements placed on a new material for a high-performance structural application extend well beyond the improvement of one or more material properties. This makes the introduction of a new material a multi-faceted activity. Modern structural materials derive their performance from a combination of composition and processing, the results of which are inextricably intertwined. This statement pertains both to metallic alloys and to fiber-reinforced composite materials. In addition, material cost and the reproducibility of material properties are becoming more central as acceptance criteria for incorporating new materials into new products. This paper uses examples of recent developments in materials for aircraft gas turbines to depict the materials introduction process. Some of these developments have been successful and others have not. These examples illustrate the changing picture that represents the successful introduction of a new structural material, even in a high-performance, high-value product such as a gas turbine. Specific examples include metal matrix composites, Ni-base alloys, and improved-reliability Ti alloys. The basis for successful introduction, or the lack thereof, will be discussed. While the examples are specific to gas turbines, they are generally instructive and depict the growing complexity of the process of developing and introducing new materials into a high-value product. An additional issue for all new materials introduction is the time required to achieve product readiness. As the time required for product design decreases, there has been little commensurate reduction in materials development cycle time. This matter will also be discussed, and some possible reasons and potential solutions will be described.


2021 ◽  
Author(s):  
Irina Terterian

The cost of a hardware failure in high-performance computing systems is usually extremely high because of the system stall, where billions of operations can be lost within one second. Thus, the implementation of self-restoration mechanisms is one of the most effective approaches to keeping system performance at the required level. The project presents a new approach that allows retaining the performance of a run-time reconfigurable stream processing system at its maximum level. This becomes possible through the development of a multi-level self-restoration mechanism that consists of: restoration by FPGA scrubbing, restoration by FPGA slot replacement, and restoration with optimal performance degradation. All of the above levels of the restoration procedure were developed and tested on a reconfigurable computing platform based on a Xilinx Virtex FPGA. Analysis of the results achieved by the developed mechanism shows very fast restoration of functionality and a dramatic increase in the lifetime of FPGA-based computing platforms.
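The three restoration levels listed in the abstract form an escalation ladder, from the cheapest recovery to graceful degradation. The control logic can be sketched as below; the trigger conditions and FPGA-facing calls are placeholders, since the project's actual interfaces are not described here.

```python
# Illustrative escalation across the three restoration levels; the
# callbacks stand in for real FPGA operations and are hypothetical.

def restore(fault, scrub, replace_slot, degrade):
    """Try the cheapest restoration level first, escalating on failure."""
    if scrub(fault):            # level 1: configuration-memory scrubbing
        return 'scrubbed'
    if replace_slot(fault):     # level 2: swap in a spare FPGA slot
        return 'slot-replaced'
    degrade(fault)              # level 3: keep running at reduced performance
    return 'degraded'

# A transient upset that scrubbing alone fixes:
print(restore('seu', scrub=lambda f: True,
              replace_slot=lambda f: False,
              degrade=lambda f: None))  # → scrubbed
```

The ordering matters: scrubbing restores full performance at near-zero cost, slot replacement consumes spare resources, and only when both fail does the system accept a performance loss, which matches the "optimal performance degradation" level named above.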



