Run-Time and Compiler Support for Programming in Adaptive Parallel Environments

1997 ◽  
Vol 6 (2) ◽  
pp. 215-227 ◽  
Author(s):  
Guy Edjlali ◽  
Gagan Agrawal ◽  
Alan Sussman ◽  
Jim Humphries ◽  
Joel Saltz

For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of High Performance Fortran (HPF)-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.
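The abstract's central operation, redistributing a block-distributed array when the processor count changes, can be illustrated with a small sketch. The function names and the 1-D block layout are assumptions for illustration, not the library's actual API.

```python
# Hypothetical sketch of 1-D block redistribution when the processor
# count changes; names and layout are illustrative, not the paper's API.

def block_bounds(n, p, rank):
    """Lower/upper (exclusive) index bounds of rank's block of n items on p procs."""
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

def redistribution_plan(n, old_p, new_p):
    """For each (old_rank, new_rank) pair, the index range old_rank must send."""
    plan = []
    for old_rank in range(old_p):
        olo, ohi = block_bounds(n, old_p, old_rank)
        for new_rank in range(new_p):
            nlo, nhi = block_bounds(n, new_p, new_rank)
            lo, hi = max(olo, nlo), min(ohi, nhi)
            if lo < hi:
                plan.append((old_rank, new_rank, lo, hi))
    return plan

# Shrinking from 4 to 3 processors for a 12-element array:
print(redistribution_plan(12, 4, 3))
```

The same bounds computation also yields the new loop bounds each processor iterates over after the change, which is the other piece of support the abstract mentions.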

2014 ◽  
Vol 596 ◽  
pp. 276-279
Author(s):  
Xiao Hui Pan

Graph component labeling, a subset of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations for the component labeling problem are possible, and we explore their use on general-purpose graphics processing units (GPGPUs) with the CUDA programming model. We discuss implementation issues and performance results on CPUs and GPUs using CUDA, and we evaluate our system with real-world graphs. We show how accounting for the different architectural features of the GPU and the host CPUs achieves high performance.
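A common data-parallel formulation of component labeling, which GPU variants like those discussed above build on, is iterative label propagation: every vertex starts with its own index as its label and repeatedly adopts the minimum label in its neighborhood until nothing changes. A sequential sketch (one "kernel launch" per while-iteration; a CUDA kernel would process the edges with one thread each):

```python
# Sequential simulation of data-parallel label propagation for
# component labeling; a GPU version runs the edge loop in parallel.

def label_components(num_vertices, edges):
    labels = list(range(num_vertices))      # each vertex starts as its own label
    changed = True
    while changed:                          # one pass per "kernel launch"
        changed = False
        new_labels = labels[:]
        for u, v in edges:                  # data-parallel over edges on a GPU
            m = min(labels[u], labels[v])
            if m < new_labels[u]:
                new_labels[u] = m; changed = True
            if m < new_labels[v]:
                new_labels[v] = m; changed = True
        labels = new_labels
    return labels

# Two components: {0, 1, 2} and {3, 4}
print(label_components(5, [(0, 1), (1, 2), (3, 4)]))  # → [0, 0, 0, 3, 3]
```

The number of passes grows with graph diameter, which is one reason the algorithmic variations and architectural tuning the abstract mentions matter in practice.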


2000 ◽  
Vol 09 (03) ◽  
pp. 343-367
Author(s):  
STEPHEN W. RYAN ◽  
ARVIND K. BANSAL

This paper describes a system to distribute and retrieve multimedia knowledge on a cluster of heterogeneous high-performance architectures distributed over the Internet. The knowledge is represented using facts and rules in an associative logic-programming model. Associative computation facilitates the distribution of facts and rules and exploits coarse-grain data-parallel computation. Associative logic programming uses a flat data model that can be easily mapped onto heterogeneous architectures. The paper describes an abstract instruction set for the distributed version of associative logic programming and the corresponding implementation. The implementation uses a message-passing library for architecture independence within a cluster, object-oriented programming for modularity and portability, and Java as a front-end interface to provide a graphical user interface, multimedia capability, and remote access via the Internet. Performance results on a cluster of IBM RS 6000 workstations are presented. The results show that distributing the data improves performance almost linearly for a small number of processors in a cluster.


1995 ◽  
Vol 04 (01n02) ◽  
pp. 33-53 ◽  
Author(s):  
ARVIND K. BANSAL

Associative computation is characterized by the seamless intertwining of search-by-content and data-parallel computation. The search-by-content paradigm is natural to scalable high-performance heterogeneous computing, since the use of tagged data avoids the need for explicit addressing mechanisms. In this paper, the author presents an algebra for associative logic programming, an associative resolution scheme, and a generic framework for an associative abstract instruction set. The model is based on the integration of data alignment and the use of two types of bags: bags of data elements, and filter bags of Boolean values that select and restrict computation on the data elements. The use of filter bags integrated with data alignment reduces computation and data-transfer overhead, and the use of tagged data reduces the overhead of preparing data before transmission. The abstract instruction set is illustrated with an example. Performance results are presented for a simulation in a homogeneous address space.
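The two-bag model described above can be sketched in a few lines. This is a minimal illustration of the idea, data-element bags paired with Boolean filter bags, not the paper's algebra or instruction set; the function names are invented for the example.

```python
# Minimal sketch of the bag model: a data-element bag plus a Boolean
# filter bag that selects which elements take part in a computation.

def make_filter(data_bag, predicate):
    """Search-by-content: build a filter bag by testing each element's content."""
    return [predicate(x) for x in data_bag]

def apply_filtered(data_bag, filter_bag, op):
    """Apply op only where the filter bag is True, leaving other slots untouched."""
    return [op(x) if keep else x for x, keep in zip(data_bag, filter_bag)]

prices = [10, 25, 3, 40]
expensive = make_filter(prices, lambda p: p > 20)       # [False, True, False, True]
print(apply_filtered(prices, expensive, lambda p: p - 5))  # → [10, 20, 3, 35]
```

Because the filter bag is positionally aligned with the data bag, restricting a computation never requires moving or re-addressing the data elements, which is the overhead reduction the abstract describes.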


1999 ◽  
Vol 09 (01) ◽  
pp. 135-146
Author(s):  
GAGAN AGRAWAL

An important component in compiling for distributed-memory machines is data partitioning. While a number of automatic analysis techniques have been proposed for this phase, none of them is applicable to irregular problems. In this paper, we present compile-time analysis for determining data partitioning for such applications. We have developed a set of cost functions for determining communication and redistribution costs in irregular codes. We first determine the appropriate distributions for a single data-parallel statement, and then use the cost functions with a greedy algorithm to compute distributions for the full program. Initial performance results on a 16-processor IBM SP-2 are also presented.
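The statement-by-statement greedy scheme the abstract outlines can be sketched as follows. The cost functions, candidate distributions, and numbers here are invented placeholders, not the paper's actual cost model.

```python
# Hedged sketch of greedy distribution selection: for each statement in
# order, pick the distribution minimizing its communication cost plus
# the cost of redistributing from the previously chosen distribution.

def choose_distributions(statements, candidates, comm_cost, redist_cost):
    chosen, prev = [], None
    for stmt in statements:
        def total(dist):
            move = redist_cost(prev, dist) if prev is not None else 0
            return comm_cost(stmt, dist) + move
        best = min(candidates, key=total)   # greedy: locally cheapest choice
        chosen.append(best)
        prev = best
    return chosen

# Toy cost table: statement s1 favors 'block', s2 favors 'cyclic'.
comm = {('s1', 'block'): 1, ('s1', 'cyclic'): 5,
        ('s2', 'block'): 6, ('s2', 'cyclic'): 2}
print(choose_distributions(
    ['s1', 's2'], ['block', 'cyclic'],
    lambda s, d: comm[(s, d)],
    lambda a, b: 0 if a == b else 3))  # → ['block', 'cyclic']
```

Note how the redistribution cost biases the choice toward keeping the previous distribution; here s2 still switches because its communication saving (4) outweighs the redistribution cost (3).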


Data Mining ◽  
2011 ◽  
pp. 106-141
Author(s):  
Massimo Coppola ◽  
Marco Vanneschi

We consider the application of parallel programming environments to develop portable and efficient high-performance data mining (DM) tools. We first assess the need for parallel and distributed DM applications by pointing out the scalability problems of some mining techniques and the need to mine large, possibly geographically distributed databases. We discuss the main issues of exploiting parallel and distributed computation for DM algorithms. A high-level programming language enhances the software engineering aspects of parallel DM, and it simplifies the problems of integration with existing sequential and parallel data management systems, thus leading to programming-efficient and high-performance implementations of applications. We describe a programming environment we have implemented that is based on the parallel skeleton model, and we examine the addition of object-like interfaces toward external libraries and system software layers. Abstractions of this kind will be included in the forthcoming programming environment ASSIST. In the main part of the chapter, as a proof of concept, we describe three well-known DM algorithms: Apriori, C4.5, and DBSCAN. For each problem, we explain the sequential algorithm and a structured parallel version, which is discussed and compared to parallel solutions found in the literature. We also discuss the potential gain in performance and expressiveness from the addition of external objects, on the basis of the experiments we have performed so far. We evaluate the approach with respect to performance results, design, and implementation considerations.
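Of the three algorithms named above, Apriori has the most obviously data-parallel core: counting candidate itemsets over a set of transactions. A sequential sketch of one level of Apriori (not the chapter's skeleton-based implementation; a parallel version would partition the transactions and sum the candidate counts globally):

```python
# One level of Apriori: generate k-itemset candidates from the
# (k-1)-frequent sets, count their support, and keep the frequent ones.

from itertools import combinations

def apriori_level(transactions, prev_frequent, k, min_support):
    items = sorted({i for s in prev_frequent for i in s})
    # Candidate pruning: every (k-1)-subset must already be frequent.
    candidates = [frozenset(c) for c in combinations(items, k)
                  if all(frozenset(s) in prev_frequent
                         for s in combinations(c, k - 1))]
    # Support counting: the step a parallel version distributes.
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {c for c, n in counts.items() if n >= min_support}

txns = [frozenset('abc'), frozenset('abd'), frozenset('ab'), frozenset('cd')]
f1 = {frozenset(i) for i in 'abcd'}     # all single items frequent in this toy set
print(apriori_level(txns, f1, 2, 2))    # only {a, b} appears in >= 2 transactions
```

The counting loop touches every transaction independently, which is exactly the shape a data-parallel skeleton (map over partitions, then a global reduction of the counts) captures.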


Symmetry ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 1939
Author(s):  
Jun Wei Chen ◽  
Xanno K. Sigalingging ◽  
Jenq-Shiou Leu ◽  
Jun-Ichi Takada

In recent years, Chinese has become one of the most popular languages globally, and the demand for automatic Chinese sentence correction has gradually increased. This research can be applied to Chinese language learning to reduce the cost of learning and the feedback time, and to help writers check for wrong words. The traditional way to do Chinese sentence correction is to check whether each word exists in a predefined dictionary; however, this kind of method cannot deal with semantic errors. As deep learning has become popular, an artificial neural network can be applied to understand a sentence's context and correct semantic errors. However, many issues remain: both the accuracy and the computation time required to correct a sentence are still lacking, so deep-learning-based Chinese sentence correction may not yet be ready for large-scale commercial applications. Our goal is to obtain a model with better accuracy and computation time. Combining a recurrent neural network with Bidirectional Encoder Representations from Transformers (BERT), a recently popular model known for its high performance but slow inference speed, we introduce a hybrid model that can be applied to Chinese sentence correction, improving both the accuracy and the inference speed. Among the results, BERT-GRU obtained the highest BLEU score in all experiments. The inference speed of the original transformer-based model can be improved by 1131% with beam search decoding in the 128-word experiment, and greedy decoding can also be improved by 452%. The longer the sequence, the larger the improvement.


2004 ◽  
Vol 449-452 ◽  
pp. 7-12
Author(s):  
James C. Williams

Product performance, including the cost of ownership, is becoming increasingly dependent on the availability of high-quality, high-performance, affordable materials of construction. Today, the requirements placed on a new material for a high-performance structural application extend well beyond the improvement of one or more material properties. This makes the introduction of a new material a multi-faceted activity. Modern structural materials derive their performance from a combination of composition and processing, the results of which are inextricably intertwined. This statement pertains both to metallic alloys and to fiber-reinforced composite materials. In addition, material cost and the reproducibility of material properties are becoming more central as acceptance criteria for incorporating new materials into new products. This paper uses examples of recent developments in materials for aircraft gas turbines to depict the materials introduction process. Some of these developments have been successful and others have not. These examples illustrate the changing picture that represents the successful introduction of a new structural material, even in a high-performance, high-value product such as a gas turbine. Specific examples include metal matrix composites, Ni-base alloys, and improved-reliability Ti alloys. The basis for successful introduction, or the lack thereof, will be discussed. While the examples are specific to gas turbines, they are generally instructive and depict the growing complexity of the process of developing and introducing new materials into a high-value product. An additional issue for all new materials introduction is the time required to achieve product readiness. As the time required for product design decreases, there has been little commensurate reduction in materials development cycle time. This matter will also be discussed, and some possible reasons and potential solutions will be described.


2021 ◽  
Author(s):  
Irina Terterian

The cost of a hardware failure in high-performance computing systems is usually extremely high because of the system stall, where billions of operations can be lost within one second. Thus, the implementation of self-restoration mechanisms is one of the most effective approaches to keeping system performance at the required level. The project presents a new approach that allows retaining the performance of a run-time reconfigurable stream processing system at its maximum level. This becomes possible through the development of a multi-level self-restoration mechanism that consists of: restoration by FPGA scrubbing, restoration by FPGA slot replacement, and restoration with optimal performance degradation. All of the above levels of the restoration procedure were developed and tested on a reconfigurable computing platform based on a Xilinx Virtex FPGA. Analysis of the results achieved by the developed mechanism shows very fast restoration of functionality and a dramatic increase in the lifetime of FPGA-based computing platforms.
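The three restoration levels listed in the abstract form an escalation ladder, from the cheapest recovery to graceful degradation. The control logic can be sketched as below; the trigger conditions and FPGA-facing calls are placeholders, since the project's actual interfaces are not described here.

```python
# Illustrative escalation across the three restoration levels; the
# callbacks stand in for real FPGA operations and are hypothetical.

def restore(fault, scrub, replace_slot, degrade):
    """Try the cheapest restoration level first, escalating on failure."""
    if scrub(fault):            # level 1: configuration-memory scrubbing
        return 'scrubbed'
    if replace_slot(fault):     # level 2: swap in a spare FPGA slot
        return 'slot-replaced'
    degrade(fault)              # level 3: keep running at reduced performance
    return 'degraded'

# A transient upset that scrubbing alone fixes:
print(restore('seu', scrub=lambda f: True,
              replace_slot=lambda f: False,
              degrade=lambda f: None))  # → scrubbed
```

The ordering matters: scrubbing restores full performance at near-zero cost, slot replacement consumes spare resources, and only when both fail does the system accept a performance loss, which matches the "optimal performance degradation" level named above.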



