Irregular Computations in Fortran – Expression and Implementation Strategies

1999 ◽  
Vol 7 (3-4) ◽  
pp. 313-326 ◽  
Author(s):  
Jan F. Prins ◽  
Siddhartha Chatterjee ◽  
Martin Simons

Modern dialects of Fortran enjoy wide use and good support on high-performance computers as performance-oriented programming languages. By providing the ability to express nested data parallelism, modern Fortran dialects enable irregular computations to be incorporated into existing applications with minimal rewriting and without sacrificing performance within the regular portions of the application. Since the performance of nested data-parallel computation is unpredictable and often poor using current compilers, we investigate threading and flattening, two source-to-source transformation techniques that can improve performance and performance stability. For experimental validation of these techniques, we explore nested data-parallel implementations of the sparse matrix-vector product and the Barnes-Hut n-body algorithm by hand-coding thread-based (using OpenMP directives) and flattening-based versions of these algorithms and evaluating their performance on an SGI Origin 2000 and an NEC SX-4, two shared-memory machines.
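The contrast between the nested and the flattened form of the sparse matrix-vector product can be sketched compactly. The following is an illustrative Python sketch (not the paper's Fortran; all names are our own): the nested version has an irregular inner loop per row, while the flattened version works on one flat vector of products with segment lengths, the shape a flattening transformation would target.

```python
# Nested data-parallel view of sparse matrix-vector product (y = A @ x):
# the outer parallel loop runs over rows, the inner one over each row's
# nonzeros -- inner iteration counts are irregular. "Flattening" turns the
# nested loops into flat segmented operations over all nonzeros at once.

def spmv_nested(rows, x):
    # rows: list of rows, each a list of (column, value) pairs
    return [sum(v * x[c] for c, v in row) for row in rows]

def spmv_flattened(rows, x):
    # Flatten to one vector of products plus segment lengths, then do a
    # segmented sum -- the form a flattening compiler would emit.
    seg_len = [len(row) for row in rows]
    prods = [v * x[c] for row in rows for c, v in row]
    y, i = [], 0
    for n in seg_len:
        y.append(sum(prods[i:i + n]))
        i += n
    return y

A = [[(0, 2.0), (2, 1.0)], [], [(1, 3.0)]]   # 3x3 sparse matrix, one empty row
x = [1.0, 2.0, 3.0]
assert spmv_nested(A, x) == spmv_flattened(A, x) == [5.0, 0.0, 6.0]
```

The flat form trades the irregular inner loop for uniform bulk operations, which is what makes its performance more stable across compilers and machines.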

2014 ◽  
Vol 596 ◽  
pp. 276-279
Author(s):  
Xiao Hui Pan

Graph component labeling, a special case of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations to the component labeling problem are possible, and we explore their use with general-purpose graphics processing units (GPGPUs) and the CUDA GPU programming language. We discuss implementation issues and performance results on CPUs and GPUs using CUDA, evaluate our system on real-world graphs, and show how to account for the different architectural features of the GPU and the host CPUs to achieve high performance.
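One common data-parallel formulation of component labeling is iterative label propagation, where every vertex repeatedly adopts the minimum label in its neighbourhood; each sweep is an independent per-vertex update, which is what maps naturally onto GPU threads. A plain-Python sketch of that scheme (our own illustration, not the paper's CUDA code):

```python
# Data-parallel connected-component labeling by label propagation:
# each vertex starts with its own id as label and repeatedly takes the
# minimum label among itself and its neighbours until a fixed point.
# In a CUDA version, one thread would own one vertex per sweep.

def label_components(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = list(range(n))
    changed = True
    while changed:
        # one parallel sweep: new labels depend only on the old ones
        new = [min([label[u]] + [label[w] for w in adj[u]]) for u in range(n)]
        changed = new != label
        label = new
    return label

# two components: {0, 1, 2} and {3, 4}
assert label_components(5, [(0, 1), (1, 2), (3, 4)]) == [0, 0, 0, 3, 3]
```

The number of sweeps is bounded by the graph diameter, which is why the architectural tuning the abstract mentions (memory layout, work per thread) matters for real-world graphs.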


1994 ◽  
Vol 03 (01) ◽  
pp. 97-125 ◽  
Author(s):  
ARVIND K. BANSAL

Associative computation is characterized by the intertwining of search by content and data-parallel computation. An algebra for associative computation is described, and a compilation-based model and a novel abstract machine for associative logic programming are presented. The model loosely couples the left-hand side of the program, treated as data, with the right-hand side, treated as low-level code. This representation achieves efficiency through associative computation and data alignment during goal reduction and during the execution of low-level abstract instructions; data alignment reduces the overhead of data movement. Novel schemes are presented for the associative manipulation of aliased uninstantiated variables and for data-parallel goal reduction in the presence of multiple occurrences of the same variable in a goal. The architecture, behavior, and performance evaluation of the model are presented.
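The core idea, matching a goal against the whole clause store by content rather than scanning it sequentially, can be illustrated with a toy sketch. The following Python is our own simplification (positional matching with `'_'` as a wildcard), not the paper's abstract machine or its unification scheme:

```python
# Associative computation couples search by content with data-parallel
# work: a goal is compared against *all* clause heads at once. Here the
# "parallel compare" is a comprehension over the clause store.

def match(head, goal):
    # positional match; '_' in a head acts as an unbound variable
    return len(head) == len(goal) and all(
        h == '_' or h == g for h, g in zip(head, goal))

def associative_lookup(clauses, goal):
    # conceptually one simultaneous compare across the whole store
    return [body for head, body in clauses if match(head, goal)]

clauses = [
    (('parent', 'tom', 'bob'), 'fact 1'),
    (('parent', 'tom', '_'), 'rule for any child of tom'),
    (('parent', 'ann', 'joe'), 'fact 2'),
]
assert associative_lookup(clauses, ('parent', 'tom', 'bob')) == \
    ['fact 1', 'rule for any child of tom']
```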


1993 ◽  
Vol 04 (01) ◽  
pp. 65-83 ◽  
Author(s):  
SERGE PETITON ◽  
YOUCEF SAAD ◽  
KESHENG WU ◽  
WILLIAM FERNG

This paper presents a preliminary experimental study of the performance of basic sparse matrix computations on the CM-5. We concentrate on examining various ways of performing general sparse matrix-vector operations and the basic primitives on which these are based. We compare various data structures for storing sparse matrices and their corresponding matrix-vector operations. Both SPMD and data-parallel modes are examined and compared.
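One of the standard storage schemes such comparisons include is compressed sparse row (CSR): three flat arrays holding row pointers, column indices, and nonzero values. A minimal Python version of CSR and its matrix-vector product, as an illustration of the data-structure trade-off (not the paper's CM-5 code):

```python
# CSR stores a sparse matrix as (ptr, idx, val): row i's nonzeros live at
# positions ptr[i]..ptr[i+1]-1 of idx (columns) and val (values).

def dense_to_csr(A):
    ptr, idx, val = [0], [], []
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                idx.append(j)
                val.append(a)
        ptr.append(len(idx))
    return ptr, idx, val

def csr_matvec(ptr, idx, val, x):
    # y[i] = sum over row i's nonzeros of val[k] * x[idx[k]]
    return [sum(val[k] * x[idx[k]] for k in range(ptr[i], ptr[i + 1]))
            for i in range(len(ptr) - 1)]

A = [[4, 0, 1],
     [0, 0, 2],
     [0, 3, 0]]
ptr, idx, val = dense_to_csr(A)
assert (ptr, idx, val) == ([0, 2, 3, 4], [0, 2, 2, 1], [4, 1, 2, 3])
assert csr_matvec(ptr, idx, val, [1, 1, 1]) == [5, 2, 3]
```

Alternative layouts (e.g. jagged-diagonal or ELLPACK-style formats) trade storage overhead for more regular memory access, which is exactly the kind of difference such machine studies measure.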


2013 ◽  
Vol 10 (17) ◽  
pp. 20130529-20130529 ◽  
Author(s):  
Dan Zou ◽  
Yong Dou ◽  
Song Guo ◽  
Shice Ni

Author(s):  
Hartwig Anzt ◽  
Moritz Kreutzer ◽  
Eduardo Ponce ◽  
Gregory D Peterson ◽  
Gerhard Wellein ◽  
...  

In this paper, we present an optimized GPU implementation of the induced dimension reduction (IDR) algorithm. We improve data locality, combine it with an efficient sparse matrix-vector kernel, and investigate the potential of overlapping computation with communication as well as the possibility of concurrent kernel execution. A comprehensive performance evaluation is conducted using a suitable performance model. The analysis reveals an efficiency of up to 90%, which indicates that the implementation achieves performance close to the theoretically attainable bound.
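Efficiency figures of this kind typically come from a bandwidth-bound performance model: a memory-bound kernel cannot run faster than (bytes moved) / (peak memory bandwidth), and efficiency is the ratio of that bound to the measured time. A small worked example with hypothetical numbers (the specific figures below are ours, not from the paper):

```python
# Bandwidth-bound performance model for a memory-bound kernel.

def attainable_time(bytes_moved, peak_bw):
    # lower bound on runtime, in seconds, if bandwidth is the limiter
    return bytes_moved / peak_bw

def efficiency(measured_time, bytes_moved, peak_bw):
    # 1.0 means the kernel runs at the theoretical bandwidth bound
    return attainable_time(bytes_moved, peak_bw) / measured_time

# e.g. a kernel streaming 8 GB on a GPU with 200 GB/s peak bandwidth
bound = attainable_time(8e9, 200e9)            # 0.04 s at best
assert abs(bound - 0.04) < 1e-15
# measured 0.044 s -> about 91% of the attainable performance
assert abs(efficiency(0.044, 8e9, 200e9) - 0.04 / 0.044) < 1e-15
```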


Author(s):  
Emil M. Oanta ◽  
Bogdan Nicolescu

Abstract In this paper we present a general solution for handling large matrices. The solution is general because of the wide use of the matrix-based approach in problems concerning numerical methods, experimental mechanics, computational mechanics, CFD, computer-aided design, economic problems, etc. Some of its major advantages are: 1. no high-performance computer is required, the only constraint being the size of the hard disk (at present growing and becoming cheaper); 2. the Windows operating system may be used but is not strictly necessary; 3. it represents an interface between programming languages; 4. it can easily be used to develop multi-language software applications; 5. it is applicable in all domains which use, at the logical level, ‘matrices’ (mathematics, engineering, economics); 6. there are no constraints regarding the use of ‘classic’ solution techniques; 7. it is easy to implement in software applications already written; 8. the data type used as the interface may easily be modified to adapt it optimally to the application being developed. Using this solution we solved a series of computer problems and ‘dedicated’ applications in areas such as mathematics, experimental mechanics, computational mechanics, and physics.
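The underlying idea, keeping the matrix on disk in a fixed binary layout that any language can read, and touching only the elements currently needed, can be sketched with Python's standard library alone. The row-major little-endian double layout below is our assumption for illustration, not a format specified in the paper:

```python
# Disk-backed dense matrix: elements live in a flat binary file and are
# read/written individually via seek, so RAM use stays tiny regardless
# of matrix size. Any language that can seek and decode doubles can
# share the file, which is what makes it a cross-language interface.
import os
import struct
import tempfile

ELEM = struct.Struct('<d')  # one little-endian 64-bit float per element

class DiskMatrix:
    def __init__(self, path, rows, cols):
        self.path, self.rows, self.cols = path, rows, cols
        with open(path, 'wb') as f:                # preallocate with zeros
            f.write(b'\x00' * (rows * cols * ELEM.size))

    def _offset(self, i, j):
        return (i * self.cols + j) * ELEM.size     # row-major addressing

    def set(self, i, j, x):
        with open(self.path, 'r+b') as f:
            f.seek(self._offset(i, j))
            f.write(ELEM.pack(x))

    def get(self, i, j):
        with open(self.path, 'rb') as f:
            f.seek(self._offset(i, j))
            return ELEM.unpack(f.read(ELEM.size))[0]

path = os.path.join(tempfile.mkdtemp(), 'm.bin')
m = DiskMatrix(path, 1000, 1000)   # 8 MB on disk, near-zero RAM footprint
m.set(999, 999, 3.5)
assert m.get(999, 999) == 3.5 and m.get(0, 0) == 0.0
```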


Processes ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 690
Author(s):  
Miguel Ángel Martínez-del-Amor ◽  
David Orellana-Martín ◽  
Ignacio Pérez-Hurtado ◽  
Francis George C. Cabarle ◽  
Henry N. Adorna

To date, parallel simulation algorithms for spiking neural P (SNP) systems are based on a matrix representation. This way, the simulation is implemented with linear algebra operations, which can be easily parallelized on high performance computing platforms such as GPUs. Although this has been convenient for the first generation of GPU-based simulators, such as CuSNP, there are some bottlenecks to sort out. For example, the proposed matrix representations of SNP systems lead to very sparse matrices, where the majority of values are zero. It is known that sparse matrices can compromise the performance of algorithms since they involve a waste of memory and time. This problem has been extensively studied in the literature of parallel computing. In this paper, we analyze some of these ideas and apply them to represent some variants of SNP systems. We also provide a new simulation algorithm based on a novel compressed representation for sparse matrices, and we conclude which SNP system variant best suits this compressed matrix representation.
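The memory-waste problem and the generic remedy can be shown in miniature: storing only the nonzeros of a transition matrix and applying them to a spike-configuration vector. The toy matrix and update rule below are our own illustration of compressed storage in general, not the paper's compressed representation:

```python
# A dense SNP transition matrix is mostly zeros; storing each row as
# (column, value) pairs keeps only the nonzeros.

def compress(dense):
    # dense row -> list of (column, value) pairs for nonzero entries
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]

def apply_compressed(sparse_rows, state):
    # accumulate every stored rule effect into the configuration vector
    out = list(state)
    for row in sparse_rows:
        for j, v in row:
            out[j] += v
    return out

M = [[-2, 0, 1, 0, 0],    # rule 1: consumes 2 spikes, sends 1 to neuron 2
     [0, -1, 0, 0, 1]]    # rule 2: consumes 1 spike, sends 1 to neuron 4
sparse = compress(M)
assert sparse == [[(0, -2), (2, 1)], [(1, -1), (4, 1)]]
assert apply_compressed(sparse, [3, 2, 0, 0, 0]) == [1, 1, 1, 0, 1]
```

For a matrix that is, say, 99% zeros, this kind of representation cuts both storage and the number of multiply-accumulate operations by roughly two orders of magnitude, which is the saving the abstract alludes to.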

