Irregular Computations in Fortran – Expression and Implementation Strategies

1999 ◽  
Vol 7 (3-4) ◽  
pp. 313-326 ◽  
Author(s):  
Jan F. Prins ◽  
Siddhartha Chatterjee ◽  
Martin Simons

Modern dialects of Fortran enjoy wide use and good support on high-performance computers as performance-oriented programming languages. By providing the ability to express nested data parallelism, modern Fortran dialects enable irregular computations to be incorporated into existing applications with minimal rewriting and without sacrificing performance within the regular portions of the application. Since the performance of nested data-parallel computation is unpredictable and often poor using current compilers, we investigate threading and flattening, two source-to-source transformation techniques that can improve performance and performance stability. For experimental validation of these techniques, we explore nested data-parallel implementations of the sparse matrix-vector product and the Barnes-Hut n-body algorithm by hand-coding thread-based (using OpenMP directives) and flattening-based versions of these algorithms and evaluating their performance on an SGI Origin 2000 and an NEC SX-4, two shared-memory machines.
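The contrast between the nested and the flattened form of the sparse matrix-vector product can be sketched compactly. The following is an illustrative Python sketch (not the paper's Fortran; all names are our own): the nested version has an irregular inner loop per row, while the flattened version works on one flat vector of products with segment lengths, the shape a flattening transformation would target.

```python
# Nested data-parallel view of sparse matrix-vector product (y = A @ x):
# the outer parallel loop runs over rows, the inner one over each row's
# nonzeros -- inner iteration counts are irregular. "Flattening" turns the
# nested loops into flat segmented operations over all nonzeros at once.

def spmv_nested(rows, x):
    # rows: list of rows, each a list of (column, value) pairs
    return [sum(v * x[c] for c, v in row) for row in rows]

def spmv_flattened(rows, x):
    # Flatten to one vector of products plus segment lengths, then do a
    # segmented sum -- the form a flattening compiler would emit.
    seg_len = [len(row) for row in rows]
    prods = [v * x[c] for row in rows for c, v in row]
    y, i = [], 0
    for n in seg_len:
        y.append(sum(prods[i:i + n]))
        i += n
    return y

A = [[(0, 2.0), (2, 1.0)], [], [(1, 3.0)]]   # 3x3 sparse matrix, one empty row
x = [1.0, 2.0, 3.0]
assert spmv_nested(A, x) == spmv_flattened(A, x) == [5.0, 0.0, 6.0]
```

The flat form trades the irregular inner loop for uniform bulk operations, which is what makes its performance more stable across compilers and machines.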

2014 ◽  
Vol 596 ◽  
pp. 276-279
Author(s):  
Xiao Hui Pan

Graph component labeling, a special case of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations to the component labeling problem are possible, and we explore their use with general-purpose graphics processing units (GPGPUs) and the CUDA GPU programming language. We discuss implementation issues and performance results on CPUs and GPUs using CUDA, evaluate our system on real-world graphs, and show how to account for the different architectural features of the GPU and the host CPUs to achieve high performance.
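One common data-parallel formulation of component labeling is iterative label propagation, where every vertex repeatedly adopts the minimum label in its neighbourhood; each sweep is an independent per-vertex update, which is what maps naturally onto GPU threads. A plain-Python sketch of that scheme (our own illustration, not the paper's CUDA code):

```python
# Data-parallel connected-component labeling by label propagation:
# each vertex starts with its own id as label and repeatedly takes the
# minimum label among itself and its neighbours until a fixed point.
# In a CUDA version, one thread would own one vertex per sweep.

def label_components(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = list(range(n))
    changed = True
    while changed:
        # one parallel sweep: new labels depend only on the old ones
        new = [min([label[u]] + [label[w] for w in adj[u]]) for u in range(n)]
        changed = new != label
        label = new
    return label

# two components: {0, 1, 2} and {3, 4}
assert label_components(5, [(0, 1), (1, 2), (3, 4)]) == [0, 0, 0, 3, 3]
```

The number of sweeps is bounded by the graph diameter, which is why the architectural tuning the abstract mentions (memory layout, work per thread) matters for real-world graphs.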


1994 ◽  
Vol 03 (01) ◽  
pp. 97-125 ◽  
Author(s):  
ARVIND K. BANSAL

Associative computation is characterized by the intertwining of search by content and data-parallel computation. An algebra for associative computation is described, and a compilation-based model and a novel abstract machine for associative logic programming are presented. The model loosely couples the left-hand side of the program, treated as data, with the right-hand side, treated as low-level code. This representation achieves efficiency through associative computation and data alignment during goal reduction and during the execution of low-level abstract instructions; data alignment reduces the overhead of data movement. Novel schemes are presented for the associative manipulation of aliased uninstantiated variables and for data-parallel goal reduction in the presence of multiple occurrences of the same variable in a goal. The architecture, behavior, and performance evaluation of the model are presented.
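The core idea, matching a goal against the whole clause store by content rather than scanning it sequentially, can be illustrated with a toy sketch. The following Python is our own simplification (positional matching with `'_'` as a wildcard), not the paper's abstract machine or its unification scheme:

```python
# Associative computation couples search by content with data-parallel
# work: a goal is compared against *all* clause heads at once. Here the
# "parallel compare" is a comprehension over the clause store.

def match(head, goal):
    # positional match; '_' in a head acts as an unbound variable
    return len(head) == len(goal) and all(
        h == '_' or h == g for h, g in zip(head, goal))

def associative_lookup(clauses, goal):
    # conceptually one simultaneous compare across the whole store
    return [body for head, body in clauses if match(head, goal)]

clauses = [
    (('parent', 'tom', 'bob'), 'fact 1'),
    (('parent', 'tom', '_'), 'rule for any child of tom'),
    (('parent', 'ann', 'joe'), 'fact 2'),
]
assert associative_lookup(clauses, ('parent', 'tom', 'bob')) == \
    ['fact 1', 'rule for any child of tom']
```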


1993 ◽  
Vol 04 (01) ◽  
pp. 65-83 ◽  
Author(s):  
SERGE PETITON ◽  
YOUCEF SAAD ◽  
KESHENG WU ◽  
WILLIAM FERNG

This paper presents a preliminary experimental study of the performance of basic sparse matrix computations on the CM-5. We concentrate on examining various ways of performing general sparse matrix-vector operations and the basic primitives on which these are based. We compare various data structures for storing sparse matrices and their corresponding matrix-vector operations. Both SPMD and data-parallel modes are examined and compared.
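One of the standard storage schemes such comparisons include is compressed sparse row (CSR): three flat arrays holding row pointers, column indices, and nonzero values. A minimal Python version of CSR and its matrix-vector product, as an illustration of the data-structure trade-off (not the paper's CM-5 code):

```python
# CSR stores a sparse matrix as (ptr, idx, val): row i's nonzeros live at
# positions ptr[i]..ptr[i+1]-1 of idx (columns) and val (values).

def dense_to_csr(A):
    ptr, idx, val = [0], [], []
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                idx.append(j)
                val.append(a)
        ptr.append(len(idx))
    return ptr, idx, val

def csr_matvec(ptr, idx, val, x):
    # y[i] = sum over row i's nonzeros of val[k] * x[idx[k]]
    return [sum(val[k] * x[idx[k]] for k in range(ptr[i], ptr[i + 1]))
            for i in range(len(ptr) - 1)]

A = [[4, 0, 1],
     [0, 0, 2],
     [0, 3, 0]]
ptr, idx, val = dense_to_csr(A)
assert (ptr, idx, val) == ([0, 2, 3, 4], [0, 2, 2, 1], [4, 1, 2, 3])
assert csr_matvec(ptr, idx, val, [1, 1, 1]) == [5, 2, 3]
```

Alternative layouts (e.g. jagged-diagonal or ELLPACK-style formats) trade storage overhead for more regular memory access, which is exactly the kind of difference such machine studies measure.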


2013 ◽  
Vol 10 (17) ◽  
pp. 20130529-20130529 ◽  
Author(s):  
Dan Zou ◽  
Yong Dou ◽  
Song Guo ◽  
Shice Ni

Author(s):  
Hartwig Anzt ◽  
Moritz Kreutzer ◽  
Eduardo Ponce ◽  
Gregory D Peterson ◽  
Gerhard Wellein ◽  
...  

In this paper, we present an optimized GPU implementation of the induced dimension reduction (IDR) algorithm. We improve data locality, combine it with an efficient sparse matrix-vector kernel, and investigate the potential of overlapping computation with communication as well as the possibility of concurrent kernel execution. A comprehensive performance evaluation is conducted using a suitable performance model. The analysis reveals an efficiency of up to 90%, which indicates that the implementation achieves performance close to the theoretically attainable bound.
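Efficiency figures of this kind typically come from a bandwidth-bound performance model: a memory-bound kernel cannot run faster than (bytes moved) / (peak memory bandwidth), and efficiency is the ratio of that bound to the measured time. A small worked example with hypothetical numbers (the specific figures below are ours, not from the paper):

```python
# Bandwidth-bound performance model for a memory-bound kernel.

def attainable_time(bytes_moved, peak_bw):
    # lower bound on runtime, in seconds, if bandwidth is the limiter
    return bytes_moved / peak_bw

def efficiency(measured_time, bytes_moved, peak_bw):
    # 1.0 means the kernel runs at the theoretical bandwidth bound
    return attainable_time(bytes_moved, peak_bw) / measured_time

# e.g. a kernel streaming 8 GB on a GPU with 200 GB/s peak bandwidth
bound = attainable_time(8e9, 200e9)            # 0.04 s at best
assert abs(bound - 0.04) < 1e-15
# measured 0.044 s -> about 91% of the attainable performance
assert abs(efficiency(0.044, 8e9, 200e9) - 0.04 / 0.044) < 1e-15
```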


Author(s):  
Emil M. Oanta ◽  
Bogdan Nicolescu

Abstract In this paper we present a general solution for handling large matrices. The solution is general because of the wide use of the matrix-based approach in problems concerning numerical methods, experimental mechanics, computational mechanics, CFD, computer-aided design, economic problems, etc. Some of its major advantages are: 1. no high-performance computer is required, the only constraint being the size of the hard disk (at present growing and becoming cheaper); 2. the Windows operating system may be used but is not strictly necessary; 3. it represents an interface between programming languages; 4. it can easily be used to develop multi-language software applications; 5. it is applicable in all domains which use, at the logical level, ‘matrices’ (mathematics, engineering, economics); 6. there are no constraints regarding the use of ‘classic’ solution techniques; 7. it is easy to implement in software applications already written; 8. the data type used as the interface may easily be modified to adapt it optimally to the application being developed. Using this solution we solved a series of computer problems and ‘dedicated’ applications in areas such as mathematics, experimental mechanics, computational mechanics, and physics.
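The underlying idea, keeping the matrix on disk in a fixed binary layout that any language can read, and touching only the elements currently needed, can be sketched with Python's standard library alone. The row-major little-endian double layout below is our assumption for illustration, not a format specified in the paper:

```python
# Disk-backed dense matrix: elements live in a flat binary file and are
# read/written individually via seek, so RAM use stays tiny regardless
# of matrix size. Any language that can seek and decode doubles can
# share the file, which is what makes it a cross-language interface.
import os
import struct
import tempfile

ELEM = struct.Struct('<d')  # one little-endian 64-bit float per element

class DiskMatrix:
    def __init__(self, path, rows, cols):
        self.path, self.rows, self.cols = path, rows, cols
        with open(path, 'wb') as f:                # preallocate with zeros
            f.write(b'\x00' * (rows * cols * ELEM.size))

    def _offset(self, i, j):
        return (i * self.cols + j) * ELEM.size     # row-major addressing

    def set(self, i, j, x):
        with open(self.path, 'r+b') as f:
            f.seek(self._offset(i, j))
            f.write(ELEM.pack(x))

    def get(self, i, j):
        with open(self.path, 'rb') as f:
            f.seek(self._offset(i, j))
            return ELEM.unpack(f.read(ELEM.size))[0]

path = os.path.join(tempfile.mkdtemp(), 'm.bin')
m = DiskMatrix(path, 1000, 1000)   # 8 MB on disk, near-zero RAM footprint
m.set(999, 999, 3.5)
assert m.get(999, 999) == 3.5 and m.get(0, 0) == 0.0
```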


Processes ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 690
Author(s):  
Miguel Ángel Martínez-del-Amor ◽  
David Orellana-Martín ◽  
Ignacio Pérez-Hurtado ◽  
Francis George C. Cabarle ◽  
Henry N. Adorna

To date, parallel simulation algorithms for spiking neural P (SNP) systems are based on a matrix representation. This way, the simulation is implemented with linear algebra operations, which can be easily parallelized on high performance computing platforms such as GPUs. Although this has been convenient for the first generation of GPU-based simulators, such as CuSNP, there are some bottlenecks to sort out. For example, the proposed matrix representations of SNP systems lead to very sparse matrices, where the majority of values are zero. It is known that sparse matrices can compromise the performance of algorithms since they involve a waste of memory and time. This problem has been extensively studied in the literature of parallel computing. In this paper, we analyze some of these ideas and apply them to represent some variants of SNP systems. We also provide a new simulation algorithm based on a novel compressed representation for sparse matrices, and we conclude which SNP system variant best suits this compressed matrix representation.
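The memory-waste problem and the generic remedy can be shown in miniature: storing only the nonzeros of a transition matrix and applying them to a spike-configuration vector. The toy matrix and update rule below are our own illustration of compressed storage in general, not the paper's compressed representation:

```python
# A dense SNP transition matrix is mostly zeros; storing each row as
# (column, value) pairs keeps only the nonzeros.

def compress(dense):
    # dense row -> list of (column, value) pairs for nonzero entries
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]

def apply_compressed(sparse_rows, state):
    # accumulate every stored rule effect into the configuration vector
    out = list(state)
    for row in sparse_rows:
        for j, v in row:
            out[j] += v
    return out

M = [[-2, 0, 1, 0, 0],    # rule 1: consumes 2 spikes, sends 1 to neuron 2
     [0, -1, 0, 0, 1]]    # rule 2: consumes 1 spike, sends 1 to neuron 4
sparse = compress(M)
assert sparse == [[(0, -2), (2, 1)], [(1, -1), (4, 1)]]
assert apply_compressed(sparse, [3, 2, 0, 0, 0]) == [1, 1, 1, 0, 1]
```

For a matrix that is, say, 99% zeros, this kind of representation cuts both storage and the number of multiply-accumulate operations by roughly two orders of magnitude, which is the saving the abstract alludes to.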

