A Strategy for Automatic Performance Tuning of Stencil Computations on GPUs

Scientific Programming ◽

10.1155/2018/6093054 ◽

2018 ◽

Vol 2018 ◽

pp. 1-24 ◽

Cited By ~ 2

Author(s):

Joseph D. Garvey ◽

Tarek S. Abdelrahman

Keyword(s):

Machine Learning ◽

Random Sampling ◽

Graphics Processing Units ◽

Performance Tuning ◽

Stencil Computations ◽

High Performing ◽

Expert Search ◽

Machine Learning Model ◽

Novel Strategy ◽

Graphics Processing

We propose and evaluate a novel strategy for tuning the performance of a class of stencil computations on Graphics Processing Units. The strategy uses a machine learning model to predict the optimal way to load data from memory followed by a heuristic that divides other optimizations into groups and exhaustively explores one group at a time. We use a set of 104 synthetic OpenCL stencil benchmarks that are representative of many real stencil computations. We first demonstrate the need for auto-tuning by showing that the optimization space is sufficiently complex that simple approaches to determining a high-performing configuration fail. We then demonstrate the effectiveness of our approach on NVIDIA and AMD GPUs. Relative to a random sampling of the space, we find configurations that are 12%/32% faster on the NVIDIA/AMD platform in 71% and 4% less time, respectively. Relative to an expert search, we achieve 5% and 9% better performance on the two platforms in 89% and 76% less time. We also evaluate our strategy for different stencil computational intensities, varying array sizes and shapes, and in combination with expert search.
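
The group-at-a-time heuristic mentioned in the abstract can be illustrated with a minimal sketch. It covers only the second stage (the abstract does not describe the machine learning model or the actual optimization groups), and the parameter names, group boundaries, and `toy_cost` function below are hypothetical stand-ins for a real kernel timing run.

```python
from itertools import product

def groupwise_search(groups, defaults, cost):
    """Tune one group of parameters at a time, keeping the rest fixed."""
    best = dict(defaults)
    best_cost = cost(best)
    evaluations = 1
    for group in groups:
        names = list(group)
        # Exhaustively try every combination of values inside this group only.
        for values in product(*(group[n] for n in names)):
            candidate = {**best, **dict(zip(names, values))}
            c = cost(candidate)
            evaluations += 1
            if c < best_cost:
                best, best_cost = candidate, c
    return best, best_cost, evaluations

# Hypothetical optimization parameters, split into two groups.
groups = [
    {"block_x": [8, 16, 32], "block_y": [1, 2, 4]},
    {"unroll": [1, 2, 4], "coarsen": [1, 2]},
]
defaults = {"block_x": 8, "block_y": 1, "unroll": 1, "coarsen": 1}

def toy_cost(cfg):
    # Stand-in for a measured kernel run time (lower is better).
    return (abs(cfg["block_x"] - 16) + abs(cfg["block_y"] - 4)
            + abs(cfg["unroll"] - 2) + abs(cfg["coarsen"] - 2))

best, best_cost, evaluations = groupwise_search(groups, defaults, toy_cost)
```

In this toy setting the search evaluates 16 configurations (1 default, 9 in the first group, 6 in the second) instead of the 54 in the full cross product. The saving comes at a price: interactions between parameters in different groups are not explored.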

Active Learning Approaches for Labeling Text: Review and Assessment of the Performance of Active Learning Approaches

Political Analysis ◽

10.1017/pan.2020.4 ◽

2020 ◽

Vol 28 (4) ◽

pp. 532-551

Author(s):

Blake Miller ◽

Fridolin Linder ◽

Walter R. Mebane

Keyword(s):

Machine Learning ◽

Active Learning ◽

Random Sampling ◽

Supervised Machine Learning ◽

Learning Approaches ◽

Simulation Studies ◽

Text Data ◽

Passive Learning ◽

Machine Learning Model ◽

The Cost

Supervised machine learning methods are increasingly employed in political science. Such models require costly manual labeling of documents. In this paper, we introduce active learning, a framework in which data to be labeled by human coders are not chosen at random but rather targeted in such a way that the required amount of data to train a machine learning model can be minimized. We study the benefits of active learning using text data examples. We perform simulation studies that illustrate conditions where active learning can reduce the cost of labeling text data. We perform these simulations on three corpora that vary in size, document length, and domain. We find that in cases where the document class of interest is not balanced, researchers can label a fraction of the documents one would need using random sampling (or “passive” learning) to achieve equally performing classifiers. We further investigate how varying levels of intercoder reliability affect the active learning procedures and find that even with low reliability, active learning performs more efficiently than does random sampling.
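
As an illustration of how documents can be targeted rather than drawn at random, the sketch below uses uncertainty sampling, one common active learning query strategy. The abstract does not say which strategy the authors use, so this is an assumption, and the function name and probabilities are hypothetical.

```python
def select_most_uncertain(probabilities, k):
    """Return indices of the k unlabeled documents whose predicted
    positive-class probability is closest to 0.5."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]

# Hypothetical classifier outputs for five unlabeled documents.
probs = [0.97, 0.52, 0.10, 0.45, 0.80]
picked = select_most_uncertain(probs, 2)
```

Here `picked` contains the documents scored 0.52 and 0.45, which would be sent to human coders next. A passive (random) sampler would be just as likely to pick the document scored 0.97, whose label the classifier can already predict with confidence.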

Accelerated FDPS: Algorithms to use accelerators with FDPS

Publications of the Astronomical Society of Japan ◽

10.1093/pasj/psz133 ◽

2020 ◽

Vol 72 (1) ◽

Cited By ~ 2

Author(s):

Masaki Iwasawa ◽

Daisuke Namekata ◽

Keigo Nitadori ◽

Kentaro Nomura ◽

Long Wang ◽

...

Keyword(s):

Graphics Processing Units ◽

High Performance ◽

General Purpose ◽

Performance Model ◽

Performance Tuning ◽

Data Types ◽

Interaction Function ◽

Current Implementation ◽

And Performance ◽

Graphics Processing

We describe algorithms implemented in FDPS (Framework for Developing Particle Simulators) to make efficient use of accelerator hardware such as GPGPUs (general-purpose computing on graphics processing units). We have developed FDPS to make it possible for researchers to develop their own high-performance parallel particle-based simulation programs without spending large amounts of time on parallelization and performance tuning. FDPS provides a high-performance implementation of parallel algorithms for particle-based simulations in a “generic” form, so that researchers can define their own particle data structure and interparticle interaction functions. FDPS compiled with user-supplied data types and interaction functions provides all the necessary functions for parallelization, and researchers can thus write their programs as though they are writing simple non-parallel code. It has previously been possible to use accelerators with FDPS by writing an interaction function that uses the accelerator. However, the efficiency was limited by the latency and bandwidth of communication between the CPU and the accelerator, and also by the mismatch between the available degree of parallelism of the interaction function and that of the hardware parallelism. We have modified the interface of the user-provided interaction functions so that accelerators are more efficiently used. We also implemented new techniques which reduce the amount of work on the CPU side and the amount of communication between CPU and accelerators. We have measured the performance of N-body simulations on a system with an NVIDIA Volta GPGPU using FDPS and the achieved performance is around 27% of the theoretical peak limit. We have constructed a detailed performance model, and found that the current implementation can achieve good performance on systems with much smaller memory and communication bandwidth. Thus, our implementation will be applicable to future generations of accelerator systems.

The impact of data-complexity and team characteristics on performance in the classification model

International Journal of Business Analytics ◽

10.4018/ijban.288517 ◽

2022 ◽

Vol 9 (1)

Keyword(s):

Machine Learning ◽

Binary Classification ◽

Class Imbalance ◽

Predictive Ability ◽

Predictive Performance ◽

Classification Model ◽

Data Complexity ◽

High Performing ◽

Machine Learning Model ◽

The Impact

This article investigates the impact of data-complexity and team-specific characteristics on machine learning competition scores. Data from five real-world binary classification competitions hosted on Kaggle.com were analyzed. The data-complexity characteristics were measured in four aspects: standard measures, sparsity measures, class imbalance measures, and feature-based measures. The results showed that the higher the level of the data-complexity characteristics, the lower the predictive ability of the machine learning model. Our empirical evidence revealed that the imbalance ratio of the target variable was the most important factor and exhibited a nonlinear relationship with the model’s predictive abilities. The imbalance ratio adversely affected the predictive performance once it reached a certain level. However, mixed results were found for the impact on team performance of team-specific characteristics, measured by team size, team expertise, and the number of submissions. For high-performing teams, these factors had no impact on team score.

CaKernel – A Parallel Application Programming Framework for Heterogenous Computing Architectures

Scientific Programming ◽

10.1155/2011/457030 ◽

2011 ◽

Vol 19 (4) ◽

pp. 185-197 ◽

Cited By ~ 7

Author(s):

Marek Blazewicz ◽

Steven R. Brandt ◽

Michal Kierzynka ◽

Krzysztof Kurowski ◽

Bogdan Ludwiczak ◽

...

Keyword(s):

Graphics Processing Units ◽

High Performance ◽

Heterogeneous Computing ◽

Test Case ◽

Stencil Computations ◽

Programming Framework ◽

Problem Solving Environments ◽

Scientific Simulations ◽

Application Programming ◽

Graphics Processing

Despite the recent advent of new heterogeneous computing architectures, there is still a lack of parallel problem solving environments that can help scientists use hybrid supercomputers easily and efficiently. Many scientific simulations that use structured grids to solve partial differential equations in fact rely on stencil computations. Stencil computations have become crucial in solving many challenging problems in various domains, e.g., engineering or physics. Although many parallel stencil computing approaches have been proposed, in most cases they solve only particular problems. As a result, scientists struggle when implementing a new stencil-based simulation, especially on high performance hybrid supercomputers. In response to this need, we extend our previous work on a parallel programming framework for CUDA, CaCUDA, which now supports OpenCL. We present CaKernel, a tool that simplifies the development of parallel scientific applications on hybrid systems. CaKernel is built on the highly scalable and portable Cactus framework. In the CaKernel framework, Cactus manages the inter-process communication via MPI while CaKernel manages the code running on Graphics Processing Units (GPUs) and interactions between them. As a non-trivial test case we have developed a 3D CFD code to demonstrate the performance and scalability of the automatically generated code.
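
For readers unfamiliar with the term, a stencil computation updates each grid point from a fixed pattern of its neighbors. The following is a minimal serial sketch of a 3-point averaging stencil on a 1D grid. It is not CaKernel code, and the function name and grid values are hypothetical.

```python
def jacobi_step(u):
    """Apply one sweep of a 3-point averaging stencil to a 1D grid,
    leaving the two boundary values unchanged."""
    new = list(u)
    for i in range(1, len(u) - 1):
        # Each interior point reads only its left and right neighbors
        # from the previous grid, so all updates are independent.
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new

grid = [0.0, 0.0, 0.0, 0.0, 1.0]
after_one = jacobi_step(grid)
after_two = jacobi_step(after_one)
```

Because every interior update depends only on the previous grid, the loop iterations can run in parallel, which is what makes stencils a natural fit for GPUs.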

MLAir (v1.0) – a tool to enable fast and flexible machine learning on air data time series

10.5194/gmd-2020-332 ◽

2020 ◽

Author(s):

Lukas H. Leufen ◽

Felix Kleinert ◽

Martin G. Schultz

Keyword(s):

Machine Learning ◽

Time Series ◽

Air Quality ◽

Graphics Processing Units ◽

Ease Of Use ◽

Software Environment ◽

Flexible Machine ◽

Code Base ◽

Graphics Processing ◽

Scientific Questions

Abstract. With MLAir (Machine Learning on Air data) we created a software environment that simplifies and accelerates the exploration of new machine learning (ML) models for the analysis and forecasting of meteorological and air quality time series. Thereby MLAir is not developed as an abstract workflow, but hand in hand with actual scientific questions. It thus addresses scientists with either a meteorological or an ML background. Due to their relative ease of use and spectacular results in other application areas, neural networks and other ML methods are also gaining enormous momentum in the weather and air quality research communities. Even though there are already many books and tutorials describing how to conduct an ML experiment, there are many stumbling blocks for a newcomer. In contrast, people familiar with ML concepts and technology often have difficulties understanding the nature of atmospheric data. With MLAir we have addressed a number of these pitfalls so that it becomes easier for scientists of both domains to rapidly start off their ML application. MLAir has been developed in such a way that it is easy to use and is designed from the very beginning as a standalone, fully functional experiment. Due to its flexible, modular code base, code modifications are easy and personal experiment schedules can be quickly derived. The package also includes a set of simple validation tools to facilitate the evaluation of ML results using standard meteorological statistics. MLAir can easily be ported onto different computing environments from desktop workstations to high-end supercomputers with or without graphics processing units (GPUs).

Attention Mechanisms and Their Applications to Complex Systems

Entropy ◽

10.3390/e23030283 ◽

2021 ◽

Vol 23 (3) ◽

pp. 283

Author(s):

Adrián Hernández ◽

José M. Amigó

Keyword(s):

Machine Learning ◽

Complex Systems ◽

Recurrent Neural Networks ◽

Graphics Processing Units ◽

Learning Models ◽

Short Term ◽

Promising Solution ◽

Key Aspects ◽

Graphics Processing ◽

Remarkable Advance

Deep learning models and graphics processing units have completely transformed the field of machine learning. Recurrent neural networks and long short-term memories have been successfully used to model and predict complex systems. However, these classic models do not perform sequential reasoning, a process that guides a task based on perception and memory. In recent years, attention mechanisms have emerged as a promising solution to these problems. In this review, we describe the key aspects of attention mechanisms and some relevant attention techniques and point out why they are a remarkable advance in machine learning. Then, we illustrate some important applications of these techniques in the modeling of complex systems.
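
To make the idea concrete, the sketch below implements scaled dot-product attention for a single query, one widely used form of attention. The abstract does not say which variants the review covers, so this choice is an assumption, and the key and value vectors are hypothetical.

```python
import math

def softmax(xs):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output is the attention-weighted average of the value vectors.
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(dim)]
    return weights, output

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([1.0, 0.0], keys, values)
```

The query aligns with the first and third keys, so those two positions receive equal weights that are larger than the weight of the second, and the output leans toward their values.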

IterML: Iterative Machine Learning for Intelligent Parameter Pruning and Tuning in Graphics Processing Units

Journal of Signal Processing Systems ◽

10.1007/s11265-020-01604-4 ◽

2020 ◽

Author(s):

Xuewen Cui ◽

Wu-chun Feng

Keyword(s):

Machine Learning ◽

Graphics Processing Units ◽

Graphics Processing

MLAir (v1.0) – a tool to enable fast and flexible machine learning on air data time series

Geoscientific Model Development ◽

10.5194/gmd-14-1553-2021 ◽

2021 ◽

Vol 14 (3) ◽

pp. 1553-1574

Author(s):

Lukas Hubert Leufen ◽

Felix Kleinert ◽

Martin G. Schultz

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Time Series ◽

Air Quality ◽

Graphics Processing Units ◽

Ease Of Use ◽

Software Environment ◽

Flexible Machine ◽

Code Base ◽

Graphics Processing

Abstract. With MLAir (Machine Learning on Air data) we created a software environment that simplifies and accelerates the exploration of new machine learning (ML) models, specifically shallow and deep neural networks, for the analysis and forecasting of meteorological and air quality time series. Thereby MLAir is not developed as an abstract workflow, but hand in hand with actual scientific questions. It thus addresses scientists with either a meteorological or an ML background. Due to their relative ease of use and spectacular results in other application areas, neural networks and other ML methods are also gaining enormous momentum in the weather and air quality research communities. Even though there are already many books and tutorials describing how to conduct an ML experiment, there are many stumbling blocks for a newcomer. In contrast, people familiar with ML concepts and technology often have difficulties understanding the nature of atmospheric data. With MLAir we have addressed a number of these pitfalls so that it becomes easier for scientists of both domains to rapidly start off their ML application. MLAir has been developed in such a way that it is easy to use and is designed from the very beginning as a stand-alone, fully functional experiment. Due to its flexible, modular code base, code modifications are easy and personal experiment schedules can be quickly derived. The package also includes a set of validation tools to facilitate the evaluation of ML results using standard meteorological statistics. MLAir can easily be ported onto different computing environments from desktop workstations to high-end supercomputers with or without graphics processing units (GPUs).

Predicting the Future-Big Data and Machine Learning

Energies ◽

10.3390/en14238041 ◽

2021 ◽

Vol 14 (23) ◽

pp. 8041

Author(s):

Fernando Sánchez Lasheras

Keyword(s):

Machine Learning ◽

Big Data ◽

Graphics Processing Units ◽

Science And Technology ◽

The Future ◽

Graphics Processing

In recent decades, due to the increase in the capabilities of microprocessors and the advent of graphics processing units (GPUs), the use of machine learning methodologies has become popular in many fields of science and technology [...]

GPUMLib: A new Library to combine Machine Learning algorithms with Graphics Processing Units

2010 10th International Conference on Hybrid Intelligent Systems ◽

10.1109/his.2010.5600028 ◽

2010 ◽

Cited By ~ 9

Author(s):

Noel Lopes ◽

Bernardete Ribeiro ◽

Ricardo Quintas

Keyword(s):

Machine Learning ◽

Graphics Processing Units ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Combine Machine ◽

Graphics Processing
