Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-Core Coprocessor

With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider the performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

Cost-efficiency of Large-scale Electronic Structure Simulations with Intel Xeon Phi Processors

2019 IEEE International Conference on Cluster Computing (CLUSTER) ◽

10.1109/cluster.2019.8891046 ◽

2019 ◽

Author(s):

Hoon Ryu ◽

Seungmin Lee

Keyword(s):

Electronic Structure ◽

Cost Efficiency ◽

Large Scale ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Intel Xeon

Parallelization of Molecular-Dynamics Simulations Using Tasks

MRS Proceedings ◽

10.1557/opl.2015.113 ◽

2015 ◽

Vol 1753 ◽

Cited By ~ 2

Author(s):

Ralf Meyer ◽

Chris M. Mangiardi

Keyword(s):

Molecular Dynamics ◽

Molecular Dynamics Simulations ◽

Shared Memory ◽

MD Simulations ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Novel Algorithms ◽

Dynamics Simulations ◽

Many Core ◽

Intel Xeon

This article discusses novel algorithms for molecular-dynamics (MD) simulations with short-ranged forces on modern multi- and many-core processors like the Intel Xeon Phi. A task-based approach to the parallelization of MD on shared-memory computers and a tiling scheme to facilitate the SIMD vectorization of the force calculations are described. The algorithms have been tested with three different potentials, and the resulting speed-ups on Intel Xeon Phi coprocessors are shown.

Implementation of an Agent-Based Parallel Tissue Modelling Framework for the Intel MIC Architecture

Scientific Programming ◽

10.1155/2017/8721612 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 5

Author(s):

Maciej Cytowski ◽

Zuzanna Szymańska ◽

Piotr Umiński ◽

Grzegorz Andrejczuk ◽

Krzysztof Raszkowski

Keyword(s):

High Performance ◽

Large Scale ◽

Spatial Scales ◽

Biological Processes ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Variable Environment ◽

Modelling Framework ◽

Computational Performance ◽

Intel Xeon

Timothy is a novel large-scale modelling framework for simulating biological processes in which different cellular colonies grow and interact with a variable environment. Timothy was designed for execution on massively parallel High Performance Computing (HPC) systems. The high parallel scalability of the implementation allows for simulations of up to 10⁹ individual cells (i.e., simulations at tissue spatial scales of up to 1 cm³ in size). With the recent advancements of the Timothy model, it has become critical to ensure an appropriate level of performance on emerging HPC architectures. For instance, the introduction of blood vessels supplying nutrients to the tissue is a very important step towards realistic simulations of complex biological processes, but it greatly increased the computational complexity of the model. In this paper, we describe the process of modernizing the application in order to achieve high computational performance on hybrid HPC systems based on the modern Intel® MIC architecture. Experimental results on the Intel Xeon Phi™ coprocessor x100 and the Intel Xeon Phi processor x200 are presented.

Many-core needs fine-grained scheduling: A case study of query processing on Intel Xeon Phi processors

Journal of Parallel and Distributed Computing ◽

10.1016/j.jpdc.2017.09.005 ◽

2018 ◽

Vol 120 ◽

pp. 395-404 ◽

Cited By ~ 3

Author(s):

Xuntao Cheng ◽

Bingsheng He ◽

Mian Lu ◽

Chiew Tong Lau

Keyword(s):

Query Processing ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Fine Grained ◽

Many Core ◽

Intel Xeon

A parallel algorithm of Euclidean distance matrix computation for the Intel Xeon Phi Knights Landing many-core processor

Bulletin of the South Ural State University Series Computational Mathematics and Software Engineering ◽

10.14529/cmse180305 ◽

2018 ◽

Vol 7 (3) ◽

Keyword(s):

Parallel Algorithm ◽

Euclidean Distance ◽

Distance Matrix ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Euclidean Distance Matrix ◽

Matrix Computation ◽

Knights Landing ◽

Many Core ◽

Intel Xeon

Performance Evaluation of an OpenCL Implementation of the Lattice Boltzmann Method on the Intel Xeon Phi

Parallel Processing Letters ◽

10.1142/s0129626415410017 ◽

2015 ◽

Vol 25 (03) ◽

pp. 1541001 ◽

Cited By ~ 1

Author(s):

Christian Obrecht ◽

Bernard Tourancheau ◽

Frédéric Kuznik

Keyword(s):

Lattice Boltzmann Method ◽

Lattice Boltzmann ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Hardware Architectures ◽

Nvidia GPU ◽

Many Core ◽

Hardware Platforms ◽

Boltzmann Method ◽

Intel Xeon

A portable OpenCL implementation of the lattice Boltzmann method targeting emerging many-core architectures is described. The main purpose of this work is to evaluate and compare the performance of this code on three mainstream hardware architectures available today, namely an Intel CPU, an Nvidia GPU, and the Intel Xeon Phi. Because of the similarities between OpenCL and CUDA, we chose to follow some of the strategies devised to implement efficient lattice Boltzmann solvers on Nvidia GPUs, while remaining as generic as possible. Being fairly configurable, this program makes it possible to ascertain the best options for each hardware platform. The achieved performance is quite satisfactory for both the CPU and the GPU. For the Xeon Phi, however, the results are below expectations. Nevertheless, comparison with data from the literature shows that on this architecture the code seems memory-bound.

Evaluating the Support of MTC Applications on Intel Xeon Phi Many-Core Accelerators

2015 IEEE International Conference on Cluster Computing ◽

10.1109/cluster.2015.87 ◽

2015 ◽

Author(s):

Poornima Nookala ◽

Serapheim Dimitropoulos ◽

Karl Stough ◽

Ioan Raicu

Keyword(s):

Xeon Phi ◽

Intel Xeon Phi ◽

Many Core ◽

Intel Xeon
