LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi

MILC Code Performance on High End CPU and GPU Supercomputer Clusters

EPJ Web of Conferences, Vol 175, pp. 02009, 2018. DOI: 10.1051/epjconf/201817502009

Author(s): Carleton DeTar, Steven Gottlieb, Ruizi Li, Doug Toussaint

Keyword(s): Conjugate Gradient, Memory Hierarchy, Xeon Phi, Intel Xeon Phi, Code Performance, Recent Developments, Knights Landing, Many Core, Intel Xeon

With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism and of the memory hierarchy, and greater programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider the performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and the gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
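The abstract singles out the staggered conjugate gradient (CG) solver. As a minimal sketch of the underlying iteration only, the following applies textbook CG to a small dense, real, symmetric positive-definite system in pure Python. It is not MILC, QOPQDP, or QPhiX code: MILC's solver works on a Hermitian positive-definite operator built from the staggered Dirac matrix on a lattice, and the names `conjugate_gradient`, `matvec`, and `dot` are illustrative. Each iteration is dominated by one matrix-vector product plus a few dot products and vector updates, which is why such solvers are typically limited by memory bandwidth rather than arithmetic throughput.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product y = A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(a: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for a symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual r = b - A x, equal to b because x starts at 0
    p = list(r)          # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop once the residual norm is small enough
            break
        ap = matvec(a, p)            # the single operator application per iteration
        alpha = rs_old / dot(p, ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction is A-conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` converges to approximately `[1/11, 7/11]`; in exact arithmetic CG needs at most n iterations for an n-by-n system.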

Performance Evaluation of Scientific Applications on Intel Xeon Phi Knights Landing Clusters

2018 International Conference on High Performance Computing & Simulation (HPCS), 2018. DOI: 10.1109/hpcs.2018.00063. Cited by ~4

Author(s): Ji-Hoon Kang, Oh-Kyoung Kwon, Hoon Ryu, Jinwoo Jeong, Kyunghun Lim

Keyword(s): Performance Evaluation, Xeon Phi, Intel Xeon Phi, Scientific Applications, Knights Landing, Intel Xeon

FDTD model performance analysis for a Cavity Slot Antenna array in a variable geometry conformal test rig

2015 IEEE International Symposium on Antennas and Propagation & USNC/URSI National Radio Science Meeting, 2015. DOI: 10.1109/aps.2015.7305307. Cited by ~1

Author(s): Timothy George Pelham, Geoff Hilton, Christopher Railton, Rob Lewis

Keyword(s): Performance Analysis, Antenna Array, Model Performance, Slot Antenna, Variable Geometry, Test Rig

Simulating Multiphase Flows in Porous Media Using OpenFOAM on Intel Xeon Phi Knights Landing Processors

Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact - PEARC17, 2017. DOI: 10.1145/3093338.3093350. Cited by ~1

Author(s): Zhi Shang, Honggao Liu

Keyword(s): Porous Media, Multiphase Flows, Xeon Phi, Intel Xeon Phi, Flows in Porous Media, Knights Landing, Intel Xeon

OPTIMAL SOLUTION FOR THE DISTRIBUTION OF BY-PRODUCTS IN DISPOSAL UNIT

International Journal of Research - GRANTHAALAYAH, Vol 5 (4RAST), pp. 59-63, 2017. DOI: 10.29121/granthaalayah.v5.i4rast.2017.3304. Cited by ~1

Author(s): Jyothi P, Vatsala G A, Radha Gupta

Keyword(s): Waste Disposal, Goal Programming, Programming Model, Optimal Solution, Optimization Techniques, Programming Technique, Emerging Industries, Goal Programming Model, Net Profit, By-Products

At present, waste disposal is one of the emerging industries. The major activities considered are the collection, segregation, and recycling of wastes, and the manufacture and sale of by-products. Every business aims to make a profit. This study formulates a goal programming model that helps maximize profit by identifying deviations from the goals of the disposal unit. Goal programming is one of the optimization techniques, and using these goal deviations, the manager of the disposal unit can make better decisions. The preemptive goals of the study are: (i) minimizing the expenditure of the unit and the recycling cost of the wastes; (ii) boosting the net profit of the unit; (iii) keeping the supply of by-products to each location within the maximum demand; (iv) fulfilling the demand for by-products in different locations; and (v) maintaining a minimum supply of recycled by-products of at least one to each of 5 different locations.
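The goals listed above fit the standard preemptive (lexicographic) goal programming form. The block below is a generic textbook formulation, not the paper's actual model: x_j are decision variables (for example, quantities of by-products shipped to each location), a_ij and b_i are placeholder coefficients and goal targets, and d_i^- and d_i^+ are the under- and over-achievement deviations of goal i.

```latex
\begin{aligned}
\operatorname{lex\,min}\quad & \bigl(P_1(\mathbf{d}),\; P_2(\mathbf{d}),\; \dots,\; P_K(\mathbf{d})\bigr)\\
\text{subject to}\quad & \sum_{j} a_{ij}\, x_j + d_i^- - d_i^+ = b_i, \qquad i = 1, \dots, m,\\
& x_j \ge 0, \qquad d_i^- \ge 0, \qquad d_i^+ \ge 0 .
\end{aligned}
```

In one plausible mapping, each priority level P_k is a weighted sum of the unwanted deviations only: over-achievement d_i^+ for the cost goal (i) and the maximum-demand goal (iii), and under-achievement d_i^- for the profit goal (ii), the demand goal (iv), and the minimum-supply goal (v). Lexicographic minimization means that P_2 is minimized only among solutions that are optimal for P_1, and so on down the priority list.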

Performance Analysis of Brushless DC Motor Using Modified Queen Bee Evolution Based Genetic Algorithm Tuned PI Controller under Different Speed Conditions

Advances in Research, pp. 1-10, 2020. DOI: 10.9734/air/2020/v21i230183

Author(s): Ankit Rawat, Mohd Fazle Azeem

Keyword(s): Performance Analysis, High Efficiency, Mechanical Load, Model Performance, BLDC Motor, Brushless DC, Compact Size, Domestic Appliance, Motor Drive System

This paper presents the modeling of a BLDC motor and its performance analysis under diverse operating speed settings. BLDC motors are gaining increasing attention from industrial and domestic appliance manufacturers owing to their compact size, high efficiency, and robust structure. Extensive research and development in material science and power electronics has led to a substantial increase in the application of BLDC motors to electric drives. This paper deals with the modeling of a BLDC motor drive system, along with a comparative study of a modified queen bee evolution based GA-tuned control scheme and a manually tuned one, using MATLAB/Simulink. To evaluate the performance of the proposed drive, simulations are carried out under different mechanical load and speed conditions. The results show that the model performs satisfactorily.
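The controller being tuned is a proportional-integral (PI) speed loop. As a minimal sketch under stated assumptions, the following runs a discrete-time PI controller against a hypothetical first-order speed model whose time constant and gain are chosen arbitrarily (they are not BLDC motor data), with hand-picked gains. It does not reproduce the paper's MATLAB/Simulink drive model or its queen bee GA; `PIController` and `simulate` are illustrative names. A GA-based tuner would treat the pair (kp, ki) as a chromosome and score each candidate with a cost computed from runs such as `simulate`.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    kp: float              # proportional gain
    ki: float              # integral gain
    dt: float              # sample period in seconds
    integral: float = 0.0  # accumulated error

    def update(self, setpoint: float, measured: float) -> float:
        """Return the control output u = kp * e + ki * (integral of e)."""
        error = setpoint - measured
        self.integral += error * self.dt  # rectangular (forward Euler) integration
        return self.kp * error + self.ki * self.integral


def simulate(kp: float, ki: float, setpoint: float = 1000.0,
             steps: int = 5000, dt: float = 1e-3) -> float:
    """Drive a hypothetical plant tau * dw/dt = -w + gain * u; return the final speed w."""
    tau, gain = 0.05, 10.0  # assumed plant parameters, not taken from the paper
    ctrl = PIController(kp, ki, dt)
    speed = 0.0
    for _ in range(steps):
        u = ctrl.update(setpoint, speed)
        speed += dt * (-speed + gain * u) / tau  # explicit Euler step of the plant
    return speed
```

With kp = 0.5 and ki = 5.0, the closed loop is stable and the integral term removes the steady-state error, so `simulate(0.5, 5.0)` settles at the 1000 setpoint after the simulated 5 seconds.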

Mutual Fund Performance Analysis Using Nature Inspired Optimization Techniques: A Critical Review

Advances in Intelligent Systems and Computing - Advances in Intelligent Systems and Interactive Applications, pp. 734-745, 2017. DOI: 10.1007/978-3-319-69096-4_104

Author(s): Zeenat Afroz, Smruti Rekha Das, Debahuti Mishra, Srikanta Patnaik

Keyword(s): Performance Analysis, Mutual Fund, Critical Review, Fund Performance, Optimization Techniques, Mutual Fund Performance

Long-time simulations with complex code using multiple nodes of Intel Xeon Phi Knights Landing

Journal of Computational and Applied Mathematics, Vol 337, pp. 18-36, 2018. DOI: 10.1016/j.cam.2017.12.050. Cited by ~1

Author(s): Jonathan S. Graf, Matthias K. Gobbert, Samuel Khuvis

Keyword(s): Xeon Phi, Intel Xeon Phi, Long Time, Knights Landing, Intel Xeon

Comparative Performance Analysis of Optimization Techniques on Vector Quantization for Image Compression

International Journal of Computer Vision and Image Processing, Vol 7 (1), pp. 19-43, 2017. DOI: 10.4018/ijcvip.2017010102. Cited by ~2

Author(s): Karri Chiranjeevi, Umaranjan Jena, Sonali Dash

Keyword(s): Performance Analysis, Image Compression, Vector Quantization, Signal-to-Noise Ratio, Bat Algorithm, Cuckoo Search, Optimization Techniques, Convergence Time, Comparative Performance

Linde-Buzo-Gray (LBG) vector quantization (VQ) for image compression generates a local codebook after many runs on different sets of training images. The key task in VQ is to generate a global codebook. In this paper, we present a comparative performance analysis of different optimization techniques. The firefly algorithm (FA) and cuckoo search (CS) generate a near-global codebook, but FA suffers when brighter fireflies are unavailable, and CS has a very high convergence time. A hybrid cuckoo search (HCS) algorithm, which optimizes the LBG codebook with faster convergence by using Lévy flights based on McCulloch's algorithm and varying search parameters, was developed and tested on four benchmark functions. In practice, we observed that the bat algorithm (BA) achieves a better peak signal-to-noise ratio (PSNR) than LBG, FA, CS, and HCS for codebook sizes from 8 to 256. BA converges 2.4452, 2.734, and 1.5126 times faster than HCS, CS, and FA, respectively.
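The swarm optimizers compared above (FA, CS, HCS, BA) all start from, or are measured against, the LBG codebook. As a minimal sketch of that baseline only, the following implements LBG as codeword splitting followed by Lloyd iterations (nearest-neighbour partition, then centroid update) under squared Euclidean distortion. The additive split offset `eps`, the relative stopping tolerance `tol`, and the function names are assumptions for illustration; in image compression the training vectors are typically flattened pixel blocks, but here they are arbitrary 2-D points.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distortion between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def lbg(training: List[Vector], size: int, eps: float = 0.01, tol: float = 1e-6) -> List[Vector]:
    """Design a codebook of `size` codewords (a power of two) by splitting plus Lloyd iterations."""
    codebook = [_centroid(training)]
    while len(codebook) < size:
        # Split every codeword into two copies offset by +eps and -eps in each coordinate.
        codebook = [tuple(c + s for c in cw) for cw in codebook for s in (eps, -eps)]
        prev = math.inf
        while True:
            # Nearest-neighbour partition of the training vectors.
            cells: List[List[Vector]] = [[] for _ in codebook]
            distortion = 0.0
            for v in training:
                k = min(range(len(codebook)), key=lambda i: _dist2(v, codebook[i]))
                cells[k].append(v)
                distortion += _dist2(v, codebook[k])
            # Centroid update; an empty cell keeps its previous codeword.
            codebook = [_centroid(cell) if cell else cw for cell, cw in zip(cells, codebook)]
            # Stop when the relative drop in total distortion becomes negligible.
            if prev - distortion <= tol * max(distortion, 1e-12):
                break
            prev = distortion
    return codebook
```

For two well-separated clusters of three points each, `lbg(points, 2)` returns one codeword at the centroid of each cluster.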

Executing linear algebra kernels in heterogeneous distributed infrastructures with PyCOMPSs

Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles, Vol 73, pp. 47, 2018. DOI: 10.2516/ogst/2018047. Cited by ~3

Author(s): Ramon Amela, Cristian Ramon-Cortes, Jorge Ejarque, Javier Conejero, Rosa M. Badia

Keyword(s): Programming Languages, Linear Algebra, Programming Model, Xeon Phi, Scientific Communities, Heterogeneous Architectures, Parallel Programming Model, Significant Performance, Thread Level Parallelism, Level Parallelism

Python is a popular programming language owing to its simple syntax, and it achieves good performance despite being an interpreted language. Its adoption by multiple scientific communities has led to the emergence of a large number of libraries and modules, which has helped put Python at the top of programming-language rankings [1]. Task-based programming has been proposed in recent years as an alternative parallel programming model. PyCOMPSs follows this approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. We also present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python.
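To illustrate the task-based decomposition that linear algebra benchmarks of this kind rely on, the following sketch splits a matrix product into blocks and submits one task per output block. It deliberately uses Python's standard `concurrent.futures` as a stand-in and does not use the PyCOMPSs API (its `@task` decorator, runtime, and dependency tracking are not shown); `block_matmul_add` and `blocked_matmul` are hypothetical names. Because CPython threads share the global interpreter lock, this pure-Python version shows the structure of the decomposition rather than a speedup.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List, Tuple

Block = List[List[float]]
Grid = Dict[Tuple[int, int], Block]


def block_matmul_add(c: Block, a: Block, b: Block) -> Block:
    """Task body: return c + a @ b for small dense blocks."""
    n, m, p = len(a), len(b), len(b[0])
    return [[c[i][j] + sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def blocked_matmul(a: Grid, b: Grid, nblocks: int, bs: int) -> Grid:
    """Multiply two nblocks x nblocks grids of bs x bs blocks; one task per output block."""

    def output_block(i: int, j: int) -> Block:
        # Accumulate C[i][j] = sum over k of A[i][k] @ B[k][j], in order.
        acc: Block = [[0.0] * bs for _ in range(bs)]
        for k in range(nblocks):
            acc = block_matmul_add(acc, a[(i, k)], b[(k, j)])
        return acc

    with ThreadPoolExecutor() as pool:
        # Output blocks are independent, so all tasks can be submitted at once.
        futures: Dict[Tuple[int, int], Future] = {
            (i, j): pool.submit(output_block, i, j)
            for i in range(nblocks) for j in range(nblocks)
        }
        return {key: fut.result() for key, fut in futures.items()}
```

For example, splitting [[1, 2], [3, 4]] and [[5, 6], [7, 8]] into 1x1 blocks and calling `blocked_matmul(a, b, 2, 1)` reproduces the product [[19, 22], [43, 50]] block by block.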
