LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi

Author(s):  
Azzam Haidar ◽  
Stanimire Tomov ◽  
Konstantin Arturov ◽  
Murat Guney ◽  
Shane Story ◽  
...  
2018 ◽  
Vol 175 ◽  
pp. 02009
Author(s):  
Carleton DeTar ◽  
Steven Gottlieb ◽  
Ruizi Li ◽  
Doug Toussaint

With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, deeper memory hierarchies, and greater programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider the performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.


2017 ◽  
Vol 5 (4RAST) ◽  
pp. 59-63 ◽  
Author(s):  
Jyothi P ◽  
Vatsala G A ◽  
Radha Gupta

In the present scenario, the waste disposal unit is one of the emerging industries. Its major activities are the collection of wastes, their segregation and recycling, and the manufacture and sale of by-products. Any business expects to make a profit. Our study formulates a goal programming model that helps maximize profit by identifying the deviations from goals in the disposal unit. Goal programming is an optimization technique, and the manager of the disposal unit can make better decisions using the deviations from goals. The pre-emptive goals of the study are (i) minimizing the expenditure of the unit and the recycling cost of the wastes, (ii) boosting the net profit of the unit, (iii) keeping the supply of by-products to each location within the maximum demand, (iv) fulfilling the demand for by-products in different locations, and (v) maintaining a minimum supply of at least one unit of recycled by-products to each of 5 different locations.
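The idea of ranking decisions by their deviations from prioritized goals can be illustrated with a minimal Python sketch. It is not the model from the paper: it only compares a finite set of candidate plans lexicographically rather than solving a linear program, and every goal name, target, and plan value below is a hypothetical placeholder.

```python
# Minimal sketch of pre-emptive (lexicographic) goal evaluation.
# All names and numbers are hypothetical; this is not the paper's model.

def unwanted_deviation(achieved, target, penalise):
    """Return the deviation from `target` in the penalised direction only."""
    if penalise == "over":    # exceeding the target is undesirable (e.g. cost)
        return max(0.0, achieved - target)
    if penalise == "under":   # falling short is undesirable (e.g. profit)
        return max(0.0, target - achieved)
    raise ValueError("penalise must be 'over' or 'under'")

def priority_vector(plan, goals):
    """Unwanted deviations of one plan, ordered from highest to lowest priority."""
    return tuple(
        unwanted_deviation(plan[name], target, penalise)
        for name, target, penalise in goals
    )

def best_plan(plans, goals):
    """Pick the plan whose deviation vector is lexicographically smallest."""
    return min(plans, key=lambda label: priority_vector(plans[label], goals))

# Goals in priority order: (name, target, penalised direction).
goals = [
    ("expenditure", 100.0, "over"),
    ("profit", 50.0, "under"),
]

# Hypothetical candidate plans.
plans = {
    "A": {"expenditure": 110.0, "profit": 80.0},
    "B": {"expenditure": 95.0, "profit": 40.0},
    "C": {"expenditure": 100.0, "profit": 45.0},
}

if __name__ == "__main__":
    for label in plans:
        print(label, priority_vector(plans[label], goals))
    print("best:", best_plan(plans, goals))
```

Because Python compares tuples element by element, a lower-priority goal only matters when all higher-priority deviations are tied, which is the defining property of pre-emptive goals.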


2020 ◽  
pp. 1-10
Author(s):  
Ankit Rawat ◽  
Mohd Fazle Azeem

This paper presents the modeling of a BLDC motor and its performance analysis under diverse operating speed settings. BLDC motors are gaining more and more attention from industrial and domestic appliance manufacturers due to their compact size, high efficiency, and robust structure. Extensive research and development in material science and power electronics has led to a substantial increase in the application of BLDC motors to electric drives. This paper deals with the modeling of a BLDC motor drive system, along with a comparative study of control schemes tuned by a modified queen bee evolution based GA and tuned manually, using MATLAB/Simulink. To evaluate the performance of the proposed drive, simulation is carried out at different mechanical load and speed conditions. The test outcomes show that the model performance is satisfactory.


Author(s):  
Karri Chiranjeevi ◽  
Umaranjan Jena ◽  
Sonali Dash

Linde-Buzo-Gray (LBG) Vector Quantization (VQ) generates a local codebook after many runs on different sets of training images for image compression. The key role of VQ is to generate a global codebook. In this paper, we present a comparative performance analysis of different optimization techniques. Firefly and Cuckoo search generate a near-global codebook, but Firefly struggles when no brighter fireflies are available, and the convergence time of Cuckoo search is very high. A Hybrid Cuckoo Search (HCS) algorithm was developed and tested on four benchmark functions; it optimizes the LBG codebook with a lower convergence rate by using a Lévy flight based on McCulloch's algorithm and a variant of the search parameters. In practice, we observed that the Bat algorithm (BA) yields a better peak signal-to-noise ratio than LBG, FA, CS, and HCS for codebook sizes from 8 to 256. BA converges 2.4452, 2.734, and 1.5126 times faster than HCS, CS, and FA, respectively.
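For readers unfamiliar with the LBG baseline, the following is a minimal pure-Python sketch of a generalized-Lloyd style codebook iteration, which is the core of LBG. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the random initialization, the stopping threshold `epsilon`, and the toy training vectors are all choices made here, and real image compression would use blocks of pixels as training vectors.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, vec):
    """Index of the codeword closest to `vec`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], vec))

def mean_distortion(codebook, training):
    """Average squared error when each vector is replaced by its codeword."""
    total = sum(squared_distance(codebook[nearest(codebook, v)], v)
                for v in training)
    return total / len(training)

def lbg(training, codebook_size, epsilon=1e-3, max_iter=100, seed=0):
    """Iterate partition and centroid steps until distortion stops improving."""
    rng = random.Random(seed)
    # Assumption: initialise from randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(training, codebook_size)]
    previous = float("inf")
    for _ in range(max_iter):
        # Partition step: assign every training vector to its nearest codeword.
        cells = [[] for _ in codebook]
        for v in training:
            cells[nearest(codebook, v)].append(v)
        # Centroid step: move each codeword to the mean of its cell.
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its old codeword
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell)
                               for d in range(dim)]
        current = mean_distortion(codebook, training)
        # Stop once the relative improvement falls below epsilon.
        if previous - current <= epsilon * current:
            break
        previous = current
    return codebook, mean_distortion(codebook, training)

# Toy two-dimensional training set with two well-separated groups.
training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

if __name__ == "__main__":
    codebook, distortion = lbg(training, codebook_size=2)
    print(sorted(codebook), distortion)
```

The result depends on the initial codewords, which is why such an iteration is described as producing a local codebook; the metaheuristics compared in the abstract aim to escape that dependence.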


Author(s):  
Ramon Amela ◽  
Cristian Ramon-Cortes ◽  
Jorge Ejarque ◽  
Javier Conejero ◽  
Rosa M. Badia

Python is a popular programming language due to the simplicity of its syntax, while still achieving good performance despite being an interpreted language. Its adoption by multiple scientific communities has led to the emergence of a large number of libraries and modules, which has helped to put Python at the top of the list of programming languages [1]. Task-based programming has been proposed in recent years as an alternative parallel programming model. PyCOMPSs follows this approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. We also present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python.

