scholarly journals Parallel Gaussian elimination of symmetric positive definite band matrices for shared-memory multicore architectures

Author(s):  
Sirine Marrakchi ◽  
Mohamed Jemni

This study presents a new parallel Gaussian elimination approach for symmetric positive definite band systems. For each task, the appropriate start time and adequate processor are determined. Unnecessary dependencies between tasks are eliminated. Simultaneously, all processors perform their associated tasks with precedence constraints under consideration. Our main goal is to obtain a high degree of parallelism by balancing the load of processors and reducing the total idle and parallel execution times. The theoretical lower bounds for parallel execution time and number of processors required to execute the precedence graph at an optimal time are also computed. The validity of our investigation is confirmed by carrying out several experiments on a shared-memory multicore architecture using OpenMP. Practical results prove the efficiency of the proposed method.

2021 ◽  
Vol 179 (1) ◽  
pp. 35-58
Author(s):  
Sirine Marrakchi ◽  
Mohamed Jemni

A new approach for solving triangular band linear systems is established in this study to balance the load and obtain a high degree of parallelism. Our investigation consists to attribute both adequate start time and processor to each task and eliminate the useless dependencies which are not used in the parallel solve stage. Thereby, processors execute in parallel their related tasks taking account of the considered precedence constraints. The theoretical lower bounds for parallel execution time and the number of processors required to carry out the task graph in the shortest time are determined. Experimentations are realized on a shared-memory multicore processor. The experimental results are fitted to the values derived from the determined mathematical formulas. The comparison of results obtained by our contribution with those from triangular systems resolution routine belonging to the library PLASMA, Parallel Linear Algebra Software for Multicore Architectures, confirms the efficiency of the proposed approach.


1988 ◽  
Vol 8 (1-3) ◽  
pp. 301-312 ◽  
Author(s):  
Zahari Zlatev ◽  
Phuong Vu ◽  
Jerzy Wasniewski ◽  
Kjeld Schaumburg

Author(s):  
А.Ю. Пирова ◽  
И.Б. Мееров ◽  
Е.А. Козинов ◽  
С.А. Лебедев

Рассматривается задача переупорядочения строк и столбцов симметричной положительно определенной разреженной матрицы с целью уменьшения числа ненулевых элементов в факторе Холецкого. Данная задача является NP-полной. Для ее решения используются эвристические алгоритмы, основанные на применении методов теории графов. Предлагается параллельный алгоритм переупорядочения для вычислительных систем с общей памятью. В качестве базы для распараллеливания используется модификация многоуровневого метода вложенных сечений, ранее реализованная авторами в виде библиотеки с открытым исходным кодом MORSy. Основная идея распараллеливания заключается в организации и параллельной обработке очереди задач, которые могут быть решены независимо. В отличие от широко распространенных аналогов, применяющих MPI для организации параллелизма как на распределенной, так и на общей памяти, предложенный алгоритм использует возможности стандарта OpenMP 3.0. Вычислительные эксперименты выполнены на симметричных положительно определенных матрицах из коллекции университета Флориды. Показано, что параллельный код MORSy дает сходные или лучшие перестановки в сравнении с библиотекой ParMETIS для всех тестовых матриц, кроме одной, в большинстве случаев опережая ParMETIS по времени работы. Программная реализация выполнена в виде библиотеки с открытым исходным кодом и доступна для скачивания на сайте Приволжского научно-образовательного центра суперкомпьютерных технологий. This paper deals with the NP-complete problem of finding a symmetric positive definite sparse matrix ordering that minimizes the Cholesky factor fill-in. For this purpose, heuristic approaches based on graph algorithms are applied. A new parallel ordering algorithm for shared-memory computing systems is proposed. The modified multilevel nested dissection algorithm from the recently presented MORSy library is used as a basis for ordering. The parallel processing is done in a task-based fashion. It uses the OpenMP 3.0 task parallelism relying on the dynamic load balancing implemented during the OpenMP runtime. The numerical experiments were performed using a number of symmetric positive definite matrices from the University of Florida Sparse Matrix Collection. The experimental results show the competitiveness of the proposed implementation on shared memory systems compared to the widely used ParMETIS library. In our experiments, the parallel MORSy version provides a better ordering than ParMETIS on all but one matrix in terms of the Cholesky factor fill-in and outperforms ParMETIS in most cases. The parallel MORSy version is publicly available from the Supercomputing Center of Lobachevsky State University of Nizhni Novgorod.


2016 ◽  
Vol 26 (03) ◽  
pp. 1650014 ◽  
Author(s):  
Markus Flatz ◽  
Marián Vajteršic

The goal of Nonnegative Matrix Factorization (NMF) is to represent a large nonnegative matrix in an approximate way as a product of two significantly smaller nonnegative matrices. This paper shows in detail how an NMF algorithm based on Newton iteration can be derived using the general Karush-Kuhn-Tucker (KKT) conditions for first-order optimality. This algorithm is suited for parallel execution on systems with shared memory and also with message passing. Both versions were implemented and tested, delivering satisfactory speedup results.


Sign in / Sign up

Export Citation Format

Share Document