Parallel Gaussian elimination of symmetric positive definite band matrices for shared-memory multicore architectures

RAIRO - Operations Research ◽

10.1051/ro/2020013 ◽

2020 ◽

Author(s):

Sirine Marrakchi ◽

Mohamed Jemni

Keyword(s):

Shared Memory ◽

Gaussian Elimination ◽

Positive Definite ◽

Parallel Execution ◽

Optimal Time ◽

Multicore Architectures ◽

Start Time ◽

Band Matrices ◽

Symmetric Positive Definite ◽

High Degree

This study presents a new parallel Gaussian elimination approach for symmetric positive definite band systems. For each task, the appropriate start time and processor are determined, and unnecessary dependencies between tasks are eliminated. All processors then execute their assigned tasks simultaneously while respecting the precedence constraints. Our main goal is to obtain a high degree of parallelism by balancing the load of processors and reducing the total idle and parallel execution times. The theoretical lower bounds for parallel execution time and for the number of processors required to execute the precedence graph in optimal time are also computed. The validity of our investigation is confirmed by several experiments on a shared-memory multicore architecture using OpenMP. Practical results demonstrate the efficiency of the proposed method.
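
For orientation, the kind of computation this abstract parallelizes can be sketched sequentially. The following is a minimal illustration, not the authors' scheduling method: it uses the Cholesky form of Gaussian elimination for a symmetric positive definite band matrix stored as a dense list of lists. The function name `band_cholesky` and the `bandwidth` parameter are hypothetical names chosen for this sketch.

```python
import math

def band_cholesky(A, bandwidth):
    """Return lower-triangular L with A = L L^T.

    Assumes A is symmetric positive definite and A[i][j] == 0
    whenever |i - j| > bandwidth (illustrative dense storage).
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Only the previous `bandwidth` columns can contribute to column j.
        lo = max(0, j - bandwidth)
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(lo, j))
        L[j][j] = math.sqrt(d)
        # Rows below the band stay zero, so the update stops at j + bandwidth.
        for i in range(j + 1, min(n, j + bandwidth + 1)):
            lo_i = max(0, i - bandwidth)
            s = sum(L[i][k] * L[j][k] for k in range(lo_i, j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L

# Small tridiagonal example (bandwidth 1).
A = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 2.0],
     [0.0, 2.0, 5.0]]
L = band_cholesky(A, bandwidth=1)
```

The band structure is what bounds the dependencies: each column update touches at most `bandwidth` earlier columns, which is the property a task scheduler can exploit.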


Static Scheduling with Load Balancing for Solving Triangular Band Linear Systems on Multicore Processors

Fundamenta Informaticae ◽

10.3233/fi-2021-2012 ◽

2021 ◽

Vol 179 (1) ◽

pp. 35-58

Author(s):

Sirine Marrakchi ◽

Mohamed Jemni

Keyword(s):

Linear Systems ◽

Multicore Processors ◽

Parallel Execution ◽

Task Graph ◽

Multicore Architectures ◽

Multicore Processor ◽

Start Time ◽

Static Scheduling ◽

Mathematical Formulas ◽

High Degree

This study establishes a new approach for solving triangular band linear systems that balances the load and obtains a high degree of parallelism. The approach assigns an adequate start time and processor to each task and eliminates the dependencies that are not needed in the parallel solve stage. Processors then execute their assigned tasks in parallel while respecting the precedence constraints. The theoretical lower bounds for parallel execution time and for the number of processors required to carry out the task graph in the shortest time are determined. Experiments are carried out on a shared-memory multicore processor. The experimental results agree with the values derived from the mathematical formulas. A comparison with the triangular system solve routine from the PLASMA library (Parallel Linear Algebra Software for Multicore Architectures) confirms the efficiency of the proposed approach.
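
As a point of reference, a sequential forward substitution for a lower triangular band system might look like the sketch below. It is not the static scheduling scheme of the paper; it only shows why the band limits the precedence constraints between row computations. The function name `solve_lower_band` and the dense storage are assumptions made for illustration.

```python
def solve_lower_band(L, b, bandwidth):
    """Solve L x = b for lower-triangular L with L[i][j] == 0 when i - j > bandwidth."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Unknown x[i] depends only on the previous `bandwidth` unknowns.
        start = max(0, i - bandwidth)
        s = sum(L[i][j] * x[j] for j in range(start, i))
        x[i] = (b[i] - s) / L[i][i]
    return x

# Lower bidiagonal example (bandwidth 1).
L = [[2.0, 0.0, 0.0, 0.0],
     [1.0, 3.0, 0.0, 0.0],
     [0.0, 1.0, 4.0, 0.0],
     [0.0, 0.0, 2.0, 5.0]]
b = [2.0, 4.0, 5.0, 7.0]
x = solve_lower_band(L, b, bandwidth=1)
```

In this example every component of `x` equals 1. In a task-graph view, the multiply-subtract operations for different rows become tasks, and only tasks linked through the band need to be ordered.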


Parallel Execution of DEVS in Shared-memory Multicore Architectures

Spring Simulation Conference (SpringSim 2020) ◽

10.22360/springsim.2020.hpc.005 ◽

2020 ◽

Keyword(s):

Shared Memory ◽

Parallel Execution ◽

Multicore Architectures


Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures

Lecture Notes in Computer Science - High Performance Computing for Computational Science – VECPAR 2010 ◽

10.1007/978-3-642-19328-6_14 ◽

2011 ◽

pp. 129-138 ◽

Cited By ~ 5

Author(s):

Emmanuel Agullo ◽

Henricus Bouwmeester ◽

Jack Dongarra ◽

Jakub Kurzak ◽

Julien Langou ◽

...

Keyword(s):

Positive Definite ◽

Matrix Inversion ◽

Multicore Architectures ◽

Positive Definite Matrices ◽

Symmetric Positive Definite ◽

Tile Matrix ◽

Symmetric Positive Definite Matrices


Computations with symmetric, positive definite and band matrices on a parallel vector processor

Parallel Computing ◽

10.1016/0167-8191(88)90134-2 ◽

1988 ◽

Vol 8 (1-3) ◽

pp. 301-312 ◽

Cited By ~ 8

Author(s):

Zahari Zlatev ◽

Phuong Vu ◽

Jerzy Wasniewski ◽

Kjeld Schaumburg

Keyword(s):

Positive Definite ◽

Band Matrices ◽

Vector Processor ◽

Symmetric Positive Definite


A parallel multilevel nested dissection algorithm for shared-memory computing systems

Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie) ◽

10.26089/nummet.v16r339 ◽

2015 ◽

pp. 407-420

Author(s):

А.Ю. Пирова ◽

И.Б. Мееров ◽

Е.А. Козинов ◽

С.А. Лебедев

Keyword(s):

Shared Memory ◽

Sparse Matrix ◽

Memory Systems ◽

Positive Definite ◽

Task Parallelism ◽

Nested Dissection ◽

Computing Systems ◽

Cholesky Factor ◽

Symmetric Positive Definite ◽

Sparse Matrix Ordering

This paper deals with the NP-complete problem of reordering the rows and columns of a symmetric positive definite sparse matrix to minimize the Cholesky factor fill-in. For this purpose, heuristic approaches based on graph algorithms are applied. A new parallel ordering algorithm for shared-memory computing systems is proposed. The modified multilevel nested dissection algorithm, previously implemented by the authors in the open-source MORSy library, is used as a basis for ordering. The main idea of the parallelization is to organize and process in parallel a queue of tasks that can be solved independently. Unlike widely used counterparts that rely on MPI for parallelism on both distributed and shared memory, the proposed algorithm uses OpenMP 3.0 task parallelism, relying on the dynamic load balancing implemented in the OpenMP runtime. The numerical experiments were performed using a number of symmetric positive definite matrices from the University of Florida Sparse Matrix Collection. The experimental results show the competitiveness of the proposed implementation on shared-memory systems compared to the widely used ParMETIS library. In our experiments, the parallel MORSy version provides a similar or better ordering than ParMETIS on all but one matrix in terms of the Cholesky factor fill-in and outperforms ParMETIS in running time in most cases. The parallel MORSy version is publicly available as an open-source library from the Supercomputing Center of Lobachevsky State University of Nizhni Novgorod.


SymNet: A Simple Symmetric Positive Definite Manifold Deep Learning Method for Image Set Classification

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2020.3044176 ◽

2021 ◽

pp. 1-15

Author(s):

Rui Wang ◽

Xiao-Jun Wu ◽

Josef Kittler

Keyword(s):

Deep Learning ◽

Positive Definite ◽

Learning Method ◽

Symmetric Positive Definite ◽

Image Set Classification ◽

Image Set


Iterative Methods for Linear Equations with Symmetric Positive Definite Matrix

The Computer Journal ◽

10.1093/comjnl/4.3.242 ◽

1961 ◽

Vol 4 (3) ◽

pp. 242-254 ◽

Cited By ~ 19

Author(s):

D. W. Martin

Keyword(s):

Iterative Methods ◽

Linear Equations ◽

Positive Definite Matrix ◽

Positive Definite ◽

Symmetric Positive Definite Matrix ◽

Symmetric Positive Definite


MSSOR-based alternating direction method for symmetric positive-definite linear complementarity problems

Numerical Algorithms ◽

10.1007/s11075-014-9864-6 ◽

2014 ◽

Vol 68 (3) ◽

pp. 631-644 ◽

Cited By ~ 2

Author(s):

Jian-Jun Zhang

Keyword(s):

Complementarity Problems ◽

Linear Complementarity Problems ◽

Linear Complementarity ◽

Positive Definite ◽

Alternating Direction Method ◽

Symmetric Positive Definite ◽

Alternating Direction


The Bergman projection of $L^\infty$ in tubes over cones of real, symmetric, positive-definite matrices

Transactions of the American Mathematical Society ◽

10.1090/s0002-9947-1986-0846600-3 ◽

1986 ◽

Vol 296 (2) ◽

pp. 621-621

Author(s):

David Békollé

Keyword(s):

Positive Definite ◽

Bergman Projection ◽

Positive Definite Matrices ◽

Symmetric Positive Definite ◽

Symmetric Positive Definite Matrices


Parallel Nonnegative Matrix Factorization via Newton Iteration

Parallel Processing Letters ◽

10.1142/s0129626416500146 ◽

2016 ◽

Vol 26 (03) ◽

pp. 1650014 ◽

Cited By ~ 3

Author(s):

Markus Flatz ◽

Marián Vajteršic

Keyword(s):

Shared Memory ◽

Matrix Factorization ◽

Message Passing ◽

Nonnegative Matrix Factorization ◽

Nonnegative Matrix ◽

Newton Iteration ◽

Parallel Execution ◽

KKT Conditions ◽

Nonnegative Matrices ◽

First Order

The goal of Nonnegative Matrix Factorization (NMF) is to represent a large nonnegative matrix in an approximate way as a product of two significantly smaller nonnegative matrices. This paper shows in detail how an NMF algorithm based on Newton iteration can be derived using the general Karush-Kuhn-Tucker (KKT) conditions for first-order optimality. This algorithm is suited for parallel execution on systems with shared memory and also with message passing. Both versions were implemented and tested, delivering satisfactory speedup results.
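
For readers unfamiliar with the setting, the problem and its first-order optimality conditions can be written out as follows. The abstract does not state the objective function, so the Frobenius-norm objective and the symbols $A$, $W$, $H$, $k$ below are assumptions made for illustration, not necessarily the paper's formulation.

```latex
% Assumed formulation: A is m x n and nonnegative, W is m x k, H is k x n, k << min(m, n).
\min_{W \ge 0,\; H \ge 0} \; f(W, H) = \tfrac{1}{2}\,\lVert A - WH \rVert_F^2

% Gradients of f:
\nabla_W f = (WH - A)\,H^{\mathsf T}, \qquad
\nabla_H f = W^{\mathsf T}(WH - A)

% KKT conditions for first-order optimality (\odot is the elementwise product):
W \ge 0, \quad \nabla_W f \ge 0, \quad W \odot \nabla_W f = 0,
\qquad
H \ge 0, \quad \nabla_H f \ge 0, \quad H \odot \nabla_H f = 0
```

Under this assumed objective, the complementarity conditions are the nonlinear equations to which a Newton-type iteration could be applied.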
