Fine-Grained Parallel Solution for Solving Sparse Triangular Systems on Multicore Platform Using OpenMP Interface

Author(s):  
Sirine Marrakchi ◽  
Mohamed Jemni
2014 ◽  
Vol 602-605 ◽  
pp. 3751-3754
Author(s):  
Yu Liu ◽  
Yi Xiao

To improve the efficiency of the magnetotelluric Occam inversion algorithm (MT Occam), a parallel algorithm is implemented on a hybrid MPI/OpenMP parallel programming model to increase its convergence speed and decrease its running time. MT Occam is partitioned into tasks that map onto the parallel model: the algorithm uses coarse-grained parallelism between compute nodes and fine-grained parallelism between the cores within each node. By analyzing the data dependencies, the computing tasks are partitioned so as to reduce transmission time. The experimental results show that higher speedup is obtained as the model scale increases, and that the efficient parallel partitioning strategy improves the scalability of the parallel algorithm.


1993 ◽  
Vol 14 (2) ◽  
pp. 446-460 ◽  
Author(s):  
Fernando L. Alvarado ◽  
Robert Schreiber

Author(s):  
Toby Heyn ◽  
Alessandro Tasora ◽  
Mihai Anitescu ◽  
Dan Negrut

This paper describes a numerical method for the parallel solution of the differential measure inclusion problem posed by mechanical multibody systems containing bilateral and unilateral frictional constraints. The proposed method has been implemented as a set of parallel algorithms leveraging NVIDIA’s Compute Unified Device Architecture (CUDA) library support for multi-core stream computing. This allows the proposed solution to run on a wide variety of NVIDIA GeForce and Tesla graphics cards for high-performance computing. Although the methodology relies on the solution of cone complementarity problems known to be fine-grained in terms of data dependency, a suitable approach has been developed to exploit parallelism with low overhead in terms of memory access and thread synchronization. Additionally, a parallel collision detection algorithm has been incorporated to further exploit available parallelism. Initial numerical tests described in this paper demonstrate a speedup of one order of magnitude in the solution time of both the collision detection and the cone complementarity problems when performed in parallel. Since stream multiprocessors are becoming ubiquitous as embedded components of next-generation graphics boards, the proposed solution represents a cost-efficient way to simulate the time evolution of complex mechanical problems with millions of parts and constraints, a task that used to require powerful supercomputers. The proposed methodology facilitates the analysis of extremely complex systems such as granular material flows and off-road vehicle dynamics.


1988 ◽  
Vol 6 (1) ◽  
pp. 109-114 ◽  
Author(s):  
Charles H Romine ◽  
James M Ortega
