Parallelization of an Unsteady ALE Solver with Deforming Mesh Using OpenACC

2017 ◽  
Vol 2017 ◽  
pp. 1-16 ◽  
Author(s):  
Wenpeng Ma ◽  
Zhonghua Lu ◽  
Wu Yuan ◽  
Xiaodong Hu

This paper presents a parallel, GPU-based, deforming-mesh-enabled unsteady numerical solver for moving-body problems using OpenACC. Both 2D and 3D parallel algorithms based on spring-like deforming-mesh methods are proposed and implemented through the OpenACC programming model. These algorithms are then coupled with an unstructured-mesh numerical solver with an implicit time-integration scheme, making the full GPU version of the solver capable of handling unsteady calculations with a deforming mesh. Experimental results show that the proposed parallel deforming-mesh algorithm achieves over 2.5x speedup on a K20 GPU compared with 20 OpenMP threads on Intel E5-2658 v2 CPU cores. Both 2D and 3D cases are used to validate the efficiency, correctness, and accuracy of the present solver.
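As a rough illustration of the kind of kernel such a spring-analogy method offloads, the sketch below shows one Jacobi-style relaxation sweep over a 2D unstructured mesh, parallelized with an OpenACC loop directive. This is not the paper's code: the CSR neighbour arrays, the stiffness weights, and the boundary-motion handling are assumptions, and a real solver would iterate this sweep (and move data once, not per call) until the interior nodes equilibrate.

```cpp
// Minimal sketch of one spring-analogy smoothing sweep, offloaded with OpenACC.
// All array layouts and names are hypothetical.
void spring_sweep(int n_nodes, int n_edges,
                  const int* off,      // CSR offsets, size n_nodes+1
                  const int* nbr,      // neighbour node ids, size n_edges
                  const double* w,     // spring stiffness per edge, size n_edges
                  const int* fixed,    // 1 = node with prescribed (boundary) motion
                  const double* x, const double* y,   // current coordinates
                  double* xn, double* yn)              // updated coordinates
{
    #pragma acc parallel loop copyin(off[0:n_nodes+1], nbr[0:n_edges], w[0:n_edges], \
                                     fixed[0:n_nodes], x[0:n_nodes], y[0:n_nodes]) \
                              copyout(xn[0:n_nodes], yn[0:n_nodes])
    for (int i = 0; i < n_nodes; ++i) {
        if (fixed[i]) { xn[i] = x[i]; yn[i] = y[i]; continue; }
        double sx = 0.0, sy = 0.0, sw = 0.0;
        for (int k = off[i]; k < off[i + 1]; ++k) {
            sx += w[k] * x[nbr[k]];
            sy += w[k] * y[nbr[k]];
            sw += w[k];
        }
        // Each free node relaxes to the stiffness-weighted average of its neighbours.
        xn[i] = sx / sw;
        yn[i] = sy / sw;
    }
}
```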

2013 ◽  
Vol 2013 ◽  
pp. 1-16 ◽  
Author(s):  
Marzio Sala ◽  
Pénélope Leyland ◽  
Angelo Casagrande

A parallel adaptive pseudo-transient Newton-Krylov-Schwarz (αΨNKS) method for the solution of compressible flows is presented. Multidimensional upwind residual distribution schemes are used for the space discretisation, while an implicit time-marching scheme is employed for the discretisation of the (pseudo-)time derivative. The linear system arising from the Newton method applied to the resulting nonlinear system is solved by means of Krylov iterations with Schwarz-type preconditioners. A scalable and efficient data structure for the αΨNKS procedure is presented. The main computational kernels are considered, and an extensive analysis is reported comparing the Krylov accelerators and the preconditioning techniques. Results, obtained on a distributed-memory computer, are presented for 2D and 3D problems of aeronautical interest on unstructured grids.
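For readers unfamiliar with the method class, the following is a standard pseudo-transient continuation formulation consistent with the description above; the notation (mass matrix M, pseudo-time step Δτ, SER update) is assumed rather than taken from the paper.

```latex
% One pseudo-time step: an (inexact) Newton update for the steady residual R(u),
% with the pseudo-time term regularising the Jacobian.
\left(\frac{1}{\Delta\tau^{\,n}}\,M \;+\; \left.\frac{\partial R}{\partial u}\right|_{u^{n}}\right)\delta u
  \;=\; -\,R(u^{n}),
\qquad u^{n+1} = u^{n} + \delta u,
\qquad
\Delta\tau^{\,n+1} = \Delta\tau^{\,n}\,
  \frac{\lVert R(u^{n-1})\rVert}{\lVert R(u^{n})\rVert}
  \quad \text{(SER growth).}
% The linear system is solved with a Krylov method (e.g. GMRES) preconditioned
% by an additive-Schwarz operator assembled from the subdomain blocks.
```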


Author(s):  
A. L. Sayeth Saabith ◽  
Elankovan Sundararajan ◽  
Azuraliza Abu Bakar

The Apriori algorithm is a classical association rule mining (ARM) algorithm widely used for generating frequent itemsets. However, the original Apriori algorithm has limitations: it needs to scan the dataset many times to discover all frequent itemsets and it generates a huge number of candidate itemsets. To overcome these limitations, researchers have made many improvements to Apriori, such as candidate generation, mining without candidate generation, transaction reduction, partitioning, and sampling. When it comes to mining massive data, these algorithms fail to be efficient because of processing capacity, storage capacity, and main memory constraints. Therefore, parallel and distributed algorithms have been developed to perform large-scale ARM computations on multiple processors. However, the problems with most parallel and distributed frameworks are the overhead of managing a distributed system, the lack of a high-level parallel programming language, and node failures. Hadoop-MapReduce is an efficient, scalable, and simplified programming model for massive data processing, and it is also available in cloud environments. Cloud computing offers huge computing resources and capacity to address big data challenges. Recently, many parallel algorithms have been proposed on Hadoop-MapReduce to enhance the performance of the Apriori algorithm, but a drawback remains: since multiple scans over the dataset are needed to generate candidate itemsets, they consume more execution time. The aim of this study is to propose a parallel Transaction Reduction MapReduce Apriori algorithm (TRMR-Apriori), which removes unnecessary transaction items and transactions from the dataset in a parallel manner to overcome the above problems. The experiments show that TRMR-Apriori achieves better execution time for discovering frequent itemsets than previous sequential ARM algorithms such as Apriori, AprioriTid, Eclat, and FP-Growth, and than previous parallel algorithms such as PApriori, MRApriori, and Modified Apriori, under different conditions in a homogeneous computing environment using the Hadoop-MapReduce platform in the cloud. Overall, TRMR-Apriori shows its strength in extracting frequent itemsets from massive datasets in the cloud.
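To make the transaction-reduction idea concrete, the sketch below prunes items that cannot appear in any frequent itemset and drops transactions left too short for the next pass. It illustrates only the sequential concept, not the paper's Hadoop-MapReduce implementation; all names and the two-pass structure are assumptions.

```cpp
// Illustrative transaction reduction before the next candidate-generation pass.
#include <string>
#include <unordered_map>
#include <vector>

using Transaction = std::vector<std::string>;

std::vector<Transaction> reduce_transactions(const std::vector<Transaction>& db,
                                             std::size_t min_support,
                                             std::size_t k /* next itemset size */)
{
    // First pass: count item frequencies.
    std::unordered_map<std::string, std::size_t> count;
    for (const auto& t : db)
        for (const auto& item : t)
            ++count[item];

    // Second pass: keep only frequent items; drop transactions that can no
    // longer contain any k-itemset.
    std::vector<Transaction> reduced;
    for (const auto& t : db) {
        Transaction kept;
        for (const auto& item : t)
            if (count[item] >= min_support)
                kept.push_back(item);
        if (kept.size() >= k)
            reduced.push_back(std::move(kept));
    }
    return reduced;
}
```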


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0250306
Author(s):  
Jonas Latt ◽  
Christophe Coreixas ◽  
Joël Beny

We present a novel, hardware-agnostic implementation strategy for lattice Boltzmann (LB) simulations, which yields massive performance on homogeneous and heterogeneous many-core platforms. Based solely on C++17 Parallel Algorithms, our approach does not rely on any language extensions, external libraries, vendor-specific code annotations, or pre-compilation steps. Thanks in particular to a recently proposed GPU back-end for C++17 Parallel Algorithms, we show that a single code can compile and reach state-of-the-art performance on both many-core CPU and GPU environments for the solution of a given non-trivial fluid dynamics problem. The proposed strategy is tested with six different, commonly used implementation schemes to assess the performance impact of memory access patterns on different platforms. Nine different LB collision models are included in the tests and exhibit good performance, demonstrating the versatility of our parallel approach. This work shows that it is less necessary than ever to draw a distinction between research and production software, as a concise and generic LB implementation yields performance comparable to that achievable in a hardware-specific programming language. The results also highlight the performance gains achieved by modern many-core CPUs and their apparent capability to narrow the gap with the traditionally much faster GPU platforms. All code is made available to the community in the form of the open-source project stlbm, which serves both as stand-alone simulation software and as a collection of reusable patterns for the acceleration of pre-existing LB codes.
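The core pattern behind this strategy is to express each per-cell kernel as a single C++17 parallel algorithm call, letting the execution policy decide whether the loop runs on CPU threads or, with a compiler such as nvc++ -stdpar, on a GPU. The sketch below shows that pattern with a toy relaxation step standing in for a real collision model; the actual stlbm kernels, lattice, and data layout are more elaborate, and the names used here are illustrative only.

```cpp
// Per-cell kernel dispatched through a C++17 parallel algorithm.
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

constexpr int Q = 9;            // D2Q9 populations per cell
constexpr double omega = 1.0;   // relaxation frequency (placeholder value)

void collide(std::vector<double>& f /* size = n_cells * Q */)
{
    const std::size_t n_cells = f.size() / Q;
    std::vector<std::size_t> cells(n_cells);
    std::iota(cells.begin(), cells.end(), std::size_t{0});

    // One algorithm call expresses the whole collision step; the execution
    // policy selects multi-core CPU or GPU execution.
    std::for_each(std::execution::par_unseq, cells.begin(), cells.end(),
                  [fp = f.data()](std::size_t c) {
                      double* pop = fp + c * Q;
                      // Toy "collision": relax every population towards the
                      // cell average (stand-in for the BGK equilibrium).
                      double rho = 0.0;
                      for (int q = 0; q < Q; ++q) rho += pop[q];
                      const double feq = rho / Q;
                      for (int q = 0; q < Q; ++q)
                          pop[q] += omega * (feq - pop[q]);
                  });
}
```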


2013 ◽  
Vol 65 (3) ◽  
pp. 362-379 ◽  
Author(s):  
J. Du ◽  
F. Fang ◽  
C.C. Pain ◽  
I.M. Navon ◽  
J. Zhu ◽  
...  

2000 ◽  
Vol 18 (2) ◽  
pp. 197-205 ◽  
Author(s):  
V.I. VOLKOV ◽  
V.A. GORDEYCHUK ◽  
N.S. ES'KOV ◽  
O.M. KOZYREV

The paper addresses Rayleigh–Taylor instability (RTI) problems and presents a method for describing an interface with markers on an Eulerian mesh, which is implemented in the MAH-3 code. The proposed method allows a more accurate description of the evolution of the interface driven by Rayleigh–Taylor perturbations and preserves the symmetry of the interface under the corresponding symmetry of the problem (planar, cylindrical, or spherical). The method employs an unstructured triangular mesh of markers. Its capabilities are demonstrated on 2D and 3D Rayleigh–Taylor instability problems.
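As a schematic of what "markers on an Eulerian mesh" means in practice, the sketch below advects marker points through an interpolated Eulerian velocity field with a forward Euler step. It is not the MAH-3 algorithm: the triangular connectivity between markers, the symmetry-preserving machinery, and the velocity interpolation are all omitted, and the callable is assumed to return something like std::array<double, 3>.

```cpp
// Schematic advection of interface markers through an Eulerian velocity field.
#include <array>
#include <vector>

struct Marker { double x, y, z; };

// 'velocity' is any callable returning the interpolated Eulerian velocity
// at a marker position (hypothetical interface).
template <class VelocityField>
void advect_markers(std::vector<Marker>& markers, const VelocityField& velocity, double dt)
{
    for (auto& m : markers) {
        const std::array<double, 3> u = velocity(m.x, m.y, m.z);
        m.x += dt * u[0];
        m.y += dt * u[1];
        m.z += dt * u[2];
    }
}
```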


Author(s):  
Oubay Hassan ◽  
Kenneth Morgan ◽  
Nigel Weatherill

A review of a procedure for the simulation of time-dependent, inviscid and turbulent viscous, compressible flows involving geometries that change in time is presented. The adopted discretization technique employs unstructured meshes and both explicit and implicit time-stepping schemes. A dual time-stepping procedure and an ALE formulation enable flows involving moving boundary components to be included. Techniques that have been developed to maintain the validity of the unstructured mesh and to allow for the capture of moving flow features are also reviewed. Using the in-house developed techniques, some examples are included to demonstrate the use of the approach for the simulation of a number of flows of practical industrial interest.
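The dual time-stepping idea mentioned above can be summarised by a standard formulation; the expression below is consistent with common ALE/BDF2 dual-time schemes rather than taken verbatim from the paper, with u the conserved variables, V the moving cell volume, Δt the physical time step, and τ a pseudo-time.

```latex
% BDF2-discretised physical time derivative plus the ALE spatial residual,
% driven to zero by marching an unsteady residual R* in pseudo-time:
R^{*}(u) \;=\;
\frac{3\,V^{\,n+1}u \;-\; 4\,V^{\,n}u^{\,n} \;+\; V^{\,n-1}u^{\,n-1}}{2\,\Delta t}
\;+\; R_{\mathrm{ALE}}(u),
\qquad
\frac{\partial\!\left(V^{\,n+1}u\right)}{\partial \tau} + R^{*}(u) = 0 .
% Convergence in tau recovers the implicit solution at physical time level n+1,
% so steady-state acceleration techniques can be reused for unsteady flows.
```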


2013 ◽  
Vol 401-403 ◽  
pp. 1859-1863
Author(s):  
Qing Yang ◽  
Jun Liu ◽  
Huan Wang ◽  
Wen Li Zhou ◽  
Hua Yu

Understanding traffic per unit time at the cell level in a cellular data network can be of great help to mobile operators in improving network performance, and it is important for network design and resource optimization. In this paper, we describe three methods for counting the traffic per unit time per cell. Moreover, we compare the results of the three methods through the deviation distribution of the traffic and a time-complexity analysis. Our work is distinguished from other related work by its use of big data containing around 1.4 billion records and 20 thousand cells. Overall, we expect this paper to deliver important insights into cellular data network resource optimization.
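The three counting methods themselves are not spelled out in this abstract; the sketch below only shows the baseline group-by they all approximate, aggregating traffic volume per (cell, time bin). The record fields and key encoding are assumptions for illustration.

```cpp
// Baseline aggregation of traffic per (cell, time bin).
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Record {
    std::uint32_t cell_id;
    std::uint64_t timestamp;  // seconds
    std::uint64_t bytes;      // payload size carried by the record
};

// Returns total bytes keyed by (cell_id, time bin), with bins of 'bin_seconds'.
std::unordered_map<std::uint64_t, std::uint64_t>
traffic_per_cell_per_bin(const std::vector<Record>& records, std::uint64_t bin_seconds)
{
    std::unordered_map<std::uint64_t, std::uint64_t> totals;
    for (const auto& r : records) {
        const std::uint64_t bin = r.timestamp / bin_seconds;
        const std::uint64_t key =
            (static_cast<std::uint64_t>(r.cell_id) << 32) | (bin & 0xffffffffu);
        totals[key] += r.bytes;
    }
    return totals;
}
```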


Computers ◽  
2018 ◽  
Vol 7 (3) ◽  
pp. 44 ◽  
Author(s):  
Thinh Cao ◽  
Koichi Yamada ◽  
Muneyuki Unehara ◽  
Izumi Suzuki ◽  
Do Nguyen

The paper discusses the use of parallel computation to obtain rough set approximations from large-scale information systems in which missing data exist in both condition and decision attributes. To date, many studies have focused on missing condition data, but very few have accounted for missing decision data, especially in ever-growing datasets. One of the approaches for dealing with missing data in condition attributes is called twofold rough approximations. The paper aims to extend this approach to deal with missing data in the decision attribute. In addition, computing twofold rough approximations is computationally very intensive, so the approach is not suitable when input datasets are large. We propose parallel algorithms to compute twofold rough approximations on large-scale datasets. Our method is based on MapReduce, a distributed programming model for processing large-scale data. We introduce the original sequential algorithm first and then the parallel version. A comparison between the two approaches through experiments shows that our proposed parallel algorithms are suitable for, and perform efficiently on, large-scale datasets that have missing data in condition and decision attributes.
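For orientation, the sketch below computes the classical rough-set lower and upper approximations for a complete information table. The twofold approximations studied in the paper additionally handle missing condition and decision values, which this sketch does not attempt; types and names are illustrative.

```cpp
// Classical rough-set lower/upper approximations over equivalence classes.
#include <map>
#include <set>
#include <vector>

using Object = int;
using ConditionKey = std::vector<int>;  // values of the condition attributes

struct Approximations { std::set<Object> lower, upper; };

Approximations approximate(const std::map<Object, ConditionKey>& condition,
                           const std::set<Object>& target /* one decision class */)
{
    // Group objects into equivalence classes by identical condition values.
    std::map<ConditionKey, std::set<Object>> classes;
    for (const auto& [obj, key] : condition) classes[key].insert(obj);

    Approximations a;
    for (const auto& [key, members] : classes) {
        bool any = false, all = true;
        for (Object o : members) {
            if (target.count(o)) any = true; else all = false;
        }
        if (all) a.lower.insert(members.begin(), members.end());  // class fully inside target
        if (any) a.upper.insert(members.begin(), members.end());  // class intersects target
    }
    return a;
}
```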

