Efficient Parallel Algorithms for 3D Laplacian Smoothing on the GPU

Applied Sciences ◽

10.3390/app9245437 ◽

2019 ◽

Vol 9 (24) ◽

pp. 5437

Author(s):

Lei Xiao ◽

Guoxiang Yang ◽

Kunyang Zhao ◽

Gang Mei

Keyword(s):

Large Scale ◽

Graphics Processing Unit ◽

Parallel Implementation ◽

Three Dimensional ◽

Smoothing Method ◽

Three Dimensions ◽

Processing Unit ◽

Mesh Quality ◽

Mesh Smoothing ◽

Laplacian Smoothing

In numerical modeling, mesh quality is one of the decisive factors that strongly affect the accuracy of calculations and the convergence of iterations. To improve mesh quality, the Laplacian mesh smoothing method, which repositions each node to the barycenter of its adjacent nodes without changing the mesh topology, has been widely used. However, smoothing a large-scale three-dimensional mesh is computationally expensive, and few studies have focused on accelerating Laplacian mesh smoothing with the graphics processing unit (GPU). This paper presents a GPU-accelerated parallel algorithm for Laplacian smoothing in three dimensions that considers the influence of different data layouts and iteration forms. To evaluate the efficiency of the GPU implementation, the parallel solution is compared with the original serial solution. Experimental results show that our parallel implementation is up to 46 times faster than the serial version.
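
As a minimal sketch of the update rule described in the abstract above (each free node moves to the barycenter of its adjacent nodes while the connectivity stays fixed), the following serial Python version may help. It is not the paper's GPU code: the function name `laplacian_smooth`, the adjacency-dictionary layout, the `fixed` set of pinned nodes, and the Jacobi-style iteration (all new positions computed from the previous positions) are assumptions made for illustration.

```python
def laplacian_smooth(coords, neighbors, fixed=frozenset(), iterations=1):
    """Jacobi-style Laplacian smoothing (illustrative sketch).

    coords    : list of (x, y, z) tuples, one per node
    neighbors : dict mapping node index -> list of adjacent node indices
    fixed     : node indices that must not move (hypothetical parameter)
    """
    coords = [tuple(p) for p in coords]
    for _ in range(iterations):
        updated = []
        for i, p in enumerate(coords):
            adj = neighbors.get(i, [])
            if i in fixed or not adj:
                # Pinned or isolated nodes keep their position.
                updated.append(p)
                continue
            # New position = barycenter of the adjacent nodes,
            # read from the previous iteration's coordinates.
            updated.append(tuple(
                sum(coords[j][k] for j in adj) / len(adj) for k in range(3)
            ))
        coords = updated
    return coords


# Hypothetical toy input: node 4 is connected to the four pinned nodes 0-3.
coords = [
    (0.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (0.0, 2.0, 0.0),
    (0.0, 0.0, 2.0),
    (1.5, 0.1, 0.2),
]
neighbors = {4: [0, 1, 2, 3]}
smoothed = laplacian_smooth(coords, neighbors, fixed={0, 1, 2, 3})
print(smoothed[4])  # -> (0.5, 0.5, 0.5)
```

Because each node's new position in this form depends only on the previous iteration's coordinates, the per-node loop body is independent work, which is the kind of structure that could map to one GPU thread per node.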

Designing Parallel Adaptive Laplacian Smoothing for Improving Tetrahedral Mesh Quality on the GPU

Applied Sciences ◽

10.3390/app11125543 ◽

2021 ◽

Vol 11 (12) ◽

pp. 5543

Author(s):

Ning Xi ◽

Yinjie Sun ◽

Lei Xiao ◽

Gang Mei

Keyword(s):

Parallel Algorithm ◽

Adaptive Algorithm ◽

Large Scale ◽

Tetrahedral Mesh ◽

Processing Unit ◽

Mesh Quality ◽

Mesh Smoothing ◽

Tetrahedral Meshes ◽

Numerical Computing ◽

Laplacian Smoothing

Mesh quality is a critical issue in numerical computing because it directly impacts both computational efficiency and accuracy. Tetrahedral meshes are widely used in various engineering and science applications. However, large-scale and complicated application scenarios involve a large number of tetrahedrons, and improving mesh quality in such cases is computationally expensive. Laplacian mesh smoothing is a simple mesh optimization method that improves mesh quality by changing the locations of nodes. In this paper, by exploiting the parallelism of the modern graphics processing unit (GPU), we designed a parallel adaptive Laplacian smoothing algorithm for improving the quality of large-scale tetrahedral meshes. In the proposed adaptive algorithm, we defined the aspect ratio as a metric to judge the mesh quality after each iteration, ensuring that every smoothing step improves the mesh quality. The adaptive algorithm avoids a shortcoming of the ordinary Laplacian algorithm, which can create invalid elements in concave regions. We conducted 5 groups of comparative experimental tests to evaluate the performance of the proposed parallel algorithm. The results demonstrated that the proposed adaptive algorithm is up to 23 times faster than the serial algorithms, and that the accuracy of the tetrahedral mesh is satisfactorily improved after adaptive Laplacian mesh smoothing. Compared with the ordinary Laplacian algorithm, the proposed adaptive Laplacian algorithm is more widely applicable and can effectively deal with tetrahedrons of extremely poor quality. This indicates that the proposed parallel algorithm can be applied to improve the mesh quality in large-scale and complicated application scenarios.
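
The accept-or-reject idea in the abstract above (keep a smoothing move only if the quality metric improves) can be sketched as follows. This is an illustrative assumption, not the authors' algorithm: the abstract does not define its aspect ratio, so the sketch substitutes a signed volume-length measure (1 for a regular tetrahedron, 0 or negative for a flat or inverted one), and the names `tet_quality` and `adaptive_move` are hypothetical.

```python
import itertools
import math


def tet_quality(a, b, c, d):
    """Signed volume-length quality of tetrahedron (a, b, c, d).

    Stand-in metric (assumption): 6*sqrt(2)*V / l_rms**3, which equals 1 for
    a positively oriented regular tetrahedron and is <= 0 when the element
    is flat or inverted.
    """
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    # Signed volume from the scalar triple product u . (v x w) / 6.
    volume = (u[0] * (v[1] * w[2] - v[2] * w[1])
              - u[1] * (v[0] * w[2] - v[2] * w[0])
              + u[2] * (v[0] * w[1] - v[1] * w[0])) / 6.0
    squared_edges = [sum((p[k] - q[k]) ** 2 for k in range(3))
                     for p, q in itertools.combinations((a, b, c, d), 2)]
    l_rms = math.sqrt(sum(squared_edges) / 6.0)
    return 6.0 * math.sqrt(2.0) * volume / l_rms ** 3


def adaptive_move(node, coords, neighbors, tets):
    """Return the neighbor barycenter if it raises the worst quality of the
    tetrahedra incident to `node`; otherwise return the old position."""
    old = coords[node]
    adj = neighbors[node]
    candidate = tuple(sum(coords[j][k] for j in adj) / len(adj)
                      for k in range(3))
    incident = [t for t in tets if node in t]

    def worst(position):
        trial = list(coords)
        trial[node] = position
        return min(tet_quality(*(trial[i] for i in t)) for t in incident)

    return candidate if worst(candidate) > worst(old) else old


# Hypothetical toy mesh: node 4 lies inside the tetrahedron spanned by
# nodes 0-3 and splits it into four positively oriented tetrahedra.
coords = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (0.1, 0.1, 0.1),
]
tets = [(4, 1, 2, 3), (0, 4, 2, 3), (0, 1, 4, 3), (0, 1, 2, 4)]
neighbors = {4: [0, 1, 2, 3]}
print(adaptive_move(4, coords, neighbors, tets))  # -> (0.25, 0.25, 0.25)
```

In this toy case the move is accepted because the worst incident element improves. A move that flattened or inverted an incident tetrahedron would yield a quality of zero or below and be rejected, which is one way to avoid the invalid elements that plain Laplacian smoothing can produce.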

Prediction of Residual Stresses in a Multipass Pipe Weld by a Novel 3D Finite Element Approach

Volume 6B: Materials and Fabrication ◽

10.1115/pvp2018-85044 ◽

2018 ◽

Cited By ~ 1

Author(s):

Hui Huang ◽

Jian Chen ◽

Blair Carlson ◽

Hui-Ping Wang ◽

Paul Crooker ◽

...

Keyword(s):

Finite Element ◽

Residual Stresses ◽

High Performance ◽

Large Scale ◽

Graphics Processing Unit ◽

Computational Cost ◽

Three Dimensional ◽

Processing Unit ◽

Girth Welds ◽

Welding Processes

Due to the enormous computational cost, current residual stress simulations of multipass girth welds are mostly performed using two-dimensional (2D) axisymmetric models. A 2D model can provide only a limited estimate of the residual stresses because it assumes an axisymmetric distribution. In this study, a highly efficient thermal-mechanical finite element code for three-dimensional (3D) models has been developed based on high-performance Graphics Processing Unit (GPU) computers. Our code is further accelerated by considering the unique physics associated with welding processes, which are characterized by a steep temperature gradient and a moving arc heat source. It is capable of modeling large-scale welding problems that cannot be easily handled by existing commercial simulation tools. To demonstrate its accuracy and efficiency, our code was compared with a commercial software package by simulating a 3D multipass girth weld model with over 1 million elements. Our code achieved solution accuracy comparable to that of the commercial package, with a more than 100-fold saving in computational cost. Moreover, the three-dimensional analysis demonstrated a more realistic stress distribution that is not axisymmetric in the hoop direction.

Large-scale sound field rendering with graphics processing unit cluster for three-dimensional audio with loudspeaker array

10.1121/1.4798996 ◽

2013 ◽

Author(s):

Takao Tsuchiya ◽

Yukio Iwaya ◽

Makoto Otani

Keyword(s):

Large Scale ◽

Graphics Processing Unit ◽

Three Dimensional ◽

Sound Field ◽

Processing Unit ◽

Graphics Processing

GPU-Accelerated Interactive Visualization of 3D Volumetric Data Using CUDA

International Journal of Image and Graphics ◽

10.1142/s0219467813400032 ◽

2013 ◽

Vol 13 (02) ◽

pp. 1340003 ◽

Cited By ~ 3

Author(s):

Piyush Kumar ◽

Anupam Agrawal

Keyword(s):

CT Scan ◽

Volume Rendering ◽

Large Scale ◽

Graphics Processing Unit ◽

Three Dimensional ◽

Processing Unit ◽

Volume Data ◽

Scan Data ◽

Visualization Techniques ◽

The Cost

Improving image quality and rendering speed has always been a challenge for programmers involved in large-scale volume rendering, especially in the field of medical image processing. This paper aims to perform volume rendering on the graphics processing unit (GPU), whose massively parallel capability has the potential to revolutionize this field. The work benefits from a GPU-accelerated system. The final results would allow doctors to diagnose and analyze 2D computed tomography (CT) scan data using three-dimensional visualization techniques. The system was applied to multiple types of datasets, with medical volume data ranging from 10 MB to 350 MB. Further, using the compute unified device architecture (CUDA) framework, a technology with a low learning curve, for this purpose would greatly reduce the cost involved in CT scan analysis and hence make it accessible to the general public. The volume rendering was done on an Nvidia Tesla C1060 card (which has 240 CUDA cores that execute on data in parallel), and its performance has also been benchmarked.

Large-scale sound field rendering with graphics processing unit cluster for three-dimensional audio with loudspeaker array

The Journal of the Acoustical Society of America ◽

10.1121/1.4806131 ◽

2013 ◽

Vol 133 (5) ◽

pp. 3454-3454 ◽

Cited By ~ 1

Author(s):

Takao Tsuchiya ◽

Yukio Iwaya ◽

Makoto Otani

Keyword(s):

Large Scale ◽

Graphics Processing Unit ◽

Three Dimensional ◽

Sound Field ◽

Processing Unit ◽

Graphics Processing

A Parallel-Computing Approach for Vector Road-Network Matching Using GPU Architecture

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7120472 ◽

2018 ◽

Vol 7 (12) ◽

pp. 472 ◽

Cited By ~ 1

Author(s):

Bo Wan ◽

Lin Yang ◽

Shunping Zhou ◽

Run Wang ◽

Dezhi Wang ◽

...

Keyword(s):

Road Network ◽

Large Scale ◽

Graphics Processing Unit ◽

Road Networks ◽

Processing Unit ◽

Data Partition ◽

Matching Method ◽

The Road ◽

Central Processing ◽

Relaxation Matching

The road-network matching method is an effective tool for map integration, fusion, and update. Due to the complexity of road networks in the real world, matching methods often contain a series of complicated processes to identify homonymous roads and deal with their intricate relationships. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based approaches, may suffer from performance bottlenecks when facing big data. We developed a particle-swarm optimization (PSO)-based parallel road-network matching method on the graphics processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were utilized, respectively, to make full use of GPU threads. Experiments were conducted on datasets with 14 different scales. Results indicate that the parallel PSO-based matching algorithm (PSOM) could correctly identify most matching relationships with an average accuracy of 84.44%, which was at the same level as the accuracy of a benchmark, the probability-relaxation-matching (PRM) method. The PSOM approach significantly reduced the road-network matching time in dealing with large amounts of data in comparison with the PRM method. This paper provides a common parallel algorithm framework for road-network matching algorithms and contributes to the integration and update of large-scale road networks.

Realtime cerebellum: A large-scale spiking network model of the cerebellum that runs in realtime using a graphics processing unit

Neural Networks ◽

10.1016/j.neunet.2013.01.019 ◽

2013 ◽

Vol 47 ◽

pp. 103-111 ◽

Cited By ~ 47

Author(s):

Tadashi Yamazaki ◽

Jun Igarashi

Keyword(s):

Network Model ◽

Large Scale ◽

Graphics Processing Unit ◽

Processing Unit ◽

Spiking Network ◽

Graphics Processing

Synthetic reflection seismograms in three dimensions by a locked‐mode approximation

Geophysics ◽

10.1190/1.1442660 ◽

1989 ◽

Vol 54 (3) ◽

pp. 350-358 ◽

Cited By ~ 24

Author(s):

G. Nolet ◽

R. Sleeman ◽

V. Nijhof ◽

B. L. N. Kennett

Keyword(s):

Finite Difference ◽

Born Approximation ◽

Large Scale ◽

Layered Medium ◽

Three Dimensional ◽

Simple Algorithm ◽

Realistic Model ◽

Three Dimensions ◽

Acoustic Response ◽

Many Sources

We present a simple algorithm for computing the acoustic response of a layered structure containing three‐dimensional (3-D) irregularities, using a locked‐mode approach and the Born approximation. The effects of anelasticity are incorporated by use of Rayleigh’s principle. The method is particularly attractive at somewhat larger offsets, but computations for near‐source offsets are stable as well, due to the introduction of anelastic damping. Calculations can be done on small minicomputers. The algorithm developed in this paper can be used to calculate the response of complicated models in three dimensions. It is more efficient than any other method whenever many sources are involved. The results are useful for modeling, as well as for generating test signals for data processing with realistic, model‐induced “noise.” Also, this approach provides an alternative to 2-D finite‐difference calculations that is efficient enough for application to large‐scale inverse problems. The method is illustrated by application to a simple 3-D structure in a layered medium.

A lightweight approach to performance portability with targetDP

The International Journal of High Performance Computing Applications ◽

10.1177/1094342016682071 ◽

2016 ◽

Vol 32 (2) ◽

pp. 288-301

Author(s):

Alan Gray ◽

Kevin Stratford

Keyword(s):

Particle Physics ◽

Message Passing ◽

Graphics Processing Units ◽

High Performance ◽

Large Scale ◽

Message Passing Interface ◽

Graphics Processing Unit ◽

Processing Unit ◽

Performance Portability ◽

Graphics Processing

Leading high-performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this by providing scaling results on traditional and graphics processing unit-accelerated large-scale supercomputers.

A Real-Time Photogrammetric System for Acquisition and Monitoring of Three-Dimensional Human Body Kinematics

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.5.363 ◽

2021 ◽

Vol 87 (5) ◽

pp. 363-373

Author(s):

Long Chen ◽

Bo Wu ◽

Yao Zhao ◽

Yuan Li

Keyword(s):

Real Time ◽

Human Body ◽

Graphics Processing Unit ◽

Three Dimensional ◽

Stereo Pair ◽

Processing Unit ◽

Detection Distance ◽

Human Kinematics ◽

Graphics Processing ◽

Time Acquisition

Real-time acquisition and analysis of three-dimensional (3D) human body kinematics are essential in many applications. In this paper, we present a real-time photogrammetric system consisting of a stereo pair of red-green-blue (RGB) cameras. The system incorporates a multi-threaded and graphics processing unit (GPU)-accelerated solution for real-time extraction of 3D human kinematics. A deep learning approach is adopted to automatically extract two-dimensional (2D) human body features, which are then converted to 3D features based on photogrammetric processing, including dense image matching and triangulation. The multi-threading scheme and GPU-acceleration enable real-time acquisition and monitoring of 3D human body kinematics. Experimental analysis verified that the system processing rate reached ∼18 frames per second. The effective detection distance reached 15 m, with a geometric accuracy of better than 1% of the distance within a range of 12 m. The real-time measurement accuracy for human body kinematics ranged from 0.8% to 7.5%. The results suggest that the proposed system is capable of real-time acquisition and monitoring of 3D human kinematics with favorable performance, showing great potential for various applications.
