GPU-Based Parallel Simulation of Silicon Anisotropic Etching

Author(s):  
Jianhua Li ◽  
Yan Wang ◽  
Jingyuan Chen ◽  
Li Yan

Silicon anisotropic etching simulation, based on either a geometric model or a cellular automata (CA) model, is highly time-consuming. In this paper, we propose two parallelization methods for simulating the silicon anisotropic etching process with CA models on graphics processing units (GPUs). One is a direct parallelization of the serial CA algorithm; the other uses a spatial parallelization strategy in which each crystal unit cell is allocated to a GPU thread. The proposed simulation methods are implemented with the Compute Unified Device Architecture (CUDA) application programming interface. Several computational experiments are conducted to analyze the efficiency of the methods.
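As a rough illustration of the spatial strategy, a CA update can be written so that every cell is evaluated independently from the previous state, which is what makes a one-thread-per-cell GPU mapping possible. The sketch below is a minimal 2D toy rule, not the authors' crystallographic model: a single removal threshold stands in for the orientation-dependent etch rates a real simulator would use.

```python
import numpy as np

def etch_step(solid):
    """One synchronous CA update on a 2D grid (1 = solid, 0 = etched).

    Toy rule: a solid cell is removed when at least two of its four
    neighbors are already etched. Every cell is computed from the
    previous state only, so on a GPU each cell maps to one thread.
    """
    padded = np.pad(solid, 1, constant_values=1)   # treat the border as solid
    empty = 1 - padded
    exposed_faces = (empty[:-2, 1:-1] + empty[2:, 1:-1] +
                     empty[1:-1, :-2] + empty[1:-1, 2:])
    return np.where((solid == 1) & (exposed_faces >= 2), 0, solid)

# A 5x7 wafer cross-section with a mask opening and a seed notch.
grid = np.ones((5, 7), dtype=np.int64)
grid[0, 2:5] = 0   # opening in the mask, already etched at the surface
grid[1, 3] = 0     # notch started under the opening
for _ in range(5):
    grid = etch_step(grid)
```

With this threshold the groove widens until every remaining surface cell exposes only one face and the etch self-terminates, a crude analogue of slow-etching planes bounding the cavity.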

Author(s):  
Jianhua Li ◽  
Jingyuan Chen ◽  
Yan Wang ◽  
Jianhua Huang

The parallelization of silicon anisotropic etching simulation with the cellular automata (CA) model on graphics processing units (GPUs) is challenging, because the number of computational tasks in etching simulation changes dynamically and existing parallel CA mechanisms do not map well to GPU computation. In this paper, an improved CA model, called the clustered cell model, is proposed for GPU-based etching simulation. The model consists of clustered cells, each of which manages a scalable number of atoms. In this model, only the etching and state updates for the atoms on the etching surface and their unexposed neighbors are performed at each CA time step, whereas the clustered cells are reclassified on a longer time step. With this model, a crystal cell parallelization method is given in which clustered cells are allocated to GPU threads during the simulation. With optimizations in both the spatial and temporal dimensions, as well as a proper choice of granularity, this method provides faster process simulation. The proposed simulation method is implemented with the Compute Unified Device Architecture (CUDA) application programming interface. Several computational experiments are conducted to analyze the efficiency of the method.


2021 ◽  
Author(s):  
Daiki Ishii ◽  
Masatomo Inui ◽  
Nobuyuki Umezu

Abstract By using the cutter location (CL) surface, fast and stable computation of cutter paths for machining complicated molds and dies can be realized. State-of-the-art graphics processing units (GPUs) are equipped with special hardware, named ray tracing (RT) cores, dedicated to the ray tracing computations used in 3D computer graphics. Using RT cores, the intersection points between a set of straight lines and polygons can be computed quickly. In this paper, we propose a novel CL surface computation method using RT cores. Because RT cores were originally designed to accelerate 3D graphics processing, software that uses them must be developed with the OptiX application programming interface (API) library for computer graphics. We demonstrate how to use the OptiX API to develop software for CL surface computation. Computational experiments confirmed that the CL surface based on a very high-resolution Z-map can be obtained several times faster than with the depth-buffer-based method, which has been considered the fastest to date.
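The core operation the RT cores accelerate can be sketched in plain code: cast one vertical ray per Z-map grid point against the model's triangles and keep the highest hit. The sketch below runs the Möller–Trumbore intersection test on the CPU as an illustration only; the paper's method performs the same line–polygon intersections in RT-core hardware through OptiX, and the CL surface is then derived by offsetting such hits by the cutter geometry.

```python
import numpy as np

def ray_down_hit(origin, v0, v1, v2, eps=1e-9):
    """Möller–Trumbore test for a ray pointing straight down (0, 0, -1).
    Returns the z coordinate of the hit, or None if the ray misses."""
    d = np.array([0.0, 0.0, -1.0])
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(d, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:                    # ray parallel to the triangle
        return None
    s = origin - v0
    u = np.dot(s, p) / det
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = np.dot(d, q) / det
    if v < 0.0 or u + v > 1.0:
        return None
    t = np.dot(e2, q) / det
    return origin[2] - t                  # z of the intersection point

def z_map(triangles, xs, ys, z_top=10.0, z_floor=0.0):
    """Highest intersection per (x, y) grid point -- the Z-map."""
    zmap = np.full((len(ys), len(xs)), z_floor)
    for j, y in enumerate(ys):
        for i, x in enumerate(xs):
            o = np.array([x, y, z_top])
            for (v0, v1, v2) in triangles:
                z = ray_down_hit(o, v0, v1, v2)
                if z is not None:
                    zmap[j, i] = max(zmap[j, i], z)
    return zmap

# One flat triangle at z = 1; one grid point hits it, the other misses.
tri = (np.array([0.0, 0.0, 1.0]),
       np.array([2.0, 0.0, 1.0]),
       np.array([0.0, 2.0, 1.0]))
zm = z_map([tri], xs=[0.5, 1.9], ys=[0.5, 1.9])
```

The triple loop here is exactly the work that RT cores parallelize and accelerate in hardware: each ray is independent, so millions can be traced concurrently.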


2015 ◽  
Vol 8 (9) ◽  
pp. 2815-2827 ◽  
Author(s):  
S. Xu ◽  
X. Huang ◽  
L.-Y. Oey ◽  
F. Xu ◽  
H. Fu ◽  
...  

Abstract. Graphics processing units (GPUs) are an attractive solution for many scientific applications because of their high performance. However, most existing GPU ports of climate models use GPUs for only a few computationally intensive regions. In the present study, we redesign the mpiPOM (a parallel version of the Princeton Ocean Model) for GPUs. Specifically, we first convert the model from its original Fortran form to new Compute Unified Device Architecture C (CUDA-C) code; we then optimize the code on each GPU, the communications between the GPUs, and the I/O between the GPUs and the central processing units (CPUs). We show that the performance of the new model on a workstation containing four GPUs is comparable to that on a powerful cluster with 408 standard CPU cores, while reducing energy consumption by a factor of 6.8.


2009 ◽  
Vol 79-82 ◽  
pp. 1309-1312
Author(s):  
Kuan Yu ◽  
Bo Zhu

Molecular simulation can provide mechanistic insight into how material behaviour relates to molecular properties and to the microscopic arrangement of many molecules. With the development of graphics processing units (GPUs), scientists have realized general-purpose molecular simulations on GPUs using the Compute Unified Device Architecture (CUDA) environment. In this paper, we provide a brief overview of molecular simulation and CUDA, and introduce recent achievements in GPU-based molecular simulation in materials science, mainly concerning the Monte Carlo method and molecular dynamics. Recent research has shown that GPUs can provide unprecedented computational power for scientific applications. With optimized algorithms and program code, a single GPU can deliver performance equivalent to that of a distributed computer cluster. The study of GPU-based molecular simulation will therefore accelerate the development of materials science in the future.


2014 ◽  
Vol 11 (04) ◽  
pp. 1350063 ◽  
Author(s):  
IFTIKHAR AHMED ◽  
RICK SIOW MONG GOH ◽  
ENG HUAT KHOO ◽  
KIM HUAT LEE ◽  
SIAW KIAN ZHONG ◽  
...  

Maxwell's equations incorporating the Lorentz–Drude (LD) model are simulated using the three-dimensional finite-difference time-domain (FDTD) method, and the method is parallelized on multiple graphics processing units (GPUs) for plasmonics applications. The Compute Unified Device Architecture (CUDA) is used for GPU parallelization. The LD model describes the dispersive nature of materials in the plasmonics domain, and the auxiliary differential equation (ADE) approach makes it consistent with the time-domain Maxwell equations. Different aspects of the multi-GPU FDTD method are presented, such as a comparison of different numbers of GPUs, the transfer time between them, and synchronous versus asynchronous data passing. It is shown that using multiple GPUs in parallel significantly reduces the simulation time compared to a single GPU.
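The FDTD core behind such work is a leapfrog update of the E and H fields on a staggered grid. The sketch below shows only the non-dispersive 1D version in normalized units, as a minimal illustration: in the paper's setting, the LD dispersion adds a polarization-current update between the two field updates via the ADE approach, and the multi-GPU version exchanges field values at subdomain boundaries each step.

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=150, courant=0.5, src=100):
    """1D FDTD leapfrog in normalized units (eps0 = mu0 = 1).

    Ez and Hy live on staggered half-cells. Each cell's update depends
    only on its immediate neighbors, which is why the scheme maps well
    onto GPUs (one thread per cell) and why multi-GPU runs only need
    to exchange the boundary cells of each subdomain.
    """
    ez = np.zeros(n_cells)
    hy = np.zeros(n_cells)
    for n in range(n_steps):
        # H update from the spatial difference (curl) of E.
        hy[:-1] += courant * (ez[1:] - ez[:-1])
        # For a Lorentz-Drude medium, the ADE polarization-current
        # update would be inserted here before E is advanced.
        # E update from the spatial difference (curl) of H.
        ez[1:] += courant * (hy[1:] - hy[:-1])
        # Soft Gaussian source at the center cell.
        ez[src] += np.exp(-((n - 30.0) / 10.0) ** 2)
    return ez, hy

ez, hy = fdtd_1d()
```

With a Courant number of 0.5 the scheme is stable, so the injected pulse propagates outward without the fields blowing up.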


2011 ◽  
Vol 19 (4) ◽  
pp. 185-197 ◽  
Author(s):  
Marek Blazewicz ◽  
Steven R. Brandt ◽  
Michal Kierzynka ◽  
Krzysztof Kurowski ◽  
Bogdan Ludwiczak ◽  
...  

Despite the recent advent of new heterogeneous computing architectures, there is still a lack of parallel problem-solving environments that help scientists use hybrid supercomputers easily and efficiently. Many scientific simulations that use structured grids to solve partial differential equations in fact rely on stencil computations, which have become crucial to solving many challenging problems in domains such as engineering and physics. Although many parallel stencil computing approaches have been proposed, in most cases they solve only particular problems. As a result, scientists struggle when implementing a new stencil-based simulation, especially on high-performance hybrid supercomputers. In response to this need, we extend our previous work on CaCUDA, a parallel programming framework for CUDA, which now also supports OpenCL. We present CaKernel, a tool that simplifies the development of parallel scientific applications on hybrid systems. CaKernel is built on the highly scalable and portable Cactus framework: Cactus manages inter-process communication via MPI, while CaKernel manages the code running on graphics processing units (GPUs) and the interactions between them. As a non-trivial test case, we developed a 3D CFD code to demonstrate the performance and scalability of the automatically generated code.
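A representative stencil computation of the kind such frameworks target is the Jacobi update for Laplace's equation: each grid point is replaced by the average of its neighbors, so every point can be computed by an independent GPU thread, and only the grid's halo must be exchanged between processes. A minimal 2D version, written here as a plain sketch rather than in the framework's kernel notation:

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi sweep of the 5-point Laplace stencil.
    Boundary rows/columns are held fixed (Dirichlet conditions)."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                              u[1:-1, :-2] + u[1:-1, 2:])
    return new

# Unit potential on the top edge, zero elsewhere; iterate to relax.
u = np.zeros((16, 16))
u[0, :] = 1.0
first_change = None
for it in range(200):
    nxt = jacobi_step(u)
    change = np.abs(nxt - u).max()
    if first_change is None:
        first_change = change
    u = nxt
```

In a stencil-framework setting, the body of `jacobi_step` is roughly what the kernel abstraction generates for the GPU, while a driver layer such as Cactus handles the MPI halo exchange between subdomains.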


2021 ◽  
Vol 4 ◽  
pp. 16-22
Author(s):  
Mykola Semylitko ◽  
Gennadii Malaschonok

The singular value decomposition (SVD) algorithm is used in recommendation systems, machine learning, image processing, and various other algorithms for working with matrices, which can be very large in big-data settings. Given the structure of this algorithm, it can be executed on the large number of computing threads that only graphics cards provide.

CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and engineers to use a CUDA-enabled graphics processing unit for general-purpose processing, an approach termed GPGPU (general-purpose computing on graphics processing units). The GPU provides much higher instruction throughput and memory bandwidth than the CPU within a similar price and power envelope, and many applications leverage these capabilities to run faster on the GPU than on the CPU. Other computing devices, such as FPGAs, are also very energy efficient, but they offer much less programming flexibility than GPUs.

The developed modification uses the CUDA architecture, which is intended for large numbers of simultaneous calculations and thus allows matrices of very large size to be processed quickly. The parallel SVD algorithm for a tridiagonal matrix, based on Givens rotations, provides high computational accuracy. The algorithm also includes a number of memory and multiplication optimizations that significantly reduce the computation time by discarding empty iterations.

This article proposes an approach that reduces the computation time and, consequently, resources and costs. The developed algorithm can be used through a simple and convenient API in C++ and Java, and can be further improved by using dynamic parallelism or by parallelizing the multiplication operations. The obtained results can also be used by other developers for comparison, as all conditions of the research are described in detail and the code is freely available.
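The building block of a Givens-rotation SVD can be sketched independently of any GPU code: a 2x2 rotation is chosen to zero one matrix entry, and sweeps of such rotations drive the matrix toward diagonal form. The snippet below shows only this elementary step, not the developed parallel algorithm; in a GPU version, many non-overlapping rotations of this kind are applied by concurrent threads.

```python
import math

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

def apply_rows(m, i, j, c, s):
    """Apply the rotation to rows i and j of matrix m, in place."""
    for k in range(len(m[0])):
        mi, mj = m[i][k], m[j][k]
        m[i][k] = c * mi + s * mj
        m[j][k] = -s * mi + c * mj

# Zero the subdiagonal entry m[1][0] of a small matrix:
# c = 3/5, s = 4/5 rotates (3, 4) onto (5, 0).
m = [[3.0, 1.0],
     [4.0, 2.0]]
c, s = givens(m[0][0], m[1][0])
apply_rows(m, 0, 1, c, s)
```

Because each rotation touches only two rows (or two columns), rotations on disjoint row pairs are independent, which is what makes parallel sweeps across GPU threads possible.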

