GPU-Based Parallel Simulation of Silicon Anisotropic Etching

Author(s):  
Jianhua Li ◽  
Yan Wang ◽  
Jingyuan Chen ◽  
Li Yan

Silicon anisotropic etching simulation, based on either a geometric model or a cellular automata (CA) model, is highly time-consuming. In this paper, we propose two parallelization methods for simulating the silicon anisotropic etching process with CA models on graphics processing units (GPUs). One is a direct parallelization of the serial CA algorithm; the other uses a spatial parallelization strategy in which each crystal unit cell is allocated to a GPU thread. The proposed simulation methods are implemented with the Compute Unified Device Architecture (CUDA) application programming interface. Several computational experiments are conducted to analyze the efficiency of the methods.
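As a rough illustration of the spatial strategy, a CA update can be written so that every cell is evaluated independently from the previous state, which is what makes a one-thread-per-cell GPU mapping possible. The sketch below is a minimal 2D toy rule, not the authors' crystallographic model: a single removal threshold stands in for the orientation-dependent etch rates a real simulator would use.

```python
import numpy as np

def etch_step(solid):
    """One synchronous CA update on a 2D grid (1 = solid, 0 = etched).

    Toy rule: a solid cell is removed when at least two of its four
    neighbors are already etched. Every cell is computed from the
    previous state only, so on a GPU each cell maps to one thread.
    """
    padded = np.pad(solid, 1, constant_values=1)   # treat the border as solid
    empty = 1 - padded
    exposed_faces = (empty[:-2, 1:-1] + empty[2:, 1:-1] +
                     empty[1:-1, :-2] + empty[1:-1, 2:])
    return np.where((solid == 1) & (exposed_faces >= 2), 0, solid)

# A 5x7 wafer cross-section with a mask opening and a seed notch.
grid = np.ones((5, 7), dtype=np.int64)
grid[0, 2:5] = 0   # opening in the mask, already etched at the surface
grid[1, 3] = 0     # notch started under the opening
for _ in range(5):
    grid = etch_step(grid)
```

With this threshold the groove widens until every remaining surface cell exposes only one face and the etch self-terminates, a crude analogue of slow-etching planes bounding the cavity.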

Author(s):  
Jianhua Li ◽  
Jingyuan Chen ◽  
Yan Wang ◽  
Jianhua Huang

The parallelization of silicon anisotropic etching simulation with the cellular automata (CA) model on graphics processing units (GPUs) is challenging, because the number of computational tasks in etching simulation changes dynamically and existing parallel CA mechanisms do not map well to GPU computation. In this paper, an improved CA model, called the clustered cell model, is proposed for GPU-based etching simulation. The model consists of clustered cells, each of which manages a scalable number of atoms. In this model, only the etching and state updates for the atoms on the etching surface and their unexposed neighbors are performed at each CA time step, whereas the clustered cells are reclassified on a longer time step. With this model, a crystal cell parallelization method is given in which clustered cells are allocated to GPU threads during the simulation. With optimizations in both the spatial and temporal dimensions, as well as a proper choice of granularity, this method provides faster process simulation. The proposed simulation method is implemented with the Compute Unified Device Architecture (CUDA) application programming interface. Several computational experiments are conducted to analyze the efficiency of the method.


2021 ◽  
Author(s):  
Daiki Ishii ◽  
Masatomo Inui ◽  
Nobuyuki Umezu

Abstract By using the cutter location (CL) surface, fast and stable computation of cutter paths for machining complicated molds and dies can be realized. State-of-the-art graphics processing units (GPUs) are equipped with special hardware, named ray tracing (RT) cores, dedicated to the ray tracing computations used in 3D computer graphics. Using RT cores, the intersection points between a set of straight lines and polygons can be computed quickly. In this paper, we propose a novel CL surface computation method using RT cores. Because RT cores were originally designed to accelerate 3D graphics processing, software that uses them must be developed with the OptiX application programming interface (API) library for computer graphics. We demonstrate how to use the OptiX API to develop software for CL surface computation. Computational experiments confirmed that the CL surface based on a very high-resolution Z-map can be obtained several times faster than with the depth-buffer-based method, which has been considered the fastest to date.
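The core operation the RT cores accelerate can be sketched in plain code: cast one vertical ray per Z-map grid point against the model's triangles and keep the highest hit. The sketch below runs the Möller–Trumbore intersection test on the CPU as an illustration only; the paper's method performs the same line–polygon intersections in RT-core hardware through OptiX, and the CL surface is then derived by offsetting such hits by the cutter geometry.

```python
import numpy as np

def ray_down_hit(origin, v0, v1, v2, eps=1e-9):
    """Möller–Trumbore test for a ray pointing straight down (0, 0, -1).
    Returns the z coordinate of the hit, or None if the ray misses."""
    d = np.array([0.0, 0.0, -1.0])
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(d, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:                    # ray parallel to the triangle
        return None
    s = origin - v0
    u = np.dot(s, p) / det
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = np.dot(d, q) / det
    if v < 0.0 or u + v > 1.0:
        return None
    t = np.dot(e2, q) / det
    return origin[2] - t                  # z of the intersection point

def z_map(triangles, xs, ys, z_top=10.0, z_floor=0.0):
    """Highest intersection per (x, y) grid point -- the Z-map."""
    zmap = np.full((len(ys), len(xs)), z_floor)
    for j, y in enumerate(ys):
        for i, x in enumerate(xs):
            o = np.array([x, y, z_top])
            for (v0, v1, v2) in triangles:
                z = ray_down_hit(o, v0, v1, v2)
                if z is not None:
                    zmap[j, i] = max(zmap[j, i], z)
    return zmap

# One flat triangle at z = 1; one grid point hits it, the other misses.
tri = (np.array([0.0, 0.0, 1.0]),
       np.array([2.0, 0.0, 1.0]),
       np.array([0.0, 2.0, 1.0]))
zm = z_map([tri], xs=[0.5, 1.9], ys=[0.5, 1.9])
```

The triple loop here is exactly the work that RT cores parallelize and accelerate in hardware: each ray is independent, so millions can be traced concurrently.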


2015 ◽  
Vol 8 (9) ◽  
pp. 2815-2827 ◽  
Author(s):  
S. Xu ◽  
X. Huang ◽  
L.-Y. Oey ◽  
F. Xu ◽  
H. Fu ◽  
...  

Abstract. Graphics processing units (GPUs) are an attractive solution for many scientific applications because of their high performance. However, most existing GPU ports of climate models use GPUs for only a few computationally intensive regions. In the present study, we redesign the mpiPOM (a parallel version of the Princeton Ocean Model) for GPUs. Specifically, we first convert the model from its original Fortran form to new Compute Unified Device Architecture C (CUDA-C) code; we then optimize the code on each GPU, the communications between the GPUs, and the I/O between the GPUs and the central processing units (CPUs). We show that the performance of the new model on a workstation containing four GPUs is comparable to that on a powerful cluster with 408 standard CPU cores, while reducing energy consumption by a factor of 6.8.


2009 ◽  
Vol 79-82 ◽  
pp. 1309-1312
Author(s):  
Kuan Yu ◽  
Bo Zhu

Molecular simulation can provide mechanistic insight into how material behaviour relates to molecular properties and to the microscopic arrangement of many molecules. With the development of graphics processing units (GPUs), scientists have realized general-purpose molecular simulations on GPUs using the Compute Unified Device Architecture (CUDA) environment. In this paper, we provide a brief overview of molecular simulation and CUDA, and introduce recent achievements in GPU-based molecular simulation in materials science, mainly concerning the Monte Carlo method and molecular dynamics. Recent research has shown that GPUs can provide unprecedented computational power for scientific applications. With optimized algorithms and program code, a single GPU can deliver performance equivalent to that of a distributed computer cluster. The study of GPU-based molecular simulation will therefore accelerate the development of materials science in the future.


2014 ◽  
Vol 11 (04) ◽  
pp. 1350063 ◽  
Author(s):  
IFTIKHAR AHMED ◽  
RICK SIOW MONG GOH ◽  
ENG HUAT KHOO ◽  
KIM HUAT LEE ◽  
SIAW KIAN ZHONG ◽  
...  

Maxwell's equations incorporating the Lorentz–Drude (LD) model are simulated using the three-dimensional finite-difference time-domain (FDTD) method, and the method is parallelized on multiple graphics processing units (GPUs) for plasmonics applications. The Compute Unified Device Architecture (CUDA) is used for GPU parallelization. The LD model describes the dispersive nature of materials in the plasmonics domain, and the auxiliary differential equation (ADE) approach makes it consistent with the time-domain Maxwell equations. Different aspects of the multi-GPU FDTD method are presented, such as a comparison of different numbers of GPUs, the transfer time between them, and synchronous versus asynchronous data passing. It is shown that using multiple GPUs in parallel significantly reduces the simulation time compared to a single GPU.
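The FDTD core behind such work is a leapfrog update of the E and H fields on a staggered grid. The sketch below shows only the non-dispersive 1D version in normalized units, as a minimal illustration: in the paper's setting, the LD dispersion adds a polarization-current update between the two field updates via the ADE approach, and the multi-GPU version exchanges field values at subdomain boundaries each step.

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=150, courant=0.5, src=100):
    """1D FDTD leapfrog in normalized units (eps0 = mu0 = 1).

    Ez and Hy live on staggered half-cells. Each cell's update depends
    only on its immediate neighbors, which is why the scheme maps well
    onto GPUs (one thread per cell) and why multi-GPU runs only need
    to exchange the boundary cells of each subdomain.
    """
    ez = np.zeros(n_cells)
    hy = np.zeros(n_cells)
    for n in range(n_steps):
        # H update from the spatial difference (curl) of E.
        hy[:-1] += courant * (ez[1:] - ez[:-1])
        # For a Lorentz-Drude medium, the ADE polarization-current
        # update would be inserted here before E is advanced.
        # E update from the spatial difference (curl) of H.
        ez[1:] += courant * (hy[1:] - hy[:-1])
        # Soft Gaussian source at the center cell.
        ez[src] += np.exp(-((n - 30.0) / 10.0) ** 2)
    return ez, hy

ez, hy = fdtd_1d()
```

With a Courant number of 0.5 the scheme is stable, so the injected pulse propagates outward without the fields blowing up.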


2011 ◽  
Vol 19 (4) ◽  
pp. 185-197 ◽  
Author(s):  
Marek Blazewicz ◽  
Steven R. Brandt ◽  
Michal Kierzynka ◽  
Krzysztof Kurowski ◽  
Bogdan Ludwiczak ◽  
...  

Despite the recent advent of new heterogeneous computing architectures, there is still a lack of parallel problem-solving environments that help scientists use hybrid supercomputers easily and efficiently. Many scientific simulations that use structured grids to solve partial differential equations in fact rely on stencil computations, which have become crucial to solving many challenging problems in domains such as engineering and physics. Although many parallel stencil computing approaches have been proposed, in most cases they solve only particular problems. As a result, scientists struggle when implementing a new stencil-based simulation, especially on high-performance hybrid supercomputers. In response to this need, we extend our previous work on CaCUDA, a parallel programming framework for CUDA, which now also supports OpenCL. We present CaKernel, a tool that simplifies the development of parallel scientific applications on hybrid systems. CaKernel is built on the highly scalable and portable Cactus framework: Cactus manages inter-process communication via MPI, while CaKernel manages the code running on graphics processing units (GPUs) and the interactions between them. As a non-trivial test case, we developed a 3D CFD code to demonstrate the performance and scalability of the automatically generated code.
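A representative stencil computation of the kind such frameworks target is the Jacobi update for Laplace's equation: each grid point is replaced by the average of its neighbors, so every point can be computed by an independent GPU thread, and only the grid's halo must be exchanged between processes. A minimal 2D version, written here as a plain sketch rather than in the framework's kernel notation:

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi sweep of the 5-point Laplace stencil.
    Boundary rows/columns are held fixed (Dirichlet conditions)."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                              u[1:-1, :-2] + u[1:-1, 2:])
    return new

# Unit potential on the top edge, zero elsewhere; iterate to relax.
u = np.zeros((16, 16))
u[0, :] = 1.0
first_change = None
for it in range(200):
    nxt = jacobi_step(u)
    change = np.abs(nxt - u).max()
    if first_change is None:
        first_change = change
    u = nxt
```

In a stencil-framework setting, the body of `jacobi_step` is roughly what the kernel abstraction generates for the GPU, while a driver layer such as Cactus handles the MPI halo exchange between subdomains.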


2021 ◽  
Vol 4 ◽  
pp. 16-22
Author(s):  
Mykola Semylitko ◽  
Gennadii Malaschonok

The singular value decomposition (SVD) algorithm is used in recommendation systems, machine learning, image processing, and various other algorithms for working with matrices, which can be very large in big-data settings. Given the structure of this algorithm, it can be executed on the large number of computing threads that only graphics cards provide.

CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and engineers to use a CUDA-enabled graphics processing unit for general-purpose processing, an approach termed GPGPU (general-purpose computing on graphics processing units). The GPU provides much higher instruction throughput and memory bandwidth than the CPU within a similar price and power envelope, and many applications leverage these capabilities to run faster on the GPU than on the CPU. Other computing devices, such as FPGAs, are also very energy efficient, but they offer much less programming flexibility than GPUs.

The developed modification uses the CUDA architecture, which is intended for large numbers of simultaneous calculations and thus allows matrices of very large size to be processed quickly. The parallel SVD algorithm for a tridiagonal matrix, based on Givens rotations, provides high computational accuracy. The algorithm also includes a number of memory and multiplication optimizations that significantly reduce the computation time by discarding empty iterations.

This article proposes an approach that reduces the computation time and, consequently, resources and costs. The developed algorithm can be used through a simple and convenient API in C++ and Java, and can be further improved by using dynamic parallelism or by parallelizing the multiplication operations. The obtained results can also be used by other developers for comparison, as all conditions of the research are described in detail and the code is freely available.
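The building block of a Givens-rotation SVD can be sketched independently of any GPU code: a 2x2 rotation is chosen to zero one matrix entry, and sweeps of such rotations drive the matrix toward diagonal form. The snippet below shows only this elementary step, not the developed parallel algorithm; in a GPU version, many non-overlapping rotations of this kind are applied by concurrent threads.

```python
import math

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

def apply_rows(m, i, j, c, s):
    """Apply the rotation to rows i and j of matrix m, in place."""
    for k in range(len(m[0])):
        mi, mj = m[i][k], m[j][k]
        m[i][k] = c * mi + s * mj
        m[j][k] = -s * mi + c * mj

# Zero the subdiagonal entry m[1][0] of a small matrix:
# c = 3/5, s = 4/5 rotates (3, 4) onto (5, 0).
m = [[3.0, 1.0],
     [4.0, 2.0]]
c, s = givens(m[0][0], m[1][0])
apply_rows(m, 0, 1, c, s)
```

Because each rotation touches only two rows (or two columns), rotations on disjoint row pairs are independent, which is what makes parallel sweeps across GPU threads possible.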

