GPU-Accelerated Laplace Equation Model Development Based on CUDA Fortran

Water ◽  
2021 ◽  
Vol 13 (23) ◽  
pp. 3435
Author(s):  
Boram Kim ◽  
Kwang Seok Yoon ◽  
Hyung-Jun Kim

In this study, a CUDA Fortran-based GPU-accelerated Laplace equation model was developed and applied to several cases. The Laplace equation is one of the governing equations used to physically analyze groundwater flow, and it admits analytical solutions. Numerical models of this kind require a large amount of data to reproduce the flow with high accuracy and therefore demand considerable computational time. To shorten the computation time, CUDA technology was applied: large-scale parallel computations were performed on the GPU, and the program was written to minimize the number of data transfers between the CPU and GPU. A GPU consists of many ALUs specialized for graphics processing and can therefore perform far more concurrent computations than a CPU. The results of the GPU-accelerated model were compared with the analytical solution of the Laplace equation to verify their accuracy, and they were in good agreement. As the number of grid points increased, the computational time of the GPU-accelerated model decreased steadily relative to that of the CPU-based Laplace equation model; overall, the computational time was reduced by a factor of up to about 50.
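The paper does not include code, but the core of such a model is a stencil update of the discretized Laplace equation. Below is a minimal CPU-side NumPy sketch of a Jacobi iteration, the kind of update the CUDA Fortran kernels parallelize while keeping the grid resident on the GPU to avoid CPU-GPU transfers; function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def jacobi_laplace(phi, tol=1e-6, max_iter=100_000):
    """Solve the 2D Laplace equation on a uniform grid by Jacobi iteration.

    phi holds the boundary values; interior values are repeatedly replaced by
    the average of their four neighbours (five-point stencil) until the
    maximum change per sweep falls below tol.
    """
    for it in range(max_iter):
        new = phi.copy()
        new[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                                  phi[1:-1, :-2] + phi[1:-1, 2:])
        if np.max(np.abs(new - phi)) < tol:
            return new, it
        phi = new
    return phi, max_iter

# Example: unit square, top boundary held at 1, the other boundaries at 0.
n = 101
phi0 = np.zeros((n, n))
phi0[0, :] = 1.0
solution, iterations = jacobi_laplace(phi0)
```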

2021 ◽  
Author(s):  
Maha Mdini ◽  
Takemasa Miyoshi ◽  
Shigenori Otsuka

In the era of modern science, scientists have developed numerical models to predict and understand weather and ocean phenomena based on fluid dynamics. While these models have shown high accuracy at kilometer scales, they require massive computing resources because of their computational complexity. In recent years, new approaches to solving these models based on machine learning have been put forward. The results suggested that it is possible to reduce the computational complexity by using Neural Networks (NNs) instead of classical numerical simulations. In this project, we aim to shed light on different ways of accelerating physical models using NNs. We test two approaches, the Data-Driven Statistical Model (DDSM) and the Hybrid Physical-Statistical Model (HPSM), and compare their performance to the classical Process-Driven Physical Model (PDPM). The DDSM emulates the physical model with a NN. The HPSM, also known as super-resolution, uses a low-resolution version of the physical model and maps its outputs to the original high-resolution domain via a NN. To evaluate these two methods, we measured their accuracy and their computation time. Our results from idealized experiments with a quasi-geostrophic model show that the HPSM reduces the computation time by a factor of 3 and is capable of predicting the output of the physical model with high accuracy up to 9.25 days. The DDSM, however, reduces the computation time by a factor of 4 and can predict the physical model output with acceptable accuracy only within 2 days. These first results are promising and imply the possibility of bringing complex physical models into real-time systems with lower-cost computing resources in the future.
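To make the HPSM (super-resolution) idea concrete, here is a hedged PyTorch sketch of a small network that maps a coarse model field onto the high-resolution grid. The architecture, layer sizes, and names are assumptions for illustration only, not the authors' network.

```python
import torch
import torch.nn as nn

class SuperResolutionNet(nn.Module):
    """Toy HPSM-style network: upsample a coarse field, then refine it with convolutions."""

    def __init__(self, scale: int = 4):
        super().__init__()
        self.model = nn.Sequential(
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, coarse_field):
        # coarse_field: (batch, 1, ny_lo, nx_lo) output of the low-resolution physical model
        return self.model(coarse_field)

net = SuperResolutionNet(scale=4)
coarse = torch.randn(8, 1, 32, 32)   # e.g. low-resolution quasi-geostrophic fields
fine = net(coarse)                   # (8, 1, 128, 128) high-resolution estimate
```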


2021 ◽  
Author(s):  
Brett W. Larsen ◽  
Shaul Druckmann

Lateral and recurrent connections are ubiquitous in biological neural circuits. The strong computational abilities of feedforward networks have been extensively studied; on the other hand, while certain roles for lateral and recurrent connections in specific computations have been described, a more complete understanding of the role and advantages of recurrent computations that might explain their prevalence remains an important open challenge. Previous key studies by Minsky and later by Roelfsema argued that the sequential, parallel computations for which recurrent networks are well suited can be highly effective approaches to complex computational problems. Such "tag propagation" algorithms perform repeated, local propagation of information and were introduced in the context of detecting connectedness, a task that is challenging for feedforward networks. Here, we advance the understanding of the utility of lateral and recurrent computation by first performing a large-scale empirical study of neural architectures for the computation of connectedness to explore feedforward solutions more fully and establish robustly the importance of recurrent architectures. In addition, we highlight a tradeoff between computation time and performance and demonstrate hybrid feedforward/recurrent models that perform well even in the presence of varying computational time limitations. We then generalize tag propagation architectures to multiple, interacting propagating tags and demonstrate that these are efficient computational substrates for more general computations by introducing and solving an abstracted biologically inspired decision-making task. More generally, our work clarifies and expands the set of computational tasks that can be solved efficiently by recurrent computation, yielding hypotheses for structure in population activity that may be present in such tasks.

Author Summary: Lateral and recurrent connections are ubiquitous in biological neural circuits; intriguingly, this stands in contrast to the majority of current-day artificial neural network research, which primarily uses feedforward architectures except in the context of temporal sequences. This raises the possibility that part of the difference in computational capabilities between real neural circuits and artificial neural networks is accounted for by the role of recurrent connections, and as a result a more detailed understanding of the computational role played by such connections is of great importance. Making effective comparisons between architectures is a subtle challenge, however, and in this paper we leverage the computational capabilities of large-scale machine learning to robustly explore how differences in architectures affect a network's ability to learn a task. We first focus on the task of determining whether two pixels are connected in an image, which has an elegant and efficient recurrent solution: propagate a connected label or tag along paths. Inspired by this solution, we show that it can be generalized in many ways, including propagating multiple tags at once and changing the computation performed on the result of the propagation. To illustrate these generalizations, we introduce an abstracted decision-making task related to foraging in which an animal must determine whether it can avoid predators in a random environment. Our results shed light on the set of computational tasks that can be solved efficiently by recurrent computation and how these solutions may appear in neural activity.
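For readers unfamiliar with tag propagation, the following is a minimal NumPy sketch of the recurrent idea on the connectedness task: a tag placed at one pixel is repeatedly spread to neighbouring foreground pixels by a local update until it either reaches the target pixel or stops spreading. Names and the 4-neighbour update are illustrative, not the networks studied in the paper.

```python
import numpy as np

def connected(image, seed, target, max_steps=None):
    """Tag-propagation test of connectedness on a binary image.

    A tag is placed at `seed` and repeatedly propagated to 4-neighbouring
    foreground pixels (a local, parallelizable update); `seed` and `target`
    are connected iff the tag eventually reaches `target`.
    """
    tag = np.zeros_like(image, dtype=bool)
    tag[seed] = bool(image[seed])
    steps = max_steps or image.size
    for _ in range(steps):
        spread = tag.copy()
        spread[1:, :] |= tag[:-1, :]
        spread[:-1, :] |= tag[1:, :]
        spread[:, 1:] |= tag[:, :-1]
        spread[:, :-1] |= tag[:, 1:]
        spread &= image.astype(bool)          # tags only live on foreground pixels
        if spread[target]:
            return True
        if np.array_equal(spread, tag):       # fixed point reached without tagging target
            return False
        tag = spread
    return bool(tag[target])

img = np.array([[1, 1, 0, 1],
                [0, 1, 0, 1],
                [0, 1, 1, 1]])
print(connected(img, (0, 0), (0, 3)))  # True: a 4-connected path exists
```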


Author(s):  
Masanobu Hasebe ◽  
Shigeru Tabeta

Most ocean models employ the hydrostatic approximation because the horizontal scale is usually much larger than the vertical scale in oceanic phenomena. Under the hydrostatic approximation, the dynamic pressure is neglected and the momentum equation in the vertical direction need not be solved. However, for buoyant jets from the sea bottom, such as submarine groundwater discharge and hydrothermal plumes, the dynamic pressure cannot be neglected and the vertical momentum equation must be taken into account. Non-hydrostatic analysis requires so much computation time that it is usually difficult to calculate the current field over a wide ocean area with this approach. On the other hand, analysis assuming the hydrostatic approximation needs less computational time and usually gives reasonable results for large-scale ocean phenomena such as tidal currents. In the present study, the authors developed a new type of ocean model for multi-scale analysis, which simultaneously conducts hydrostatic analysis for wide-area phenomena and non-hydrostatic analysis for the detailed flow around the buoyant jet. The application limit of the hydrostatic approximation for ocean models was investigated, and a method for dynamically coupling the hydrostatic zone with the non-hydrostatic zone was developed. From a theoretical consideration employing the parameters δ and ε, which represent the ratio of the grid sizes Δz to Δx and the ratio of vertical to horizontal velocity, respectively, it was found that the hydrostatic approximation can be applied if δε and ε² are sufficiently small. To examine the developed method, simulations of a lock-exchange problem and a vertical jet under an oscillating current were conducted. The results of the present model were similar to those of a fully non-hydrostatic model when the hydrostatic approximation was applied in the region where δε < 0.005 and ε² < 0.005.
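The applicability criterion stated in the abstract is simple enough to express directly; here is a small Python sketch using the thresholds reported above. The function name and example values are illustrative assumptions.

```python
def hydrostatic_applicable(dz, dx, w, u, threshold=0.005):
    """Check the applicability criterion for the hydrostatic approximation.

    delta = dz/dx (grid aspect ratio), eps = w/u (vertical-to-horizontal
    velocity ratio); the approximation is taken to hold when both
    delta*eps and eps**2 are below the threshold reported in the paper.
    """
    delta = dz / dx
    eps = abs(w) / abs(u)
    return (delta * eps < threshold) and (eps ** 2 < threshold)

# Example: 1 m vertical / 100 m horizontal grid, 1 cm/s vertical vs 0.5 m/s horizontal flow
print(hydrostatic_applicable(dz=1.0, dx=100.0, w=0.01, u=0.5))  # True
```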


2015 ◽  
Vol 12 (03) ◽  
pp. 1550019 ◽  
Author(s):  
George Markou

In this paper, a numerical investigation of the limits of an automatic procedure for the generation of embedded steel reinforcement inside hexahedral finite elements (FEs) is presented. In detailed 3D reinforced concrete simulations, mapping the reinforcement grid inside the concrete hexahedral FEs is performed using the end-point coordinates of the rebar reinforcement macro-elements. This procedure is computationally demanding, and in the case of large-scale models the computational time required for reinforcement mesh generation becomes excessive. This research work aims to study and present the limitations of the embedded mesh generation method proposed by Markou and Papadrakakis, through the use of a 64-bit operating system. The embedded mesh generation method is integrated with a filtering algorithm that locates and discards relatively short embedded rebar elements resulting from the arbitrary positioning of the embedded rebar macro-elements and the nonprismatic geometry of the hexahedral mesh. The computational robustness and efficiency of the integrated embedded mesh generation method are demonstrated through the analysis of three numerical models. The first two are full-scale 2-story and 7-story RC structures, while the third is a full-scale RC bridge with a trapezoidal section and a total span of 100 m. Through the third numerical implementation, the computational capacity of the integrated embedded rebar mesh generation method is investigated.
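The filtering step described above can be illustrated with a short Python sketch that discards embedded rebar segments that are very short relative to their host hexahedral element. The data layout, the length ratio, and the names are hypothetical and are not the thresholds used by Markou and Papadrakakis.

```python
import numpy as np

def filter_short_rebar_segments(segments, hexa_sizes, min_ratio=0.05):
    """Discard embedded rebar segments that are very short relative to their host element.

    segments: iterable of (p_start, p_end, elem_id) tuples produced by clipping each
    rebar macro-element against the hexahedral mesh.
    hexa_sizes: characteristic length of each hexahedral element, indexed by elem_id.
    """
    kept = []
    for p_start, p_end, elem_id in segments:
        length = np.linalg.norm(np.asarray(p_end) - np.asarray(p_start))
        if length >= min_ratio * hexa_sizes[elem_id]:
            kept.append((p_start, p_end, elem_id))
    return kept
```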


2011 ◽  
Vol 8 (2) ◽  
pp. 83-87
Author(s):  
Sathyanarayanan Raghavan ◽  
Raphael. I. Okereke ◽  
Suresh K. Sitaraman

Modeling the viscoelastic relaxation of polymer materials is important for understanding the thermo-mechanical behavior of organic microelectronic systems. However, incorporating viscoelastic behavior into numerical models makes them compute-intensive. This paper presents a different technique for incorporating polymer viscoelastic behavior into numerical models such that the computation time is not adversely affected and the accuracy of the results is not compromised. In the proposed "pseudo viscoelastic" modeling technique, the modulus of the viscoelastic material is computed as a function of the time and temperature loading history outside of the finite-element simulation, and is then input into the simulation as a thermo-elastic material that incorporates the viscoelastic relaxation of the material. This paper compares the warpage results obtained with the proposed technique against a complete viscoelastic simulation model and experimental data: the maximum warpage predicted using the proposed technique agrees within 10% with the results obtained from a "full" viscoelastic model. It is also shown through some of our simulations that the proposed technique can yield a computational time saving of more than 50% and a hard disk space saving of 65%.
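One common way to precompute such a history-dependent modulus is a Prony-series relaxation modulus combined with a time-temperature shift. The following Python sketch shows that general idea under stated assumptions (WLF shift with illustrative constants, a user-supplied Prony series); it is not the authors' exact formulation, and all names are hypothetical.

```python
import numpy as np

def wlf_shift(T, T_ref=25.0, C1=17.4, C2=51.6):
    """Williams-Landel-Ferry time-temperature shift factor (illustrative constants)."""
    return 10.0 ** (-C1 * (T - T_ref) / (C2 + (T - T_ref)))

def relaxation_modulus(t_reduced, E_inf, E_i, tau_i):
    """Prony-series relaxation modulus E(t) = E_inf + sum_i E_i * exp(-t/tau_i)."""
    return E_inf + np.sum(E_i * np.exp(-t_reduced / tau_i))

def pseudo_viscoelastic_modulus(times, temps, E_inf, E_i, tau_i):
    """Precompute an effective 'thermo-elastic' modulus for each load step.

    The reduced time is accumulated along the prescribed time-temperature history;
    the resulting modulus table is what would be supplied to the FE model as a
    time/temperature-dependent elastic property.
    """
    moduli, t_red = [], 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        t_red += dt / wlf_shift(temps[k])
        moduli.append(relaxation_modulus(t_red, E_inf, E_i, tau_i))
    return np.array(moduli)
```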


2019 ◽  
Author(s):  
Liqun Cao ◽  
Jinzhe Zeng ◽  
Mingyuan Xu ◽  
Chih-Hao Chin ◽  
Tong Zhu ◽  
...  

Combustion is an important class of reactions that affects people's daily lives and the development of aerospace technology. Exploring the reaction mechanism contributes to the understanding of combustion and the more efficient use of fuels. Ab initio quantum mechanical (QM) calculations are precise but limited by their computational cost for large-scale systems. In order to carry out reactive molecular dynamics (MD) simulations of combustion accurately and quickly, we develop the MFCC-combustion method in this study, which calculates the interactions between atoms using a QM method at the MN15/6-31G(d) level. Each molecule in the system is treated as a fragment, and whenever the distance between any two atoms in different molecules is within 3.5 Å, a new fragment involving the two molecules is produced in order to account for the two-body interaction. The deviations of MFCC-combustion from full-system calculations are within a few kcal/mol, and the results clearly show that the calculated energies of the different systems using MFCC-combustion are close to convergence once the distance threshold for the two-body QM interactions is larger than 3.5 Å. Methane combustion was studied with the MFCC-combustion method to explore the combustion mechanism of the methane-oxygen system.
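The fragmentation rule can be sketched in a few lines of Python: each molecule is a one-body fragment, and a two-molecule fragment is added whenever the closest pair of atoms from two different molecules falls within the cutoff. This is a hedged illustration of the scheme's flavour, not the MFCC-combustion implementation; names and data layout are assumptions.

```python
import numpy as np
from itertools import combinations

def build_fragments(molecules, cutoff=3.5):
    """Generate one-body and two-body fragments from a list of molecules.

    molecules: list of (n_atoms, 3) coordinate arrays in Angstrom, one per molecule.
    Returns the indices of the one-body fragments and the (i, j) pairs for which a
    two-molecule fragment is created (minimum interatomic distance within the cutoff).
    """
    one_body = list(range(len(molecules)))
    two_body = []
    for i, j in combinations(range(len(molecules)), 2):
        # minimum distance between any atom of molecule i and any atom of molecule j
        diff = molecules[i][:, None, :] - molecules[j][None, :, :]
        if np.min(np.linalg.norm(diff, axis=-1)) < cutoff:
            two_body.append((i, j))
    return one_body, two_body
```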


2018 ◽  
Author(s):  
Pavel Pokhilko ◽  
Evgeny Epifanovsky ◽  
Anna I. Krylov

Using single-precision floating-point representation reduces the size of data and the computation time by a factor of two relative to the double precision conventionally used in electronic structure programs. For large-scale calculations, such as those encountered in many-body theories, the reduced memory footprint alleviates memory and input/output bottlenecks. The reduced data size can lead to additional gains due to improved parallel performance on CPUs and various accelerators. However, using single precision can potentially reduce the accuracy of computed observables. Here we report an implementation of coupled-cluster and equation-of-motion coupled-cluster methods with single and double excitations in single precision. We consider both the standard implementation and one using Cholesky decomposition or resolution-of-the-identity representations of the electron-repulsion integrals. Numerical tests illustrate that when single precision is used in correlated calculations, the loss of accuracy is insignificant, and a pure single-precision implementation can be used for computing energies, analytic gradients, excited states, and molecular properties. In addition to pure single-precision calculations, our implementation allows one to follow a single-precision calculation with clean-up iterations, fully recovering double-precision results while retaining significant savings.
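The "clean-up iteration" strategy can be illustrated on a much simpler problem than the coupled-cluster amplitude equations: run an iterative solver in float32 first, then refine the converged result with a few float64 iterations. The Python sketch below shows this general idea on a linear system solved by Jacobi iteration; it is an analogy under stated assumptions, not the authors' electronic structure code.

```python
import numpy as np

def solve_with_cleanup(A, b, tol32=1e-5, tol64=1e-10, max_iter=1000):
    """Iterative solve in float32 followed by 'clean-up' iterations in float64."""

    def jacobi(A, b, x, tol, dtype):
        A, b, x = A.astype(dtype), b.astype(dtype), x.astype(dtype)
        D = np.diag(A)
        R = A - np.diagflat(D)
        for _ in range(max_iter):
            x_new = (b - R @ x) / D
            if np.max(np.abs(x_new - x)) < tol:
                return x_new
            x = x_new
        return x

    x32 = jacobi(A, b, np.zeros(len(b)), tol32, np.float32)   # cheap single-precision pass
    return jacobi(A, b, x32, tol64, np.float64)               # double-precision clean-up

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(solve_with_cleanup(A, b))   # ~ [0.0909, 0.6364]
```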


Energies ◽  
2020 ◽  
Vol 14 (1) ◽  
pp. 176
Author(s):  
Iñigo Aramendia ◽  
Unai Fernandez-Gamiz ◽  
Adrian Martinez-San-Vicente ◽  
Ekaitz Zulueta ◽  
Jose Manuel Lopez-Guede

Large-scale energy storage systems (ESS) are growing in popularity due to the increase in energy production from renewable sources, which generally have a random, intermittent nature. Several redox flow batteries have been presented as alternatives to classical ESS; the scalability, design flexibility, and long life cycle of the vanadium redox flow battery (VRFB) have made it stand out. In a VRFB cell, which consists of two electrodes and an ion exchange membrane, the electrolyte flows through the electrodes, where the electrochemical reactions take place. Computational Fluid Dynamics (CFD) simulations are a very powerful tool for developing feasible numerical models to enhance the performance and lifetime of VRFBs. This review aims to present and discuss the numerical models developed in this field and, particularly, to analyze the different types of flow fields and patterns that can be found in the literature. The numerical studies presented in this review are a helpful tool for evaluating several key parameters important for optimizing energy systems based on redox flow technologies.


2019 ◽  
Vol 17 (06) ◽  
pp. 947-975 ◽  
Author(s):  
Lei Shi

We investigate distributed learning with a coefficient-based regularization scheme under the framework of kernel regression methods. Compared with classical kernel ridge regression (KRR), the algorithm under consideration does not require the kernel function to be positive semi-definite and hence provides a simple paradigm for designing indefinite kernel methods. The distributed learning approach partitions a massive data set into several disjoint data subsets and then produces a global estimator by averaging the local estimators trained on each subset. The ease of partitioning and of running the algorithm on each subset in parallel leads to a substantial reduction in computation time compared with the standard approach of running the original algorithm on the entire sample. We establish the first minimax-optimal rates of convergence for the distributed coefficient-based regularization scheme with indefinite kernels. We thus demonstrate that, compared with distributed KRR, the algorithm considered here is more flexible and effective for regression problems on large-scale data sets.
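The divide-and-average structure of the approach can be sketched briefly in Python: fit a coefficient-regularized kernel estimator on each data subset, then average the local predictions. The penalty on the coefficient vector (rather than the RKHS norm) is what removes the need for a positive semi-definite kernel; the exact optimization problem, kernel, and parameter choices below are illustrative assumptions, not the paper's.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def fit_local(X, y, lam=1e-2, sigma=1.0):
    """Coefficient-based regularized estimator on one subset.

    Finds alpha minimizing ||K alpha - y||^2 + lam * n * ||alpha||^2 for
    f(x) = sum_i alpha_i K(x, x_i); K need not be positive semi-definite.
    """
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K.T @ K + lam * len(y) * np.eye(len(y)), K.T @ y)
    return X, alpha, sigma

def distributed_predict(X, y, X_test, n_parts=4):
    """Partition the data, fit a local estimator on each part, average the predictions."""
    preds = []
    for Xp, yp in zip(np.array_split(X, n_parts), np.array_split(y, n_parts)):
        anchors, alpha, sigma = fit_local(Xp, yp)
        preds.append(gaussian_kernel(X_test, anchors, sigma) @ alpha)
    return np.mean(preds, axis=0)
```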


2021 ◽  
Vol 9 (6) ◽  
pp. 635
Author(s):  
Hyeok Jin ◽  
Kideok Do ◽  
Sungwon Shin ◽  
Daniel Cox

Coastal dunes are important morphological features for both ecosystems and coastal hazard mitigation. Because understanding and predicting dune erosion is very important, various numerical models have been developed to improve prediction accuracy. In the present study, a process-based model (XBeachX) was tested and calibrated to improve the accuracy of simulated dune erosion from a storm event by adjusting the model coefficients and comparing the results with large-scale experimental data. The breaker slope coefficient was calibrated to predict cross-shore wave transformation more accurately. To improve the prediction of the dune erosion profile, the coefficients related to skewness and asymmetry were adjusted. Moreover, the bermslope coefficient was calibrated to improve the simulation of the bermslope near the dune face. Model performance was assessed based on model-data comparisons. The calibrated XBeachX successfully predicted wave transformation and dune erosion. In addition, the results obtained for two other similar dune erosion experiments with the same calibrated coefficient set matched the observed wave and profile data well. However, the prediction of underwater sandbar evolution remains a challenge.
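The calibration loop described above amounts to scoring candidate coefficient values against the measured profiles. A minimal Python sketch of that step is shown below; `run_model` stands in for a hypothetical wrapper that runs XBeachX with one candidate value (e.g. of the breaker slope or bermslope coefficient) and returns the simulated post-storm profile on the measurement grid, and RMSE is only one possible skill metric.

```python
import numpy as np

def rmse(observed, modeled):
    return float(np.sqrt(np.mean((np.asarray(observed) - np.asarray(modeled)) ** 2)))

def calibrate_coefficient(candidates, run_model, observed_profile):
    """Pick the coefficient value whose simulated profile best matches the data."""
    scores = {c: rmse(observed_profile, run_model(c)) for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```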

