Massive-Parallel Trajectory Calculations version 2.2 (MPTRAC-2.2): Lagrangian transport simulations on Graphics Processing Units (GPUs)

2021 ◽  
Author(s):  
Lars Hoffmann ◽  
Paul Baumeister ◽  
Zhongyin Cai ◽  
Jan Clemens ◽  
Sabine Griessbach ◽  
...  

Lagrangian models are powerful tools to study atmospheric transport processes. However, conducting large-scale Lagrangian transport simulations with many air parcels can become numerically rather costly. In this study, we assessed the potential of exploiting graphics processing units (GPUs) to accelerate Lagrangian transport simulations. We ported the Massive-Parallel Trajectory Calculations (MPTRAC) model to GPUs using the open accelerator (OpenACC) programming model. The trajectory calculations conducted within the MPTRAC model have been fully ported to GPUs, i.e., except for feeding in the meteorological input data and for extracting the particle output data, the code operates entirely on the GPU devices without frequent data transfers between CPU and GPU memory. Model verification, performance analyses, and scaling tests of the MPI/OpenMP/OpenACC hybrid parallelization of MPTRAC have been conducted on the JUWELS Booster supercomputer operated by the Jülich Supercomputing Centre, Germany. The JUWELS Booster comprises 3744 NVIDIA A100 Tensor Core GPUs, providing a peak performance of 71.0 PFlop/s. As of June 2021, it is the most powerful supercomputer in Europe and listed among the most energy-efficient systems internationally. For large-scale simulations comprising 100 million particles driven by the European Centre for Medium-Range Weather Forecasts’ ERA5 reanalysis, the performance evaluation showed a maximum speedup of a factor of 16 due to the utilization of GPUs compared to CPU-only runs on the JUWELS Booster. In the large-scale GPU run, about 67 % of the runtime is spent on the physics calculations, being conducted on the GPUs. Another 15 % of the runtime is required for file-I/O, mostly to read the ERA5 data from disk. Meteorological data preprocessing on the CPUs also requires about 15 % of the runtime.
Although this study identified potential for further improvements of the GPU code, we consider the MPTRAC model to be ready for production runs on the JUWELS Booster in its present form. The GPU code provides a much faster time to solution than the CPU code, which is particularly relevant for near-real-time applications of a Lagrangian transport model.

2021 ◽  
Author(s):  
Lars Hoffmann ◽  
Paul F. Baumeister ◽  
Zhongyin Cai ◽  
Jan Clemens ◽  
Sabine Griessbach ◽  
...  

Abstract. Lagrangian models are fundamental tools to study atmospheric transport processes and for practical applications such as dispersion modeling for anthropogenic and natural emission sources. However, conducting large-scale Lagrangian transport simulations with millions of air parcels or more can become numerically rather costly. In this study, we assessed the potential of exploiting graphics processing units (GPUs) to accelerate Lagrangian transport simulations. We ported the Massive-Parallel Trajectory Calculations (MPTRAC) model to GPUs using the open accelerator (OpenACC) programming model. The trajectory calculations conducted within the MPTRAC model were fully ported to GPUs, i.e., except for feeding in the meteorological input data and for extracting the particle output data, the code operates entirely on the GPU devices without frequent data transfers between CPU and GPU memory. Model verification, performance analyses, and scaling tests of the MPI/OpenMP/OpenACC hybrid parallelization of MPTRAC were conducted on the JUWELS Booster supercomputer operated by the Jülich Supercomputing Centre, Germany. The JUWELS Booster comprises 3744 NVIDIA A100 Tensor Core GPUs, providing a peak performance of 71.0 PFlop/s. As of June 2021, it is the most powerful supercomputer in Europe and listed among the most energy-efficient systems internationally. For large-scale simulations comprising 10^8 particles driven by the European Centre for Medium-Range Weather Forecasts' ERA5 reanalysis, the performance evaluation showed a maximum speedup of a factor of 16 due to the utilization of GPUs compared to CPU-only runs on the JUWELS Booster. In the large-scale GPU run, about 67 % of the runtime is spent on the physics calculations, conducted on the GPUs. Another 15 % of the runtime is required for file-I/O, mostly to read the large ERA5 data set from disk. Meteorological data preprocessing on the CPUs also requires about 15 % of the runtime.
Although this study identified potential for further improvements of the GPU code, we consider the MPTRAC model ready for production runs on the JUWELS Booster in its present form. The GPU code provides a much faster time to solution than the CPU code, which is particularly relevant for near-real-time applications of a Lagrangian transport model.
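
The runtime shares quoted in the abstract (about 67 % physics on the GPUs, 15 % file-I/O, 15 % meteorological preprocessing on the CPUs) can be read through a simple Amdahl-style estimate of how much the whole run would gain if one component were accelerated further. The following is a minimal sketch, not part of MPTRAC: the function names, the "other" remainder, and the factor-of-two scenario are illustrative assumptions, and the model ignores any overlap between I/O, preprocessing, and GPU work.

```python
# Runtime shares reported in the abstract for the large-scale GPU run.
fractions = {
    "physics_gpu": 0.67,
    "file_io": 0.15,
    "met_preprocessing_cpu": 0.15,
}
# The abstract does not itemize the remainder; "other" is an assumed label.
fractions["other"] = 1.0 - sum(fractions.values())


def overall_speedup(shares, component, factor):
    """Whole-run speedup if `component` alone becomes `factor` times faster."""
    share = shares[component]
    return 1.0 / ((1.0 - share) + share / factor)


def upper_bound(shares, component):
    """Limit of overall_speedup as the cost of `component` goes to zero."""
    return 1.0 / (1.0 - shares[component])


if __name__ == "__main__":
    # Hypothetical scenario: each component in turn is made twice as fast.
    for name in ("physics_gpu", "file_io", "met_preprocessing_cpu"):
        gain = overall_speedup(fractions, name, 2.0)
        bound = upper_bound(fractions, name)
        print(f"{name}: 2x faster -> {gain:.2f}x overall (bound {bound:.2f}x)")
```

Under these assumptions, even an infinitely fast physics kernel would speed up the full run by only about a factor of three, because file-I/O and CPU preprocessing together account for roughly 30 % of the runtime.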


Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large scale supercomputers.


2009 ◽  
Vol 9 (19) ◽  
pp. 7313-7323 ◽  
Author(s):  
H. Wang ◽  
D. J. Jacob ◽  
M. Kopacz ◽  
D. B. A. Jones ◽  
P. Suntharalingam ◽  
...  

Abstract. Inverse modeling of CO2 satellite observations to better quantify carbon surface fluxes requires a chemical transport model (CTM) to relate the fluxes to the observed column concentrations. CTM transport error is a major source of uncertainty. We show that its effect can be reduced by using CO satellite observations as an additional constraint in a joint CO2-CO inversion. CO is measured from space with high precision, is strongly correlated with CO2, and is more sensitive than CO2 to CTM transport errors on synoptic and smaller scales. Exploiting this constraint requires statistics for the CTM transport error correlation between CO2 and CO, which is significantly different from the correlation between the concentrations themselves. We estimate the error correlation globally and for different seasons by a paired-model method (comparing GEOS-Chem CTM simulations of CO2 and CO columns using different assimilated meteorological data sets for the same meteorological year) and a paired-forecast method (comparing 48- vs. 24-h GEOS-5 CTM forecasts of CO2 and CO columns for the same forecast time). We find strong error correlations (r^2 > 0.5) between CO2 and CO columns over much of the extra-tropical Northern Hemisphere throughout the year, and strong consistency between different methods to estimate the error correlation. Application of the averaging kernels used in the retrieval for thermal IR CO measurements weakens the correlation coefficients by 15% on average (mostly due to variability in the averaging kernels) but preserves the large-scale correlation structure. We present a simple inverse modeling application to demonstrate that CO2-CO error correlations can indeed significantly reduce uncertainty on surface carbon fluxes in a joint CO2-CO inversion vs. a CO2-only inversion.
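
Why an error correlation of r^2 > 0.5 matters can be illustrated with a toy scalar inversion. This is a sketch under strong simplifying assumptions that are not taken from the paper: Gaussian errors, a single flux and a single CO2 column observation, and CO sources known well enough that the CO model-observation mismatch is pure transport error. Under those assumptions, conditioning on the CO mismatch reduces the CO2 transport error variance by the factor (1 - r^2). All numerical values and function names are hypothetical.

```python
import math


def conditional_error_variance(sigma_co2, r):
    """CO2 transport error variance after conditioning on the CO error.

    For two jointly Gaussian errors with correlation r, knowing one
    leaves a residual variance of sigma^2 * (1 - r^2) in the other.
    """
    return sigma_co2 ** 2 * (1.0 - r ** 2)


def posterior_flux_variance(prior_var, h, obs_err_var):
    """Posterior variance of a scalar flux x observed as y = h * x + error."""
    return 1.0 / (1.0 / prior_var + h ** 2 / obs_err_var)


if __name__ == "__main__":
    prior_var = 1.0      # hypothetical prior flux variance
    h = 1.0              # hypothetical sensitivity of the column to the flux
    sigma_co2 = 1.0      # hypothetical CO2 transport error (std. dev.)
    r = math.sqrt(0.5)   # r^2 = 0.5, the threshold quoted in the abstract

    co2_only = posterior_flux_variance(prior_var, h, sigma_co2 ** 2)
    joint = posterior_flux_variance(
        prior_var, h, conditional_error_variance(sigma_co2, r)
    )
    print(f"CO2-only posterior variance: {co2_only:.3f}")
    print(f"joint CO2-CO posterior variance: {joint:.3f}")
```

In this toy setting the effective observation error variance halves at r^2 = 0.5, so the posterior flux variance drops accordingly; the sign of r does not matter, only its magnitude.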


2006 ◽  
Vol 41 (1) ◽  
pp. 24-36 ◽  
Author(s):  
Karl-Erich Lindenschmidt ◽  
René Wodrich ◽  
Cornelia Hesse

Abstract. A hypothesis stating that more complex descriptions of processes in models simulate reality better (less error) but with more unreliable predictability (more sensitivity) is tested using a river water quality model. This hypothesis was extended stating that applying the model on a domain of smaller scale requires greater complexity to capture the same accuracy as in large-scale model applications which, however, leads to increased model sensitivity. The sediment and pollutant transport model TOXI, a module in the WASP5 package, was applied to two case studies of different scale: a 90-km course of the 5th order (sensu Strahler 1952) lower Saale river, Germany (large scale), and the lock-and-weir system at Calbe (small scale) situated on the same river course. A sensitivity analysis of several parameters relating to the physical and chemical transport processes of suspended solids, chloride, arsenic, iron and zinc shows that the coefficient, which partitions the total heavy metal mass into its dissolved and sorbed fraction, is a very sensitive parameter. Hence, the complexity of the sorptive process was varied to test the hypotheses.


2009 ◽  
Vol 9 (3) ◽  
pp. 11783-11810
Author(s):  
H. Wang ◽  
D. J. Jacob ◽  
M. Kopacz ◽  
D. B. A. Jones ◽  
P. Suntharalingam ◽  
...  

Abstract. Inverse modeling of CO2 satellite observations to better quantify carbon surface fluxes requires a forward model such as a chemical transport model (CTM) to relate the fluxes to the observed column concentrations. Model transport error is an important source of observational error. We investigate the potential of using CO satellite observations as additional constraints in a joint CO2–CO inversion to improve CO2 flux estimates, by exploiting the CTM transport error correlations between CO2 and CO. We estimate the error correlation globally and for different seasons by a paired-model method (comparing CTM simulations of CO2 and CO columns using different assimilated meteorological data sets for the same meteorological year) and a paired-forecast method (comparing 48- vs. 24-h CTM forecasts of CO2 and CO columns for the same forecast time). We find strong positive and negative error correlations (r^2 > 0.5) between CO2 and CO columns over much of the world throughout the year, and strong consistency between different methods to estimate the error correlation. Application of the averaging kernels used in the retrieval for thermal IR CO measurements weakens the correlation coefficients by 15% on average (mostly due to variability in the averaging kernels) but preserves the large-scale correlation structure. Results from a testbed inverse modeling application show that CO2–CO error correlations can indeed significantly reduce uncertainty on surface carbon fluxes in a joint CO2–CO inversion vs. a CO2-only inversion.


2020 ◽  
Vol 20 (23) ◽  
pp. 15227-15245
Author(s):  
Edward J. Charlesworth ◽  
Ann-Kristin Dugstad ◽  
Frauke Fritsch ◽  
Patrick Jöckel ◽  
Felix Plöger

Abstract. We investigate the impact of model trace gas transport schemes on the representation of transport processes in the upper troposphere and lower stratosphere. Towards this end, the Chemical Lagrangian Model of the Stratosphere (CLaMS) was coupled to the ECHAM/MESSy Atmospheric Chemistry (EMAC) model and results from the two transport schemes (Lagrangian critical Lyapunov scheme and flux-form semi-Lagrangian, respectively) were compared. Advection in CLaMS was driven by the EMAC simulation winds, and thereby the only differences in transport between the two sets of results were caused by differences in the transport schemes. To analyze the timescales of large-scale transport, multiple tropical-surface-emitted tracer pulses were performed to calculate age of air spectra, while smaller-scale transport was analyzed via idealized, radioactively decaying tracers emitted in smaller regions (nine grid cells) within the stratosphere. The results show that stratospheric transport barriers are significantly stronger for Lagrangian EMAC-CLaMS transport due to reduced numerical diffusion. In particular, stronger tracer gradients emerge around the polar vortex, at the subtropical jets, and at the edge of the tropical pipe. Inside the polar vortex, the more diffusive EMAC flux-form semi-Lagrangian transport scheme results in a substantially higher amount of air with ages from 0 to 2 years (up to a factor of 5 higher). In the lowermost stratosphere, mean age of air is much smaller in EMAC, owing to stronger diffusive cross-tropopause transport. Conversely, EMAC-CLaMS shows a summertime lowermost stratosphere age inversion – a layer of older air residing below younger air (an “eave”). This pattern is caused by strong poleward transport above the subtropical jet and is entirely blurred by diffusive cross-tropopause transport in EMAC. Potential consequences from the choice of the transport scheme on chemistry–climate and geoengineering simulations are discussed.

