Extremely fast and accurate open modification spectral library searching of high-resolution mass spectra using feature hashing and graphics processing units

2019 ◽  
Author(s):  
Wout Bittremieux ◽  
Kris Laukens ◽  
William Stafford Noble

Abstract. Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is its large computational cost, because each query spectrum has to be compared against a multitude of candidate peptides.

We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. Based on these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding of the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo on a large data set of the draft human proteome.

ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
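The feature hashing idea can be sketched as follows: each peak is assigned to a fine high-resolution m/z bin, and a hash function maps the bin index into a small fixed number of buckets, turning a spectrum into a compact vector suitable for nearest-neighbor indexing. This is a minimal illustrative sketch, not the ANN-SoLo implementation; the function name, bin width, and dimensionality are assumptions.

```python
import math

def hash_spectrum(mz_values, intensities, dim=800, bin_width=0.02):
    """Hash a high-resolution spectrum into a low-dimensional vector.

    Each peak is assigned to a fine m/z bin, and the bin index is then
    hashed into one of `dim` buckets (feature hashing), so the vector
    stays small while nearby peaks rarely collide.
    """
    vec = [0.0] * dim
    for mz, intensity in zip(mz_values, intensities):
        fine_bin = int(mz / bin_width)   # high-resolution bin index
        bucket = hash(fine_bin) % dim    # feature hashing step
        vec[bucket] += intensity
    # L2-normalize so dot products approximate cosine similarity
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]
```

With a unit-norm vector per spectrum, candidate selection reduces to approximate maximum-inner-product search over the library.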

2021 ◽  
Author(s):  
Lars Hoffmann ◽  
Paul F. Baumeister ◽  
Zhongyin Cai ◽  
Jan Clemens ◽  
Sabine Griessbach ◽  
...  

Abstract. Lagrangian models are fundamental tools to study atmospheric transport processes and for practical applications such as dispersion modeling for anthropogenic and natural emission sources. However, conducting large-scale Lagrangian transport simulations with millions of air parcels or more can become numerically rather costly. In this study, we assessed the potential of exploiting graphics processing units (GPUs) to accelerate Lagrangian transport simulations. We ported the Massive-Parallel Trajectory Calculations (MPTRAC) model to GPUs using the open accelerator (OpenACC) programming model. The trajectory calculations conducted within the MPTRAC model were fully ported to GPUs, i.e., except for feeding in the meteorological input data and for extracting the particle output data, the code operates entirely on the GPU devices without frequent data transfers between CPU and GPU memory. Model verification, performance analyses, and scaling tests of the MPI/OpenMP/OpenACC hybrid parallelization of MPTRAC were conducted on the JUWELS Booster supercomputer operated by the Jülich Supercomputing Centre, Germany. The JUWELS Booster comprises 3744 NVIDIA A100 Tensor Core GPUs, providing a peak performance of 71.0 PFlop/s. As of June 2021, it is the most powerful supercomputer in Europe and listed among the most energy-efficient systems internationally. For large-scale simulations comprising 10^8 particles driven by the European Centre for Medium-Range Weather Forecasts' ERA5 reanalysis, the performance evaluation showed a maximum speedup of a factor of 16 due to the utilization of GPUs compared to CPU-only runs on the JUWELS Booster. In the large-scale GPU run, about 67 % of the runtime is spent on the physics calculations, conducted on the GPUs. Another 15 % of the runtime is required for file I/O, mostly to read the large ERA5 data set from disk. Meteorological data preprocessing on the CPUs also requires about 15 % of the runtime.
Although this study identified potential for further improvements of the GPU code, we consider the MPTRAC model ready for production runs on the JUWELS Booster in its present form. The GPU code provides a much faster time to solution than the CPU code, which is particularly relevant for near-real-time applications of a Lagrangian transport model.
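The data-parallel structure that makes trajectory calculations GPU-friendly can be illustrated with a minimal kinematic advection step, where every air parcel is updated independently from the winds sampled at its position. This is a conceptual Python/NumPy sketch, not MPTRAC's OpenACC code; the function name and the simplified spherical geometry are assumptions.

```python
import numpy as np

def advect(lon, lat, u, v, dt, radius=6.371e6):
    """One explicit Euler advection step for a batch of air parcels.

    lon, lat are parcel positions in degrees; u, v are winds in m/s
    sampled at the parcel positions. All parcels are updated at once
    with no cross-parcel dependencies, which is the data-parallel
    pattern that maps well onto GPU threads.
    """
    deg = 180.0 / (np.pi * radius)  # metres -> degrees of latitude
    lat_new = lat + v * dt * deg
    # longitude displacement shrinks with the cosine of latitude
    lon_new = lon + u * dt * deg / np.cos(np.radians(lat))
    return lon_new, lat_new
```

Real models add vertical motion, turbulent diffusion, and higher-order time integration, but each term keeps this per-parcel independence.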


2020 ◽  
Vol 10 (15) ◽  
pp. 5361
Author(s):  
Nabil Stendardo ◽  
Gilles Desthieux ◽  
Nabil Abdennadher ◽  
Peter Gallinelli

In the context of encouraging the development of renewable energy, this paper describes a software solution for mapping solar potential at large scale and in high resolution. We leverage the performance provided by Graphics Processing Units (GPUs) to accelerate shadow casting procedures (used both for direct sunlight exposure and the sky view factor), and use off-the-shelf components to compute an average weather pattern for a given area. The approach is applied in the context of the solar cadaster of Greater Geneva (2000 km²). The results show that analyzing a square tile of 3.4 km at a resolution of 0.5 m takes up to two hours, a clear improvement over our previous work. This shows that GPU-based calculations are highly competitive in the field of solar potential modeling.
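A shadow-casting test of the kind accelerated here amounts to marching from a cell along the sun direction over the elevation model and checking whether terrain rises above the line of sight. The sketch below is a simplified single-azimuth (eastward) version with a hypothetical `in_shadow` helper; production solar-cadaster codes repeat this per cell, per azimuth, and per sun position, which is the workload the GPUs parallelize.

```python
import numpy as np

def in_shadow(dem, cell_size, row, col, sun_elev_deg, max_steps=50):
    """Check whether a DEM cell is shadowed by terrain towards the east.

    Marches east from (row, col) and tests whether any cell rises above
    the sun ray leaving the starting cell at the given solar elevation.
    """
    tan_elev = np.tan(np.radians(sun_elev_deg))
    z0 = dem[row, col]
    for step in range(1, max_steps):
        c = col + step
        if c >= dem.shape[1]:
            break
        # height of the sun ray above the starting cell at this distance
        ray_height = z0 + step * cell_size * tan_elev
        if dem[row, c] > ray_height:
            return True
    return False
```

Since every (cell, sun position) pair is independent, the full map is embarrassingly parallel.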


Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through scaling results on traditional and GPU-accelerated large-scale supercomputers.
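The essence of such an abstraction layer is that a single kernel definition can be dispatched to different data-parallel backends without changing the kernel source. The following is a toy Python sketch of that dispatch idea, not targetDP itself (which maps C source onto OpenMP, vectorized, and CUDA targets); the `launch` function and backend names are illustrative assumptions.

```python
import numpy as np

def launch(kernel, n, *arrays, backend="vectorised"):
    """Minimal data-parallel dispatch in the spirit of an abstraction layer.

    The same kernel runs over a serial index loop or, with fancy
    indexing, over all indices at once; a real layer maps this choice
    onto threads, SIMD lanes, or GPU blocks instead.
    """
    if backend == "serial":
        for i in range(n):
            kernel(i, *arrays)
    else:  # vectorised: apply the kernel to every index at once
        kernel(np.arange(n), *arrays)

def saxpy(i, x, y, out):
    """Kernel written once, agnostic to scalar vs. array index i."""
    out[i] = 2.0 * x[i] + y[i]
```

Because `saxpy` only indexes with `i`, both backends produce identical results from the same source, which is the portability property the paper targets.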


2017 ◽  
Vol 10 (5) ◽  
pp. 2031-2055 ◽  
Author(s):  
Thomas Schwitalla ◽  
Hans-Stefan Bauer ◽  
Volker Wulfmeyer ◽  
Kirsten Warrach-Sagi

Abstract. Increasing computational resources and the demands of impact modelers, stakeholders, and society call for seasonal and climate simulations at convection-permitting resolution. So far such a resolution is only achieved with limited-area models whose results are impacted by the zonal and meridional boundaries. Here, we present the setup of a latitude-belt domain that reduces disturbances originating from the western and eastern boundaries and therefore allows for studying the impact of model resolution and physical parameterization. The Weather Research and Forecasting (WRF) model coupled to the NOAH land surface model was operated during July and August 2013 at two different horizontal resolutions, namely 0.03° (HIRES) and 0.12° (LOWRES). Both simulations were forced by the European Centre for Medium-Range Weather Forecasts (ECMWF) operational analysis data at the northern and southern domain boundaries, and by the high-resolution Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) data at the sea surface.

The simulations are compared to the operational ECMWF analysis for the representation of large-scale features. To analyze the simulated precipitation, the operational ECMWF forecast, the CPC MORPHing technique (CMORPH), and the ENSEMBLES gridded observational precipitation data set (E-OBS) were used as references.

Analyzing pressure, geopotential height, wind, and temperature fields as well as precipitation revealed (1) a benefit from the higher resolution in terms of reduced monthly biases, lower root mean square error, and an improved Pearson skill score, and (2) deficiencies in the physical parameterizations leading to notable biases in distinct regions like the polar Atlantic for the LOWRES simulation, and the North Pacific and Inner Mongolia for both resolutions.

In summary, the application of a latitude belt at convection-permitting resolution shows promising results that are beneficial for future seasonal forecasting.
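The verification statistics mentioned (monthly bias, root mean square error, and correlation-based skill) are computed from paired model and reference fields. A minimal sketch, assuming plain NumPy arrays rather than the paper's gridded data sets and the illustrative name `verify`:

```python
import numpy as np

def verify(model, reference):
    """Bias, RMSE, and Pearson correlation of a model field vs. a reference."""
    diff = model - reference
    bias = diff.mean()
    rmse = np.sqrt((diff ** 2).mean())
    # flatten so 2-D gridded fields are handled the same way as 1-D series
    corr = np.corrcoef(model.ravel(), reference.ravel())[0, 1]
    return bias, rmse, corr
```

A uniform offset, for instance, shows up entirely in the bias and RMSE while leaving the correlation untouched.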


2021 ◽  
Author(s):  
John Taylor ◽  
Pablo Larraondo ◽  
Bronis de Supinski

Abstract. Society has benefited enormously from the continuous advancement in numerical weather prediction that has occurred over many decades, driven by a combination of outstanding scientific, computational and technological breakthroughs. Here we demonstrate that data-driven methods are now positioned to contribute to the next wave of major advances in atmospheric science. We show that data-driven models can predict important meteorological quantities of interest to society, such as global high-resolution precipitation fields (0.25 degrees), and can deliver accurate forecasts of the future state of the atmosphere without prior knowledge of the laws of physics and chemistry. We also show how these data-driven methods can be scaled to run on supercomputers with up to 1024 modern graphics processing units (GPUs) and beyond, resulting in rapid training of data-driven models and thus supporting a cycle of rapid research and innovation. Taken together, these two results illustrate the significant potential of data-driven methods to advance atmospheric science and operational weather forecasting.
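Multi-GPU training of the kind described typically follows the standard data-parallel pattern: each worker computes gradients on its own data shard, the gradients are averaged across workers (an all-reduce), and one shared update is applied everywhere. A minimal single-process sketch of that update rule, with illustrative names:

```python
import numpy as np

def data_parallel_step(grads_per_worker, params, lr=0.01):
    """One data-parallel update: average per-worker gradients, apply once.

    grads_per_worker has shape (n_workers, n_params); averaging over the
    worker axis stands in for the all-reduce that frameworks perform
    across GPUs before every optimizer step.
    """
    mean_grad = np.mean(grads_per_worker, axis=0)  # the "all-reduce"
    return params - lr * mean_grad
```

Because each worker sees 1/n of the batch, throughput grows with worker count while the update remains equivalent to one large-batch step.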


2016 ◽  
Author(s):  
R. J. Haarsma ◽  
M. Roberts ◽  
P. L. Vidale ◽  
C. A. Senior ◽  
A. Bellucci ◽  
...  

Abstract. Robust projections and predictions of climate variability and change, particularly at regional scales, rely on the driving processes being represented with fidelity in model simulations. The role of enhanced horizontal resolution in improved process representation in all components of the climate system is of growing interest, particularly as some recent simulations suggest the possibility for significant changes in both large-scale aspects of circulation, as well as improvements in small-scale processes and extremes. However, such high-resolution global simulations at climate time scales, with resolutions of at least 50 km in the atmosphere and 0.25° in the ocean, have been performed at relatively few research centers and generally without overall coordination, primarily due to their computational cost. Assessing the robustness of the response of simulated climate to model resolution requires a large multi-model ensemble using a coordinated set of experiments. The Coupled Model Intercomparison Project 6 (CMIP6) is the ideal framework within which to conduct such a study, due to the strong link to models being developed for the CMIP DECK experiments and other MIPs. Increases in High Performance Computing (HPC) resources, as well as the revised experimental design for CMIP6, now enable a detailed investigation of the impact of increased resolution up to synoptic weather scales on the simulated mean climate and its variability. The High Resolution Model Intercomparison Project (HighResMIP) presented in this paper applies, for the first time, a multi-model approach to the systematic investigation of the impact of horizontal resolution. A coordinated set of experiments has been designed to assess both a standard and an enhanced horizontal resolution simulation in the atmosphere and ocean.
The set of HighResMIP experiments is divided into three tiers consisting of atmosphere-only and coupled runs and spanning the period 1950-2050, with the possibility to extend to 2100, together with some additional targeted experiments. This paper describes the experimental set-up of HighResMIP, the analysis plan, the connection with the other CMIP6 endorsed MIPs, as well as the DECK and CMIP6 historical simulation. HighResMIP thereby focuses on one of the CMIP6 broad questions: “what are the origins and consequences of systematic model biases?”, but we also discuss how it addresses the World Climate Research Program (WCRP) grand challenges.


2020 ◽  
Author(s):  
Vera Thiemig ◽  
Peter Salamon ◽  
Goncalo N. Gomes ◽  
Jon O. Skøien ◽  
Markus Ziese ◽  
...  

We present EMO-5, a pan-European high-resolution (5 km), (sub-)daily, multi-variable meteorological data set developed especially for the needs of an operational, pan-European hydrological service (EFAS; European Flood Awareness System). The data set is built on historic and real-time observations from 18,964 meteorological in-situ stations, collected from 24 data providers, and 10,632 virtual stations from four high-resolution regional observational grids (CombiPrecip, ZAMG - INCA, EURO4M-APGD and CarpatClim) as well as one global reanalysis product (ERA-Interim-land). This multi-variable data set covers precipitation, temperature (average, min and max), wind speed, solar radiation and vapor pressure, all at daily resolution, with additional 6-hourly resolution for precipitation and average temperature. The original observations were thoroughly quality controlled before we used the Spheremap interpolation method to estimate the variable values and their associated uncertainty for each of the 5 x 5 km grid cells. EMO-5 v1 grids covering the time period from 1990 to 2019 will be released as a free and open Copernicus product in mid-2020 (with near real-time release of the latest gridded observations to follow). We would like to present the great potential EMO-5 holds for the hydrological modelling community.

Footnote: EMO = European Meteorological Observations
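Spheremap itself combines distance and directional weighting on the sphere; as a simplified stand-in, a plain inverse-distance-weighted interpolation conveys the station-to-grid step. The function below is an illustrative sketch, not the EMO-5 code, and the planar distances are an assumption for brevity.

```python
import numpy as np

def idw(station_xy, station_vals, grid_xy, power=2.0):
    """Inverse-distance-weighted estimate at grid points from stations.

    Nearby stations dominate the estimate and distant ones contribute
    little; `power` controls how quickly influence decays.
    """
    out = np.empty(len(grid_xy))
    for k, g in enumerate(grid_xy):
        d = np.linalg.norm(station_xy - g, axis=1)
        if np.any(d == 0):  # grid point coincides with a station
            out[k] = station_vals[np.argmin(d)]
            continue
        w = 1.0 / d ** power
        out[k] = np.sum(w * station_vals) / np.sum(w)
    return out
```

Schemes like Spheremap additionally down-weight stations that are angularly clustered, so one dense cluster does not dominate the estimate.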


Geophysics ◽  
2012 ◽  
Vol 77 (4) ◽  
pp. WB37-WB45 ◽  
Author(s):  
Elliot Holtham ◽  
Douglas W. Oldenburg

A Z-Axis Tipper Electromagnetic Technique (ZTEM) survey is an airborne natural source electromagnetic survey that relates the vertical magnetic field to the horizontal magnetic fields measured at a reference station on the ground. For large airborne surveys, the high number of cells required to discretize the entire area at a reasonable resolution can make the computational cost of inverting the data set all at once prohibitively expensive. We present an iterative methodology that can be used to invert large natural source surveys by using a combination of coarse and fine meshes as well as a domain decomposition that allows the full model area to be split into smaller subproblems, which can be run in parallel. For this procedure, the entire data set is first inverted on a coarse mesh. The recovered coarse model and computed fields are used as starting models and source terms in the subsequent tiled inversions. After each round of tiled inversions, the tiles are merged together to form an update model, which is then forward modeled to determine if the model achieves the target misfit. Following this procedure, we first invert the data computed from a large synthetic model of the Noranda mining camp. The inverted models from this example are consistent among our different tiling choices. The recovered models show excellent large-scale agreement with the true model and they also recover several of the mineralized zones that were not apparent from the initial coarse inversion. Finally, we invert a [Formula: see text] block of the 2010 ZTEM survey collected over the porphyry Pebble Deposit in Alaska. The inverted ZTEM results are consistent with the results obtained using other electromagnetic methods.
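The coarse-to-fine tiled workflow can be sketched abstractly: the coarse inversion seeds independent tile subproblems, whose refined results are merged back into an updated model before the next misfit check. The sketch below is schematic; the `refine_tile` callable stands in for a full tile inversion, and in practice the tiles run in parallel.

```python
import numpy as np

def tiled_refine(coarse_model, refine_tile, tile_slices):
    """Refine a coarse model tile by tile, then merge the updates.

    Each tile is refined independently using the coarse model as its
    starting guess, mirroring the domain decomposition used for large
    natural-source electromagnetic inversions.
    """
    model = coarse_model.copy()
    for sl in tile_slices:
        model[sl] = refine_tile(coarse_model[sl])  # independent subproblem
    return model
```

The merged model is then forward modeled; if the target misfit is not reached, another round of tiled refinement follows.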

