Extremely fast and accurate open modification spectral library searching of high-resolution mass spectra using feature hashing and graphics processing units

2019 ◽  
Author(s):  
Wout Bittremieux ◽  
Kris Laukens ◽  
William Stafford Noble

Abstract. Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is its large computational cost, because each query spectrum has to be compared against a multitude of candidate peptides.

We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. Based on these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding of the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo on a large data set of the draft human proteome.

ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
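The feature hashing idea can be sketched as follows: each peak is assigned to a fine high-resolution m/z bin, and a hash function maps the bin index into a small fixed number of buckets, turning a spectrum into a compact vector suitable for nearest-neighbor indexing. This is a minimal illustrative sketch, not the ANN-SoLo implementation; the function name, bin width, and dimensionality are assumptions.

```python
import math

def hash_spectrum(mz_values, intensities, dim=800, bin_width=0.02):
    """Hash a high-resolution spectrum into a low-dimensional vector.

    Each peak is assigned to a fine m/z bin, and the bin index is then
    hashed into one of `dim` buckets (feature hashing), so the vector
    stays small while nearby peaks rarely collide.
    """
    vec = [0.0] * dim
    for mz, intensity in zip(mz_values, intensities):
        fine_bin = int(mz / bin_width)   # high-resolution bin index
        bucket = hash(fine_bin) % dim    # feature hashing step
        vec[bucket] += intensity
    # L2-normalize so dot products approximate cosine similarity
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]
```

With a unit-norm vector per spectrum, candidate selection reduces to approximate maximum-inner-product search over the library.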

2021 ◽  
Author(s):  
Lars Hoffmann ◽  
Paul F. Baumeister ◽  
Zhongyin Cai ◽  
Jan Clemens ◽  
Sabine Griessbach ◽  
...  

Abstract. Lagrangian models are fundamental tools to study atmospheric transport processes and for practical applications such as dispersion modeling for anthropogenic and natural emission sources. However, conducting large-scale Lagrangian transport simulations with millions of air parcels or more can become numerically rather costly. In this study, we assessed the potential of exploiting graphics processing units (GPUs) to accelerate Lagrangian transport simulations. We ported the Massive-Parallel Trajectory Calculations (MPTRAC) model to GPUs using the open accelerator (OpenACC) programming model. The trajectory calculations conducted within the MPTRAC model were fully ported to GPUs, i.e., except for feeding in the meteorological input data and for extracting the particle output data, the code operates entirely on the GPU devices without frequent data transfers between CPU and GPU memory. Model verification, performance analyses, and scaling tests of the MPI/OpenMP/OpenACC hybrid parallelization of MPTRAC were conducted on the JUWELS Booster supercomputer operated by the Jülich Supercomputing Centre, Germany. The JUWELS Booster comprises 3744 NVIDIA A100 Tensor Core GPUs, providing a peak performance of 71.0 PFlop/s. As of June 2021, it is the most powerful supercomputer in Europe and listed among the most energy-efficient systems internationally. For large-scale simulations comprising 10^8 particles driven by the European Centre for Medium-Range Weather Forecasts' ERA5 reanalysis, the performance evaluation showed a maximum speedup of a factor of 16 due to the utilization of GPUs compared to CPU-only runs on the JUWELS Booster. In the large-scale GPU run, about 67 % of the runtime is spent on the physics calculations, conducted on the GPUs. Another 15 % of the runtime is required for file I/O, mostly to read the large ERA5 data set from disk. Meteorological data preprocessing on the CPUs also requires about 15 % of the runtime.
Although this study identified potential for further improvements of the GPU code, we consider the MPTRAC model ready for production runs on the JUWELS Booster in its present form. The GPU code provides a much faster time to solution than the CPU code, which is particularly relevant for near-real-time applications of a Lagrangian transport model.
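The data-parallel structure that makes trajectory calculations GPU-friendly can be illustrated with a minimal kinematic advection step, where every air parcel is updated independently from the winds sampled at its position. This is a conceptual Python/NumPy sketch, not MPTRAC's OpenACC code; the function name and the simplified spherical geometry are assumptions.

```python
import numpy as np

def advect(lon, lat, u, v, dt, radius=6.371e6):
    """One explicit Euler advection step for a batch of air parcels.

    lon, lat are parcel positions in degrees; u, v are winds in m/s
    sampled at the parcel positions. All parcels are updated at once
    with no cross-parcel dependencies, which is the data-parallel
    pattern that maps well onto GPU threads.
    """
    deg = 180.0 / (np.pi * radius)  # metres -> degrees of latitude
    lat_new = lat + v * dt * deg
    # longitude displacement shrinks with the cosine of latitude
    lon_new = lon + u * dt * deg / np.cos(np.radians(lat))
    return lon_new, lat_new
```

Real models add vertical motion, turbulent diffusion, and higher-order time integration, but each term keeps this per-parcel independence.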


2020 ◽  
Vol 10 (15) ◽  
pp. 5361
Author(s):  
Nabil Stendardo ◽  
Gilles Desthieux ◽  
Nabil Abdennadher ◽  
Peter Gallinelli

In the context of encouraging the development of renewable energy, this paper describes a software solution for mapping solar potential at large scale and in high resolution. We leverage the performance provided by Graphics Processing Units (GPUs) to accelerate shadow casting procedures (used both for direct sunlight exposure and the sky view factor), and use off-the-shelf components to compute an average weather pattern for a given area. The approach is applied in the context of the solar cadaster of Greater Geneva (2000 km²). The results show that analyzing a square tile of 3.4 km at a resolution of 0.5 m takes up to two hours, a clear improvement over our previous work. This shows that GPU-based calculations are highly competitive in the field of solar potential modeling.
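A shadow-casting test of the kind accelerated here amounts to marching from a cell along the sun direction over the elevation model and checking whether terrain rises above the line of sight. The sketch below is a simplified single-azimuth (eastward) version with a hypothetical `in_shadow` helper; production solar-cadaster codes repeat this per cell, per azimuth, and per sun position, which is the workload the GPUs parallelize.

```python
import numpy as np

def in_shadow(dem, cell_size, row, col, sun_elev_deg, max_steps=50):
    """Check whether a DEM cell is shadowed by terrain towards the east.

    Marches east from (row, col) and tests whether any cell rises above
    the sun ray leaving the starting cell at the given solar elevation.
    """
    tan_elev = np.tan(np.radians(sun_elev_deg))
    z0 = dem[row, col]
    for step in range(1, max_steps):
        c = col + step
        if c >= dem.shape[1]:
            break
        # height of the sun ray above the starting cell at this distance
        ray_height = z0 + step * cell_size * tan_elev
        if dem[row, c] > ray_height:
            return True
    return False
```

Since every (cell, sun position) pair is independent, the full map is embarrassingly parallel.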


Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through scaling results on traditional and GPU-accelerated large-scale supercomputers.
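The essence of such an abstraction layer is that a single kernel definition can be dispatched to different data-parallel backends without changing the kernel source. The following is a toy Python sketch of that dispatch idea, not targetDP itself (which maps C source onto OpenMP, vectorized, and CUDA targets); the `launch` function and backend names are illustrative assumptions.

```python
import numpy as np

def launch(kernel, n, *arrays, backend="vectorised"):
    """Minimal data-parallel dispatch in the spirit of an abstraction layer.

    The same kernel runs over a serial index loop or, with fancy
    indexing, over all indices at once; a real layer maps this choice
    onto threads, SIMD lanes, or GPU blocks instead.
    """
    if backend == "serial":
        for i in range(n):
            kernel(i, *arrays)
    else:  # vectorised: apply the kernel to every index at once
        kernel(np.arange(n), *arrays)

def saxpy(i, x, y, out):
    """Kernel written once, agnostic to scalar vs. array index i."""
    out[i] = 2.0 * x[i] + y[i]
```

Because `saxpy` only indexes with `i`, both backends produce identical results from the same source, which is the portability property the paper targets.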


2017 ◽  
Vol 10 (5) ◽  
pp. 2031-2055 ◽  
Author(s):  
Thomas Schwitalla ◽  
Hans-Stefan Bauer ◽  
Volker Wulfmeyer ◽  
Kirsten Warrach-Sagi

Abstract. Increasing computational resources and the demands of impact modelers, stakeholders, and society call for seasonal and climate simulations at convection-permitting resolution. So far such a resolution is only achieved with limited-area models whose results are impacted by the zonal and meridional boundaries. Here, we present the setup of a latitude-belt domain that reduces disturbances originating from the western and eastern boundaries and therefore allows for studying the impact of model resolution and physical parameterization. The Weather Research and Forecasting (WRF) model coupled to the NOAH land surface model was operated during July and August 2013 at two different horizontal resolutions, namely 0.03° (HIRES) and 0.12° (LOWRES). Both simulations were forced by the European Centre for Medium-Range Weather Forecasts (ECMWF) operational analysis data at the northern and southern domain boundaries, and by the high-resolution Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) data at the sea surface.

The simulations are compared to the operational ECMWF analysis for the representation of large-scale features. To analyze the simulated precipitation, the operational ECMWF forecast, the CPC MORPHing technique (CMORPH), and the ENSEMBLES gridded observational precipitation data set (E-OBS) were used as references.

Analyzing pressure, geopotential height, wind, and temperature fields as well as precipitation revealed (1) a benefit from the higher resolution in terms of reduced monthly biases, lower root mean square error, and an improved Pearson skill score, and (2) deficiencies in the physical parameterizations leading to notable biases in distinct regions like the polar Atlantic for the LOWRES simulation, and the North Pacific and Inner Mongolia for both resolutions.

In summary, the application of a latitude belt at convection-permitting resolution shows promising results that are beneficial for future seasonal forecasting.
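The verification statistics mentioned (monthly bias, root mean square error, and correlation-based skill) are computed from paired model and reference fields. A minimal sketch, assuming plain NumPy arrays rather than the paper's gridded data sets and the illustrative name `verify`:

```python
import numpy as np

def verify(model, reference):
    """Bias, RMSE, and Pearson correlation of a model field vs. a reference."""
    diff = model - reference
    bias = diff.mean()
    rmse = np.sqrt((diff ** 2).mean())
    # flatten so 2-D gridded fields are handled the same way as 1-D series
    corr = np.corrcoef(model.ravel(), reference.ravel())[0, 1]
    return bias, rmse, corr
```

A uniform offset, for instance, shows up entirely in the bias and RMSE while leaving the correlation untouched.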


2021 ◽  
Author(s):  
John Taylor ◽  
Pablo Larraondo ◽  
Bronis de Supinski

Abstract. Society has benefited enormously from the continuous advancement in numerical weather prediction that has occurred over many decades, driven by a combination of outstanding scientific, computational and technological breakthroughs. Here we demonstrate that data-driven methods are now positioned to contribute to the next wave of major advances in atmospheric science. We show that data-driven models can predict important meteorological quantities of interest to society, such as global high-resolution precipitation fields (0.25 degrees), and can deliver accurate forecasts of the future state of the atmosphere without prior knowledge of the laws of physics and chemistry. We also show how these data-driven methods can be scaled to run on supercomputers with up to 1024 modern graphics processing units (GPUs) and beyond, resulting in rapid training of data-driven models and thus supporting a cycle of rapid research and innovation. Taken together, these two results illustrate the significant potential of data-driven methods to advance atmospheric science and operational weather forecasting.
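Multi-GPU training of the kind described typically follows the standard data-parallel pattern: each worker computes gradients on its own data shard, the gradients are averaged across workers (an all-reduce), and one shared update is applied everywhere. A minimal single-process sketch of that update rule, with illustrative names:

```python
import numpy as np

def data_parallel_step(grads_per_worker, params, lr=0.01):
    """One data-parallel update: average per-worker gradients, apply once.

    grads_per_worker has shape (n_workers, n_params); averaging over the
    worker axis stands in for the all-reduce that frameworks perform
    across GPUs before every optimizer step.
    """
    mean_grad = np.mean(grads_per_worker, axis=0)  # the "all-reduce"
    return params - lr * mean_grad
```

Because each worker sees 1/n of the batch, throughput grows with worker count while the update remains equivalent to one large-batch step.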


2016 ◽  
Author(s):  
R. J. Haarsma ◽  
M. Roberts ◽  
P. L. Vidale ◽  
C. A. Senior ◽  
A. Bellucci ◽  
...  

Abstract. Robust projections and predictions of climate variability and change, particularly at regional scales, rely on the driving processes being represented with fidelity in model simulations. The role of enhanced horizontal resolution in improved process representation in all components of the climate system is of growing interest, particularly as some recent simulations suggest the possibility for significant changes in both large-scale aspects of circulation, as well as improvements in small-scale processes and extremes. However, such high-resolution global simulations at climate time scales, with resolutions of at least 50 km in the atmosphere and 0.25° in the ocean, have been performed at relatively few research centers and generally without overall coordination, primarily due to their computational cost. Assessing the robustness of the response of simulated climate to model resolution requires a large multi-model ensemble using a coordinated set of experiments. The Coupled Model Intercomparison Project 6 (CMIP6) is the ideal framework within which to conduct such a study, due to the strong link to models being developed for the CMIP DECK experiments and other MIPs. Increases in High Performance Computing (HPC) resources, as well as the revised experimental design for CMIP6, now enable a detailed investigation of the impact of increased resolution up to synoptic weather scales on the simulated mean climate and its variability. The High Resolution Model Intercomparison Project (HighResMIP) presented in this paper applies, for the first time, a multi-model approach to the systematic investigation of the impact of horizontal resolution. A coordinated set of experiments has been designed to assess both a standard and an enhanced horizontal resolution simulation in the atmosphere and ocean.
The set of HighResMIP experiments is divided into three tiers consisting of atmosphere-only and coupled runs and spanning the period 1950-2050, with the possibility to extend to 2100, together with some additional targeted experiments. This paper describes the experimental set-up of HighResMIP, the analysis plan, the connection with the other CMIP6 endorsed MIPs, as well as the DECK and CMIP6 historical simulation. HighResMIP thereby focuses on one of the CMIP6 broad questions: “what are the origins and consequences of systematic model biases?”, but we also discuss how it addresses the World Climate Research Program (WCRP) grand challenges.


2020 ◽  
Author(s):  
Vera Thiemig ◽  
Peter Salamon ◽  
Goncalo N. Gomes ◽  
Jon O. Skøien ◽  
Markus Ziese ◽  
...  

We present EMO-5, a pan-European high-resolution (5 km), (sub-)daily, multi-variable meteorological data set developed especially for the needs of an operational, pan-European hydrological service (EFAS; European Flood Awareness System). The data set is built on historic and real-time observations from 18,964 meteorological in-situ stations, collected from 24 data providers, and 10,632 virtual stations from four high-resolution regional observational grids (CombiPrecip, ZAMG - INCA, EURO4M-APGD and CarpatClim) as well as one global reanalysis product (ERA-Interim-land). This multi-variable data set covers precipitation, temperature (average, min and max), wind speed, solar radiation and vapor pressure, all at daily resolution, with additional 6-hourly resolution for precipitation and average temperature. The original observations were thoroughly quality controlled before we used the Spheremap interpolation method to estimate the variable values and their associated uncertainty for each of the 5 x 5 km grid cells. EMO-5 v1 grids covering the time period from 1990 to 2019 will be released as a free and open Copernicus product in mid-2020 (with near real-time release of the latest gridded observations to follow). We would like to present the great potential EMO-5 holds for the hydrological modelling community.

Footnote: EMO = European Meteorological Observations
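Spheremap itself combines distance and directional weighting on the sphere; as a simplified stand-in, a plain inverse-distance-weighted interpolation conveys the station-to-grid step. The function below is an illustrative sketch, not the EMO-5 code, and the planar distances are an assumption for brevity.

```python
import numpy as np

def idw(station_xy, station_vals, grid_xy, power=2.0):
    """Inverse-distance-weighted estimate at grid points from stations.

    Nearby stations dominate the estimate and distant ones contribute
    little; `power` controls how quickly influence decays.
    """
    out = np.empty(len(grid_xy))
    for k, g in enumerate(grid_xy):
        d = np.linalg.norm(station_xy - g, axis=1)
        if np.any(d == 0):  # grid point coincides with a station
            out[k] = station_vals[np.argmin(d)]
            continue
        w = 1.0 / d ** power
        out[k] = np.sum(w * station_vals) / np.sum(w)
    return out
```

Schemes like Spheremap additionally down-weight stations that are angularly clustered, so one dense cluster does not dominate the estimate.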


Geophysics ◽  
2012 ◽  
Vol 77 (4) ◽  
pp. WB37-WB45 ◽  
Author(s):  
Elliot Holtham ◽  
Douglas W. Oldenburg

A Z-Axis Tipper Electromagnetic Technique (ZTEM) survey is an airborne natural source electromagnetic survey that relates the vertical magnetic field to the horizontal magnetic fields measured at a reference station on the ground. For large airborne surveys, the high number of cells required to discretize the entire area at a reasonable resolution can make the computational cost of inverting the data set all at once prohibitively expensive. We present an iterative methodology that can be used to invert large natural source surveys by using a combination of coarse and fine meshes as well as a domain decomposition that allows the full model area to be split into smaller subproblems, which can be run in parallel. For this procedure, the entire data set is first inverted on a coarse mesh. The recovered coarse model and computed fields are used as starting models and source terms in the subsequent tiled inversions. After each round of tiled inversions, the tiles are merged together to form an update model, which is then forward modeled to determine if the model achieves the target misfit. Following this procedure, we first invert the data computed from a large synthetic model of the Noranda mining camp. The inverted models from this example are consistent among our different tiling choices. The recovered models show excellent large-scale agreement with the true model and they also recover several of the mineralized zones that were not apparent from the initial coarse inversion. Finally, we invert a [Formula: see text] block of the 2010 ZTEM survey collected over the porphyry Pebble Deposit in Alaska. The inverted ZTEM results are consistent with the results obtained using other electromagnetic methods.
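The coarse-to-fine tiled workflow can be sketched abstractly: the coarse inversion seeds independent tile subproblems, whose refined results are merged back into an updated model before the next misfit check. The sketch below is schematic; the `refine_tile` callable stands in for a full tile inversion, and in practice the tiles run in parallel.

```python
import numpy as np

def tiled_refine(coarse_model, refine_tile, tile_slices):
    """Refine a coarse model tile by tile, then merge the updates.

    Each tile is refined independently using the coarse model as its
    starting guess, mirroring the domain decomposition used for large
    natural-source electromagnetic inversions.
    """
    model = coarse_model.copy()
    for sl in tile_slices:
        model[sl] = refine_tile(coarse_model[sl])  # independent subproblem
    return model
```

The merged model is then forward modeled; if the target misfit is not reached, another round of tiled refinement follows.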

