Do we want our data raw? Including binary mass spectrometry data in public proteomics data repositories

Lennart Martens; Alexey I. Nesvizhskii; Henning Hermjakob; Marcin Adamski; Gilbert S. Omenn; Joël Vandekerckhove; Kris Gevaert

doi:10.1002/pmic.200401302

Do we want our data raw? Including binary mass spectrometry data in public proteomics data repositories

Exploring the Human Plasma Proteome ◽

10.1002/9783527609482.ch15 ◽

2006 ◽

pp. 323-328

Author(s):

Lennart Martens ◽

Alexey I. Nesvizhskii ◽

Henning Hermjakob ◽

Marcin Adamski ◽

Gilbert S. Omenn ◽

...

Keyword(s):

Mass Spectrometry ◽

Mass Spectrometry Data ◽

Proteomics Data ◽

Data Repositories


ppx: Programmatic access to proteomics data repositories

10.1101/2021.05.29.446304 ◽

2021 ◽

Author(s):

William E Fondrie ◽

Wout Bittremieux ◽

William S Noble

Keyword(s):

Mass Spectrometry ◽

Open Science ◽

Mass Spectrometry Data ◽

Reproducible Research ◽

Easy Access ◽

Proteomics Data ◽

Data Repositories ◽

Access To Data ◽

Python Package ◽

Programmatic Access

The volume of proteomics and mass spectrometry data available in public repositories continues to grow at a rapid pace as more researchers embrace open science practices. Open access to the data behind scientific discoveries has become critical to validate published findings and develop new computational tools. Here, we present ppx, a Python package that provides easy, programmatic access to the data stored in ProteomeXchange repositories, such as PRIDE and MassIVE. The ppx package can either be used as a command line tool or a Python package to retrieve the files and metadata associated with a project when provided its identifier. To demonstrate how ppx enhances reproducible research, we used ppx within a Snakemake workflow to reanalyze a published dataset with the open modification search tool ANN-SoLo and compared our reanalysis to the original results. We show that ppx readily integrates into workflows and our reanalysis produced results consistent with the original analysis. We envision that ppx will be a valuable tool for creating reproducible analyses, providing tool developers easy access to data for development, testing, and benchmarking, and enabling the use of mass spectrometry data in data-intensive analyses. The ppx package is freely available and open source under the MIT license at: https://github.com/wfondrie/ppx
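The abstract above describes retrieving a project "when provided its identifier". As a minimal illustration of that first step only, the sketch below routes an accession string to a repository family by its prefix. It does not reproduce the ppx interface; the function name `classify_identifier` and the accession patterns (`PXD` plus six digits, `MSV` plus nine digits) are assumptions made for this example, and the code uses only the Python standard library.

```python
import re

# Assumed accession formats (illustrative, not taken from the ppx source):
#   ProteomeXchange: "PXD" followed by six digits, e.g. PXD011373
#   MassIVE:         "MSV" followed by nine digits
_PATTERNS = {
    "ProteomeXchange": re.compile(r"^PXD\d{6}$"),
    "MassIVE": re.compile(r"^MSV\d{9}$"),
}


def classify_identifier(identifier: str) -> str:
    """Return the repository family that an accession string appears to belong to."""
    cleaned = identifier.strip().upper()
    for repository, pattern in _PATTERNS.items():
        if pattern.match(cleaned):
            return repository
    raise ValueError(f"Unrecognized project identifier: {identifier!r}")


if __name__ == "__main__":
    print(classify_identifier("PXD011373"))
```

A real workflow would pass the validated identifier on to the package itself to list and download the project's files.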


MSpectraAI: a powerful platform for deciphering proteome profiling of multi-tumor mass spectrometry data by using deep neural networks

BMC Bioinformatics ◽

10.1186/s12859-020-03783-0 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Shisheng Wang ◽

Hongwen Zhu ◽

Hu Zhou ◽

Jingqiu Cheng ◽

Hao Yang

Keyword(s):

Mass Spectrometry ◽

Neural Networks ◽

Large Scale ◽

Deep Neural Networks ◽

Spectral Feature ◽

Mass Spectrometry Data ◽

Learning Approaches ◽

Proteomics Data ◽

Proteome Profiling ◽

Analytical Technique

Background: Mass spectrometry (MS) has become a promising analytical technique to acquire proteomics information for the characterization of biological samples. Nevertheless, most studies focus on the final proteins identified through a suite of algorithms that compare partial MS spectra with a sequence database, while the pattern recognition and classification of raw mass-spectrometric data remain unresolved. Results: We developed an open-source and comprehensive platform, named MSpectraAI, for analyzing large-scale MS data through deep neural networks (DNNs); this system involves spectral-feature swath extraction, classification, and visualization. Moreover, this platform allows users to create their own DNN model by using Keras. To evaluate this tool, we collected the publicly available proteomics datasets of six tumor types (a total of 7,997,805 mass spectra) from the ProteomeXchange consortium and classified the samples based on the spectra profiling. The results suggest that MSpectraAI can distinguish different types of samples based on the fingerprint spectrum and achieve better prediction accuracy at the MS1 level (average 0.967). Conclusion: This study deciphers proteome profiling of raw mass spectrometry data and broadens the promising application of the classification and prediction of proteomics data from multi-tumor samples using deep learning methods. MSpectraAI also shows better performance than the other classical machine learning approaches.


Mass Spectrometry Data Repository Enhances Novel Metabolite Discoveries with Advances in Computational Metabolomics

Metabolites ◽

10.3390/metabo9060119 ◽

2019 ◽

Vol 9 (6) ◽

pp. 119 ◽

Cited By ~ 8

Author(s):

Hiroshi Tsugawa ◽

Aya Satoh ◽

Haruki Uchino ◽

Tomas Cajka ◽

Makoto Arita ◽

...

Keyword(s):

Mass Spectrometry ◽

Acyl Chain ◽

Algal Species ◽

Mass Spectrometry Data ◽

Data Repository ◽

Data Repositories ◽

Specific Product ◽

Untargeted Analysis ◽

Targeted Analysis ◽

Chlorella Variabilis

Mass spectrometry raw data repositories, including Metabolomics Workbench and MetaboLights, have contributed to increased transparency in metabolomics studies and the discovery of novel insights in biology by reanalysis with updated computational metabolomics tools. Herein, we reanalyzed the previously published lipidomics data from nine algal species, resulting in the annotation of 1437 lipids, a 40% increase in annotation compared to the previous results. Specifically, diacylglyceryl-carboxyhydroxy-methylcholine (DGCC) in Pavlova lutheri and Pleurochrysis carterae, glucuronosyldiacylglycerol (GlcADG) in Euglena gracilis and P. carterae, phosphatidylmethanol (PMeOH) in E. gracilis, and several oxidized phospholipids (oxidized phosphatidylcholine, OxPC; phosphatidylethanolamine, OxPE; phosphatidylglycerol, OxPG; phosphatidylinositol, OxPI) in Chlorella variabilis were newly characterized with the enriched lipid spectral databases. Moreover, we integrated the data from untargeted and targeted analyses from data independent tandem mass spectrometry (DIA-MS/MS) acquisition, specifically the sequential window acquisition of all theoretical fragment-ion MS/MS (SWATH-MS/MS) spectra, to increase the lipidomic annotation coverage. After the creation of a global library of precursor and diagnostic ions of lipids by the MS-DIAL untargeted analysis, the co-eluted DIA-MS/MS spectra were resolved in MRMPROBS targeted analysis by tracing the specific product ions involved in acyl chain compositions. Our results indicated that the metabolite quantifications based on DIA-MS/MS chromatograms were somewhat inferior to the MS1-centric quantifications, while the annotation coverage outperformed those of the untargeted analysis of the data dependent and DIA-MS/MS data. Consequently, integrated analyses of untargeted and targeted approaches are necessary to extract the maximum amount of metabolome information, and our results showcase the value of data repositories for the discovery of novel insights in lipid biology.


CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis

BMC Bioinformatics ◽

10.1186/s12859-021-03969-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Olga Permiakova ◽

Romain Guibert ◽

Alexandra Kraut ◽

Thomas Fortin ◽

Anne-Marie Hesse ◽

...

Keyword(s):

Mass Spectrometry ◽

Large Scale ◽

Clustering Algorithm ◽

Optimal Transport ◽

State Of The Art ◽

Data Representation ◽

Machine Learning Algorithms ◽

Mass Spectrometry Data ◽

Proteomics Data ◽

Chromatographic Elution

Background: The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality and expected number of clusters are too large to be processed by classical machine learning algorithms, so that most of the state of the art relies on single-pass linkage-based algorithms. Results: We propose a clustering algorithm that solves the powerful but computationally demanding kernel k-means objective function in a scalable way. As a result, it can process LC-MS data in an acceptable time on a multicore machine. To do so, we combine three essential features: a compressive data representation, Nyström approximation and a hierarchical strategy. In addition, we propose new kernels based on optimal transport, which can be interpreted as intuitive similarity measures between chromatographic elution profiles. Conclusions: Our method, referred to as CHICKN, is evaluated on proteomics data produced in our lab, as well as on benchmark data coming from the literature. From a computational viewpoint, it is particularly efficient on raw LC-MS data. From a data analysis viewpoint, it provides clusters which differ from those resulting from state-of-the-art methods, while achieving similar performances. This highlights the complementarity of algorithms built on different principles to extract the best from complex LC-MS data.
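To make the idea of an optimal-transport similarity between elution profiles concrete, here is a minimal sketch, not the CHICKN implementation. It assumes two non-negative intensity profiles sampled on the same evenly spaced retention-time grid, computes the one-dimensional 1-Wasserstein distance as the area between their normalized cumulative sums, and turns it into a similarity with a Laplacian-style kernel. The function names, the `gamma` bandwidth and the `step` grid spacing are hypothetical choices made for this example.

```python
from math import exp


def wasserstein_1d(p, q, step=1.0):
    """1-Wasserstein distance between two profiles sampled on a shared, evenly spaced grid.

    Both profiles are normalized to unit mass; in one dimension the distance
    equals the area between the two cumulative distribution functions.
    """
    if len(p) != len(q):
        raise ValueError("profiles must be sampled on the same grid")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("profiles must have positive total intensity")
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for a, b in zip(p, q):
        cdf_p += a / total_p
        cdf_q += b / total_q
        distance += abs(cdf_p - cdf_q) * step
    return distance


def wasserstein_kernel(p, q, gamma=1.0, step=1.0):
    """Similarity in (0, 1]: identical profiles give 1, distant profiles approach 0."""
    return exp(-gamma * wasserstein_1d(p, q, step))


if __name__ == "__main__":
    early_peak = [0.0, 1.0, 0.0, 0.0]
    late_peak = [0.0, 0.0, 1.0, 0.0]
    print(wasserstein_1d(early_peak, late_peak))
    print(wasserstein_kernel(early_peak, late_peak))
```

Unlike a point-by-point comparison, this distance grows with how far a peak has shifted along the retention-time axis, which is why transport-based measures read naturally as similarities between elution profiles.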


Integration and analysis of CPTAC proteomics data in the context of cancer genomics in the cBioPortal

10.1101/247718 ◽

2018 ◽

Author(s):

Pamela Wu ◽

Zachary J Heins ◽

James T Muller ◽

Adam A Abeshouse ◽

Yichao Sun ◽

...

Keyword(s):

Mass Spectrometry ◽

Clinical Data ◽

Cancer Genomics ◽

Ovarian Tumors ◽

The Cancer Genome Atlas ◽

Mass Spectrometry Data ◽

Proteomics Data ◽

Data Types ◽

Level Data ◽

Graphical Summary

Summary: The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced extensive mass spectrometry-based proteomics data for selected breast, colon and ovarian tumors from The Cancer Genome Atlas (TCGA). We have incorporated the CPTAC proteomics data into the cBioPortal to support easy exploration and integrative analysis of these proteomic datasets in the context of the clinical and genomics data from the same tumors. cBioPortal is an open source platform for exploring, visualizing, and analyzing multi-dimensional cancer genomics and clinical data. The public instance of the cBioPortal (http://cbioportal.org/) hosts more than 100 cancer genomics studies including all of the data from TCGA. Its biologist-friendly interface provides many rich analysis features, including a graphical summary of gene-level data across multiple platforms, correlation analysis between genes or other data types, survival analysis, and network visualization. Here, we present the integration of the CPTAC mass spectrometry-based proteomics data into the cBioPortal, consisting of 77 breast, 95 colorectal, and 174 ovarian tumors that have already been profiled by TCGA for mutations, copy number alterations, gene expression, and DNA methylation. As a result, the CPTAC data can now be easily explored and analyzed in the cBioPortal in the context of clinical and genomics data. By integrating CPTAC data into cBioPortal, limitations of TCGA proteomics array data can be overcome while also providing a user-friendly web interface, a web API and an R client to query the mass spectrometry data together with genomic, epigenomic, and clinical data.


Mass spectrometry data of diabetic rat sperm proteome treated with Gynura procumbens aqueous extract

F1000Research ◽

10.12688/f1000research.19222.1 ◽

2019 ◽

Vol 8 ◽

pp. 844

Author(s):

Khaidatul Akmar Kamaruzaman ◽

Wan Mohd Aizat ◽

Mahanem Mat Noor

Keyword(s):

Mass Spectrometry ◽

Aqueous Extract ◽

Sperm Quality ◽

Mass Spectrometry Data ◽

Male Rats ◽

Diabetic Rat ◽

Control Group ◽

Label Free ◽

Sprague Dawley ◽

Proteomics Data

Diabetes mellitus has a deleterious effect on the male reproductive system, especially on sperm quality and spermatogenesis. Gynura procumbens (G. procumbens) is a traditional herb known for its ability to improve the fertility of diabetes-induced male rats. This study was designed to identify the differential expression of sperm proteins in diabetes-induced male rats after treatment with G. procumbens aqueous extract. The sperm proteome was profiled using label-free shotgun proteomics analysis. Sprague Dawley rats used in this study were divided randomly into four groups. One group was a normal control group (healthy rats), while the three other groups were induced with 50 mg/kg bodyweight (BW) of streptozotocin (STZ) to emulate the diabetic condition. The diabetic rats were divided into negative control (non-treated diabetic), metformin-treated (positive control) and G. procumbens aqueous extract-treated (450 mg/kg BW) groups. Oral treatments were administered for 14 consecutive days before the rats were euthanized. Total sperm protein samples were extracted from the caudal epididymis and run through SDS-PAGE. Later, samples were digested using trypsin before liquid chromatography-tandem mass spectrometry (Thermo Orbitrap Fusion) analysis. The acquired data were processed using MaxQuant and Perseus software. The mass spectrometry proteomics data are available through the ProteomeXchange Consortium via the PRIDE partner repository, with the dataset identifier PXD011373.


How to use open-pFind in deep proteomics data analysis?— A protocol for rigorous identification and quantitation of peptides and proteins from mass spectrometry data

Biophysics Reports ◽

10.52601/bpr.2021.210004 ◽

2021 ◽

Vol 7 (3) ◽

pp. 207-226

Author(s):

Shao Guangcan ◽

Cao Yong ◽

Chen Zhenlin ◽

Liu Chao ◽

...

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Mass Spectrometry Data ◽

Proteomics Data ◽

Proteomics Data Analysis


154: Integration of TPSA and High-Throughput Mass Spectrometry Data Improves Prostate Cancer Prediction

The Journal of Urology ◽

10.1016/s0022-5347(18)30419-1 ◽

2007 ◽

Vol 177 (4S) ◽

pp. 52-53

Author(s):

Stefano Ongarello ◽

Eberhard Steiner ◽

Regina Achleitner ◽

Isabel Feuerstein ◽

Birgit Stenzel ◽

...

Keyword(s):

Prostate Cancer ◽

Mass Spectrometry ◽

High Throughput ◽

Mass Spectrometry Data ◽

Cancer Prediction


Nonparametric Pre-Processing Methods and Inference Tools for Analyzing Time-of-Flight Mass Spectrometry Data.

Current Analytical Chemistry ◽

10.2174/157341107780361718 ◽

2007 ◽

Vol 3 (2) ◽

pp. 127-147 ◽

Cited By ~ 8

Author(s):

Anestis Antoniadis ◽

Jeremie Bigot ◽

Sophie Lambert-Lacroix ◽

Frederique Letue

Keyword(s):

Mass Spectrometry ◽

Time Of Flight ◽

Mass Spectrometry Data ◽

Processing Methods ◽

Flight Mass Spectrometry
