Do we want our data raw? Including binary mass spectrometry data in public proteomics data repositories

PROTEOMICS ◽  
2005 ◽  
Vol 5 (13) ◽  
pp. 3501-3505 ◽  
Author(s):  
Lennart Martens ◽  
Alexey I. Nesvizhskii ◽  
Henning Hermjakob ◽  
Marcin Adamski ◽  
Gilbert S. Omenn ◽  
...  
2006 ◽  
pp. 323-328
Author(s):  
Lennart Martens ◽  
Alexey I. Nesvizhskii ◽  
Henning Hermjakob ◽  
Marcin Adamski ◽  
Gilbert S. Omenn ◽  
...  

2021 ◽  
Author(s):  
William E Fondrie ◽  
Wout Bittremieux ◽  
William S Noble

The volume of proteomics and mass spectrometry data available in public repositories continues to grow rapidly as more researchers embrace open science practices. Open access to the data behind scientific discoveries has become critical for validating published findings and developing new computational tools. Here, we present ppx, a Python package that provides easy, programmatic access to the data stored in ProteomeXchange repositories, such as PRIDE and MassIVE. The ppx package can be used either as a command-line tool or as a Python package to retrieve the files and metadata associated with a project, given its identifier. To demonstrate how ppx enhances reproducible research, we used ppx within a Snakemake workflow to reanalyze a published dataset with the open modification search tool ANN-SoLo and compared our reanalysis to the original results. We show that ppx readily integrates into workflows and that our reanalysis produced results consistent with the original analysis. We envision that ppx will be a valuable tool for creating reproducible analyses, for giving tool developers easy access to data for development, testing, and benchmarking, and for enabling the use of mass spectrometry data in data-intensive analyses. The ppx package is freely available and open source under the MIT license at: https://github.com/wfondrie/ppx
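
Retrieving a project starts from its identifier. As a minimal sketch, the Python snippet below only checks which accession family an identifier belongs to before it is handed to a retrieval tool such as ppx. The helper name `classify_accession` is hypothetical and not part of ppx, and the accepted formats (`PXD` plus six digits, as in PXD011373, and MassIVE's `MSV` plus nine digits) are assumptions about the identifier schemes rather than a complete specification.

```python
import re

# Hypothetical helper (not part of ppx): sort a project identifier into an
# accession family before passing it to a download tool. Assumed formats:
# ProteomeXchange "PXD" + 6 digits (e.g. PXD011373) and
# MassIVE "MSV" + 9 digits (e.g. MSV000079514).
ACCESSION_PATTERNS = {
    "ProteomeXchange": re.compile(r"PXD\d{6}"),
    "MassIVE": re.compile(r"MSV\d{9}"),
}


def classify_accession(identifier: str) -> str:
    """Return the accession family of `identifier` or raise ValueError."""
    cleaned = identifier.strip().upper()
    for family, pattern in ACCESSION_PATTERNS.items():
        # fullmatch rejects identifiers with extra leading/trailing characters
        if pattern.fullmatch(cleaned):
            return family
    raise ValueError(f"Unrecognized project identifier: {identifier!r}")


if __name__ == "__main__":
    for pid in ["PXD011373", " msv000079514 "]:
        print(pid.strip(), "->", classify_accession(pid))
```

Validating identifiers up front lets a workflow (for example, a Snakemake rule) fail early on a typo instead of after a network request; the actual retrieval calls should be taken from the ppx documentation.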


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Shisheng Wang ◽  
Hongwen Zhu ◽  
Hu Zhou ◽  
Jingqiu Cheng ◽  
Hao Yang

Abstract
Background: Mass spectrometry (MS) has become a promising analytical technique for acquiring proteomic information to characterize biological samples. However, most studies focus on the final set of proteins identified by a suite of algorithms that compare a subset of the MS spectra against a sequence database, while pattern recognition and classification of raw mass spectrometric data remain unresolved.
Results: We developed an open-source, comprehensive platform named MSpectraAI for analyzing large-scale MS data with deep neural networks (DNNs); the system covers spectral-feature swath extraction, classification, and visualization. The platform also allows users to build their own DNN models with Keras. To evaluate the tool, we collected publicly available proteomics datasets for six tumor types (7,997,805 mass spectra in total) from the ProteomeXchange consortium and classified the samples based on their spectral profiles. The results suggest that MSpectraAI can distinguish different sample types based on their fingerprint spectra, achieving better prediction accuracy at the MS1 level (average 0.967).
Conclusion: This study deciphers proteome profiles directly from raw mass spectrometry data and broadens the application of deep learning to classifying and predicting proteomics data from multi-tumor samples. MSpectraAI also performs better than other classical machine learning approaches.
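
DNN classifiers of the kind described above need every spectrum as a fixed-length input vector. The sketch below shows one common way to get there, summing centroided peaks into fixed-width m/z bins and scaling each spectrum to its base peak. It is an illustration under assumed parameters (a 400 to 1600 m/z window and 1 m/z bins), not MSpectraAI's own swath-extraction procedure, and `bin_spectrum` is a hypothetical name.

```python
import numpy as np


def bin_spectrum(mz, intensity, mz_min=400.0, mz_max=1600.0, bin_width=1.0):
    """Turn one centroided spectrum into a fixed-length, max-normalized vector.

    The m/z window and bin width are illustrative assumptions, not
    MSpectraAI's settings. Peaks outside [mz_min, mz_max) are dropped.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = int(np.ceil((mz_max - mz_min) / bin_width))
    keep = (mz >= mz_min) & (mz < mz_max)
    idx = np.floor((mz[keep] - mz_min) / bin_width).astype(int)
    idx = np.clip(idx, 0, n_bins - 1)  # guard against float round-up at the edge
    vec = np.zeros(n_bins)
    np.add.at(vec, idx, intensity[keep])  # sum peaks that share a bin
    peak = vec.max()
    return vec / peak if peak > 0 else vec


if __name__ == "__main__":
    spectra = [
        ([400.2, 400.7, 1000.5], [10.0, 30.0, 20.0]),
        ([512.3, 1599.9, 1700.0], [5.0, 15.0, 99.0]),
    ]
    # One row per spectrum: the matrix layout a classifier expects as input.
    X = np.vstack([bin_spectrum(m, i) for m, i in spectra])
    print(X.shape)
```

The resulting matrix could then be passed to a classifier, for example a Keras model; the bin width trades m/z resolution against input dimensionality.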


Metabolites ◽  
2019 ◽  
Vol 9 (6) ◽  
pp. 119 ◽  
Author(s):  
Hiroshi Tsugawa ◽  
Aya Satoh ◽  
Haruki Uchino ◽  
Tomas Cajka ◽  
Makoto Arita ◽  
...  

Mass spectrometry raw data repositories, including Metabolomics Workbench and MetaboLights, have contributed to increased transparency in metabolomics studies and to the discovery of novel biological insights through reanalysis with updated computational metabolomics tools. Herein, we reanalyzed previously published lipidomics data from nine algal species and annotated 1437 lipids, a 40% increase over the previous results. Specifically, diacylglyceryl-carboxyhydroxy-methylcholine (DGCC) in Pavlova lutheri and Pleurochrysis carterae, glucuronosyldiacylglycerol (GlcADG) in Euglena gracilis and P. carterae, phosphatidylmethanol (PMeOH) in E. gracilis, and several oxidized phospholipids (oxidized phosphatidylcholine, OxPC; phosphatidylethanolamine, OxPE; phosphatidylglycerol, OxPG; phosphatidylinositol, OxPI) in Chlorella variabilis were newly characterized using the enriched lipid spectral databases. Moreover, we integrated data from untargeted and targeted analyses of data-independent tandem mass spectrometry (DIA-MS/MS) acquisitions, specifically sequential window acquisition of all theoretical fragment-ion MS/MS (SWATH-MS/MS) spectra, to increase lipidomic annotation coverage. After a global library of lipid precursor and diagnostic ions was created by MS-DIAL untargeted analysis, the co-eluted DIA-MS/MS spectra were resolved by MRMPROBS targeted analysis, which traces the specific product ions that reflect acyl chain compositions. Our results indicated that metabolite quantification based on DIA-MS/MS chromatograms was somewhat inferior to MS1-centric quantification, whereas the annotation coverage of this integrated approach exceeded that of untargeted analysis of the data-dependent and DIA-MS/MS data.
Consequently, integrated analyses of untargeted and targeted approaches are necessary to extract the maximum amount of metabolome information, and our results showcase the value of data repositories for discovering novel insights in lipid biology.
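
Targeted tracing of product ions, as done here with MRMPROBS, rests on extracting an ion chromatogram for each diagnostic fragment from the DIA-MS/MS scans. The sketch below is a generic, simplified version of that step, not MRMPROBS's algorithm: it sums peak intensities within a ppm tolerance of a target m/z in every scan. The 10 ppm tolerance, the toy scans, and the function names are assumptions; the two targets are the approximate [M-H]- m/z values of the palmitate (FA 16:0) and oleate (FA 18:1) anions, chosen only as examples of acyl-chain product ions.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def extract_ion_chromatogram(scans, target_mz, tol_ppm=10.0):
    """Trace one product ion across DIA MS/MS scans.

    `scans` is a list of (retention_time, [(mz, intensity), ...]) tuples; for
    each scan, intensities of all peaks within `tol_ppm` of `target_mz` are summed.
    """
    trace = []
    for rt, peaks in scans:
        total = sum(
            (inten for mz, inten in peaks if abs(ppm_error(mz, target_mz)) <= tol_ppm),
            0.0,
        )
        trace.append((rt, total))
    return trace


# Approximate [M-H]- m/z values of two fatty acyl anions used as example targets.
FA_16_0 = 255.2330  # palmitate
FA_18_1 = 281.2486  # oleate

if __name__ == "__main__":
    # Toy scans made up for illustration: (retention time in min, centroided peaks)
    scans = [
        (1.00, [(255.2335, 100.0), (281.2490, 40.0)]),
        (1.05, [(255.2331, 300.0), (281.2484, 90.0)]),
        (1.10, [(255.2400, 999.0)]),  # about 27 ppm from FA 16:0, so excluded
    ]
    for name, target in [("FA 16:0", FA_16_0), ("FA 18:1", FA_18_1)]:
        print(name, extract_ion_chromatogram(scans, target))
```

Co-eluting traces of product ions from the same precursor window are what allow acyl chain compositions to be assigned; peak picking and scoring on top of these traces are omitted here.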


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Olga Permiakova ◽  
Romain Guibert ◽  
Alexandra Kraut ◽  
Thomas Fortin ◽  
Anne-Marie Hesse ◽  
...  

Abstract
Background: Clustering of data produced by liquid chromatography coupled to mass spectrometry (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality, and expected number of clusters are too large to be processed by classical machine learning algorithms, so most state-of-the-art methods rely on single-pass linkage-based algorithms.
Results: We propose a clustering algorithm that solves the powerful but computationally demanding kernel k-means objective function in a scalable way, so that it can process LC-MS data in an acceptable time on a multicore machine. To do so, we combine three essential features: a compressive data representation, Nyström approximation, and a hierarchical strategy. In addition, we propose new kernels based on optimal transport, which can be interpreted as intuitive similarity measures between chromatographic elution profiles.
Conclusions: Our method, referred to as CHICKN, is evaluated on proteomics data produced in our lab, as well as on benchmark data from the literature. From a computational viewpoint, it is particularly efficient on raw LC-MS data. From a data analysis viewpoint, it provides clusters that differ from those produced by state-of-the-art methods while achieving similar performance. This highlights the complementarity of differently principled algorithms for extracting the most from complex LC-MS data.
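
The kernel k-means objective can be made tractable by replacing the full n x n kernel matrix with a Nyström approximation built from a small set of landmark points. The Python sketch below illustrates that idea together with an optimal-transport-style similarity between elution profiles: in one dimension, the Wasserstein-1 distance equals the L1 distance between cumulative distributions, and k(a, b) = exp(-W1(a, b) / sigma) is used here as an assumed kernel form. It omits CHICKN's compressive representation and hierarchical strategy, and the landmark choice, sigma, toy profiles, and function names are all hypothetical.

```python
import numpy as np


def wasserstein_1d(p, q, dx=1.0):
    """W1 distance between two non-negative elution profiles sampled on the
    same uniform retention-time grid with spacing dx."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # treat each profile as a probability mass
    # In 1-D, W1 equals the L1 distance between cumulative distributions.
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum() * dx)


def ot_kernel(A, B, sigma=5.0):
    """Assumed kernel form: k(a, b) = exp(-W1(a, b) / sigma)."""
    return np.array([[np.exp(-wasserstein_1d(a, b) / sigma) for b in B] for a in A])


def nystrom_features(X, landmarks, kernel, eps=1e-10):
    """Map rows of X to features whose inner products approximate kernel(X, X)."""
    W = kernel(landmarks, landmarks)  # m x m kernel among landmarks
    C = kernel(X, landmarks)          # n x m kernel between data and landmarks
    vals, vecs = np.linalg.eigh(W)
    keep = vals > eps                 # drop numerically null directions
    # Scale each kept eigenvector by 1/sqrt(eigenvalue): Phi Phi^T = C W^+ C^T.
    return C @ (vecs[:, keep] / np.sqrt(vals[keep]))


def kmeans(F, k, n_iter=100):
    """Lloyd's k-means on the features, with farthest-first initialization."""
    centers = [F[0]]
    for _ in range(1, k):
        d = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([F[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


if __name__ == "__main__":
    t = np.arange(50)
    # Toy Gaussian elution profiles: three peaking near scans 10-12, three near 35-37.
    X = np.array([np.exp(-((t - mu) ** 2) / 8.0) for mu in (10, 11, 12, 35, 36, 37)])
    F = nystrom_features(X, X[::2], ot_kernel)  # every second profile as a landmark
    print(kmeans(F, k=2))
```

Running ordinary k-means on Nyström features approximates kernel k-means while evaluating only the n x m and m x m kernel entries against the landmarks, which is the source of the scalability.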


2018 ◽  
Author(s):  
Pamela Wu ◽  
Zachary J Heins ◽  
James T Muller ◽  
Adam A Abeshouse ◽  
Yichao Sun ◽  
...  

Summary: The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced extensive mass spectrometry-based proteomics data for selected breast, colon, and ovarian tumors from The Cancer Genome Atlas (TCGA). We have incorporated the CPTAC proteomics data into cBioPortal to support easy exploration and integrative analysis of these proteomic datasets in the context of clinical and genomic data from the same tumors. cBioPortal is an open-source platform for exploring, visualizing, and analyzing multidimensional cancer genomics and clinical data. The public instance of cBioPortal (http://cbioportal.org/) hosts more than 100 cancer genomics studies, including all of the data from TCGA. Its biologist-friendly interface provides many rich analysis features, including a graphical summary of gene-level data across multiple platforms, correlation analysis between genes or other data types, survival analysis, and network visualization. Here, we present the integration of the CPTAC mass spectrometry-based proteomics data into cBioPortal, covering 77 breast, 95 colorectal, and 174 ovarian tumors that have already been profiled by TCGA for mutations, copy number alterations, gene expression, and DNA methylation. As a result, the CPTAC data can now be easily explored and analyzed in cBioPortal in the context of clinical and genomic data. Integrating CPTAC data into cBioPortal helps overcome limitations of TCGA proteomics array data while providing a user-friendly web interface, a web API, and an R client for querying the mass spectrometry data together with genomic, epigenomic, and clinical data.
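
cBioPortal offers correlation analysis between data types in its interface. As a rough illustration of what such a correlation involves once protein and mRNA values have been retrieved for the same tumors (for example, through the web API or R client mentioned above), the sketch below matches samples by identifier and computes a Spearman rank correlation, giving tied values their average rank. Sample identifiers and values are made up, the three-sample minimum is an arbitrary choice, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def matched_spearman(protein, mrna):
    """Spearman rho over samples present in both {sample_id: value} maps."""
    shared = sorted(protein.keys() & mrna.keys())
    if len(shared) < 3:
        raise ValueError("need at least three matched samples")
    x = [protein[s] for s in shared]
    y = [mrna[s] for s in shared]
    return pearson(average_ranks(x), average_ranks(y)), len(shared)


if __name__ == "__main__":
    # Toy values for one gene; S5 has mRNA data only and is dropped.
    protein = {"S1": 1.2, "S2": 0.4, "S3": 2.5, "S4": -0.3}
    mrna = {"S1": 5.0, "S2": 3.1, "S3": 7.8, "S4": 2.0, "S5": 4.4}
    rho, n = matched_spearman(protein, mrna)
    print(f"Spearman rho = {rho:.2f} over {n} matched samples")
```

Matching on sample identifiers before ranking matters because proteomic profiling typically covers only a subset of the tumors that have genomic data.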


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 844
Author(s):  
Khaidatul Akmar Kamaruzaman ◽  
Wan Mohd Aizat ◽  
Mahanem Mat Noor

Diabetes mellitus has a deleterious effect on the male reproductive system, especially on sperm quality and spermatogenesis. Gynura procumbens (G. procumbens) is a traditional herb known for its ability to improve the fertility of diabetes-induced male rats. This study was designed to identify differentially expressed sperm proteins after treatment of diabetes-induced male rats with G. procumbens aqueous extract. The sperm proteome was profiled using label-free shotgun proteomics. The Sprague Dawley rats used in this study were randomly divided into four groups. One group was a normal control group (healthy rats), while the other three groups were given 50 mg/kg body weight (BW) of streptozotocin (STZ) to emulate the diabetic condition. The diabetic rats were divided into negative control (untreated diabetic), metformin-treated (positive control), and G. procumbens aqueous extract-treated (450 mg/kg BW) groups. Oral treatments were administered for 14 consecutive days before the rats were euthanized. Total sperm protein was extracted from the caudal epididymis and run through SDS-PAGE. The samples were then digested with trypsin before liquid chromatography-tandem mass spectrometry (Thermo Orbitrap Fusion) analysis. The acquired data were processed using MaxQuant and Perseus software. The mass spectrometry proteomics data are available through the ProteomeXchange Consortium via the PRIDE partner repository, with the dataset identifier PXD011373.
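
Doses in this study are normalized to body weight (50 mg/kg STZ; 450 mg/kg extract). The arithmetic that converts these into per-animal amounts is sketched below. The 250 g body weight and the 100 mg/mL extract concentration are hypothetical values chosen only for illustration, since the abstract reports neither, and the 14-day total assumes a constant body weight.

```python
def dose_mg(dose_mg_per_kg: float, body_weight_g: float) -> float:
    """Absolute dose (mg) for a weight-normalized dose and a body weight in grams."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def dosing_volume_ml(dose_mg_per_kg: float, body_weight_g: float,
                     stock_mg_per_ml: float) -> float:
    """Volume of a stock solution that delivers the dose."""
    return dose_mg(dose_mg_per_kg, body_weight_g) / stock_mg_per_ml


if __name__ == "__main__":
    weight_g = 250.0  # hypothetical rat body weight
    stock = 100.0     # hypothetical extract concentration, mg/mL
    print("STZ:", dose_mg(50, weight_g), "mg")
    print("Extract per day:", dose_mg(450, weight_g), "mg")
    print("Oral dose volume:", dosing_volume_ml(450, weight_g, stock), "mL")
    # Assumes the body weight stays constant over the course.
    print("Extract over 14 days:", 14 * dose_mg(450, weight_g), "mg")
```

For the assumed 250 g rat, this gives 12.5 mg of STZ, 112.5 mg of extract per day (1.125 mL of a 100 mg/mL solution), and 1575 mg of extract over the 14-day course; in practice, doses would be recalculated as body weights change.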


2007 ◽  
Vol 177 (4S) ◽  
pp. 52-53
Author(s):  
Stefano Ongarello ◽  
Eberhard Steiner ◽  
Regina Achleitner ◽  
Isabel Feuerstein ◽  
Birgit Stenzel ◽  
...  

2007 ◽  
Vol 3 (2) ◽  
pp. 127-147 ◽  
Author(s):  
Anestis Antoniadis ◽  
Jeremie Bigot ◽  
Sophie Lambert-Lacroix ◽  
Frederique Letue
