SpecDB: A Database for Storing and Managing Mass Spectrometry Proteomics Data

Calibration Using a Single-Point External Reference Material Harmonizes Quantitative Mass Spectrometry Proteomics Data between Platforms and Laboratories

Analytical Chemistry ◽

10.1021/acs.analchem.8b04581 ◽

2018 ◽

Vol 90 (21) ◽

pp. 13112-13117 ◽

Cited By ~ 16

Author(s):

Lindsay K. Pino ◽

Brian C. Searle ◽

Eric L. Huang ◽

William Stafford Noble ◽

Andrew N. Hoofnagle ◽

...

Keyword(s):

Mass Spectrometry ◽

Reference Material ◽

Single Point ◽

Proteomics Data ◽

Quantitative Mass Spectrometry ◽

External Reference

Comparative Proteomics of Root Apex and Root Elongation Zones Provides Insights into Molecular Mechanisms for Drought Stress and Recovery Adjustment in Switchgrass

Proteomes ◽

10.3390/proteomes8010003 ◽

2020 ◽

Vol 8 (1) ◽

pp. 3 ◽

Cited By ~ 2

Author(s):

Zhujia Ye ◽

Sasikiran Reddy Sangireddy ◽

Chih-Li Yu ◽

Dafeng Hui ◽

Kevin Howe ◽

...

Keyword(s):

Mass Spectrometry ◽

Drought Stress ◽

Molecular Mechanisms ◽

Root Zone ◽

Root Apex ◽

Root Tip ◽

Leaf Water Content ◽

Severe Drought ◽

Proteomics Data ◽

Drought Treatment

Switchgrass plants were grown in a Sandwich tube system to induce gradual drought stress by withholding watering. After 29 days, the leaf photosynthetic rate had decreased significantly compared to the control plants, which were watered regularly. The drought-treated plants recovered to the same leaf water content after three days of re-watering. The root tip (1 cm basal fragment, designated RT1 hereafter) and the elongation/maturation zone (the next upper 1 cm of tissue, designated RT2 hereafter) were collected on the 29th day of drought stress treatment (named SDT, for severe drought treated) and after one (D1W) and three (D3W) days of re-watering. Tandem mass tag mass spectrometry-based quantitative proteomics analysis was performed to identify the proteomes and the drought-induced differentially accumulated proteins (DAPs). From RT1 tissues, 6156, 7687, and 7699 proteins were quantified, and 296, 535, and 384 DAPs were identified in the SDT, D1W, and D3W samples, respectively. From RT2 tissues, 7382, 7255, and 6883 proteins were quantified, and 393, 587, and 321 DAPs were identified in the SDT, D1W, and D3W samples, respectively. Between RT1 and RT2 tissues, very few DAPs overlapped at SDT, but the number of such proteins increased during the recovery phase. A large number of hydrophilic proteins and stress-responsive proteins were induced during SDT and remained at a higher level during the recovery stages. A large number of DAPs in RT1 tissues maintained the same expression pattern throughout the drought treatment and recovery phases. The DAPs in RT1 tissues were classified into cell proliferation, mitotic cell division, and chromatin modification, and those in RT2 into cell wall remodeling and cell expansion processes. This study provides information on root zone-specific proteome changes during the drought and recovery phases, which will allow us to select proteins (genes) as better-defined targets for developing drought-tolerant plants.
The mass spectrometry proteomics data are available via ProteomeXchange with identifier PXD017441.

Learning from heterogeneous data sources: an application in spatial proteomics

10.1101/022152 ◽

2015 ◽

Cited By ~ 1

Author(s):

Lisa M. Breckels ◽

Sean Holden ◽

David Wojnar ◽

Claire M. Mulvey ◽

Andy Christoforou ◽

...

Keyword(s):

Mass Spectrometry ◽

Support Vector Machine ◽

Transfer Learning ◽

High Throughput ◽

Cell Biology ◽

Heterogeneous Data ◽

Data Sources ◽

Support Vector ◽

Proteomics Data ◽

Heterogeneous Data Sources

Abstract: Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system, to integrate heterogeneous data sources to considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis.

Abbreviations: LOPIT, Localisation of Organelle Proteins by Isotope Tagging; PCP, Protein Correlation Profiling; ML, Machine learning; TL, Transfer learning; SVM, Support vector machine; PCA, Principal component analysis; GO, Gene Ontology; CC, Cellular compartment; iTRAQ, Isobaric tags for relative and absolute quantitation; TMT, Tandem mass tags; MS, Mass spectrometry

MSpectraAI: a powerful platform for deciphering proteome profiling of multi-tumor mass spectrometry data by using deep neural networks

BMC Bioinformatics ◽

10.1186/s12859-020-03783-0 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Shisheng Wang ◽

Hongwen Zhu ◽

Hu Zhou ◽

Jingqiu Cheng ◽

Hao Yang

Keyword(s):

Mass Spectrometry ◽

Neural Networks ◽

Large Scale ◽

Deep Neural Networks ◽

Spectral Feature ◽

Mass Spectrometry Data ◽

Learning Approaches ◽

Proteomics Data ◽

Proteome Profiling ◽

Analytical Technique

Abstract. Background: Mass spectrometry (MS) has become a promising analytical technique to acquire proteomics information for the characterization of biological samples. Nevertheless, most studies focus on the final proteins identified through a suite of algorithms by using partial MS spectra to compare with the sequence database, while the pattern recognition and classification of raw mass-spectrometric data remain unresolved. Results: We developed an open-source and comprehensive platform, named MSpectraAI, for analyzing large-scale MS data through deep neural networks (DNNs); this system involves spectral-feature swath extraction, classification, and visualization. Moreover, this platform allows users to create their own DNN model by using Keras. To evaluate this tool, we collected the publicly available proteomics datasets of six tumor types (a total of 7,997,805 mass spectra) from the ProteomeXchange consortium and classified the samples based on the spectra profiling. The results suggest that MSpectraAI can distinguish different types of samples based on the fingerprint spectrum and achieve better prediction accuracy at the MS1 level (average 0.967). Conclusion: This study deciphers proteome profiling of raw mass spectrometry data and broadens the promising application of the classification and prediction of proteomics data from multi-tumor samples using deep learning methods. MSpectraAI also shows better performance than the other classical machine learning approaches.

Management and Analysis of Mass Spectrometry Proteomics Data on the Grid

Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine, and Healthcare ◽

10.4018/978-1-60566-374-6.ch011 ◽

2011 ◽

pp. 206-227

Author(s):

Mario Cannataro ◽

Pietro Hiram Guzzi ◽

Giuseppe Tradigo ◽

Pierangelo Veltri

Keyword(s):

Mass Spectrometry ◽

Protein Identification ◽

Biomarker Discovery ◽

Integrated Management ◽

Molecular Signature ◽

Proteomics Data ◽

Computational Proteomics ◽

Manual Inspection ◽

A Cell ◽

Grid Based

Recent advances in high-throughput technologies for analysing biological samples have enabled researchers to collect huge amounts of data. In particular, mass spectrometry-based proteomics uses mass spectrometry to investigate the proteins expressed in an organism or a cell. Manual inspection of spectra is unfeasible, so algorithms, tools, and platforms are needed to manage and analyze them. Computational proteomics concerns the computational methods for analyzing spectra data in qualitative proteomics (i.e. peptide/protein identification in tandem mass spectrometry) and quantitative proteomics (i.e. protein expression in samples), as well as in biomarker discovery (i.e. the identification of a molecular signature of a disease directly from spectra). This chapter presents the main standards, tools, and technologies for building scalable, reusable, and portable applications in this field. The chapter surveys available solutions for computational proteomics and includes an in-depth description of MS-Analyzer, a Grid-based software platform for the integrated management and analysis of spectra data. MS-Analyzer provides efficient spectra management through a specialized spectra database, and supports the semantic composition of pre-processing and data mining services to analyze spectra on the Grid.

PACOM: A Versatile Tool for Integrating, Filtering, Visualizing, and Comparing Multiple Large Mass Spectrometry Proteomics Data Sets

Journal of Proteome Research ◽

10.1021/acs.jproteome.7b00858 ◽

2018 ◽

Vol 17 (4) ◽

pp. 1547-1558 ◽

Cited By ~ 2

Author(s):

Salvador Martínez-Bartolomé ◽

J. Alberto Medina-Aunon ◽

Miguel Ángel López-García ◽

Carmen González-Tejedo ◽

Gorka Prieto ◽

...

Keyword(s):

Mass Spectrometry ◽

Large Mass ◽

Data Sets ◽

Proteomics Data ◽

Versatile Tool

LOPIT-DC: A simpler approach to high-resolution spatial proteomics

10.1101/378364 ◽

2018 ◽

Author(s):

Aikaterini Geladaki ◽

Nina Kočevar Britovšek ◽

Lisa M. Breckels ◽

Tom S. Smith ◽

Claire M. Mulvey ◽

...

Keyword(s):

Mass Spectrometry ◽

High Resolution ◽

High Resolution Mass Spectrometry ◽

Protein Complexes ◽

Subcellular Localisation ◽

Proteomics Data ◽

Large Protein ◽

Subcellular Resolution ◽

First Time ◽

Human Dataset

Abstract: Hyperplexed Localisation of Organelle Proteins by Isotope Tagging (hyperLOPIT) is a well-established method for studying protein subcellular localisation in complex biological samples. As a simpler alternative we developed a second workflow named Localisation of Organelle Proteins by Isotope Tagging after Differential ultraCentrifugation (LOPIT-DC) which is faster and less resource-intensive. We present the most comprehensive high-resolution mass spectrometry-based human dataset to date and deliver a flexible set of subcellular proteomics protocols for sample preparation and data analysis. For the first time, we methodically compare these two different mass spectrometry-based spatial proteomics methods within the same study and also apply QSep, the first tool that objectively and robustly quantifies subcellular resolution in spatial proteomics data. Using both approaches we highlight suborganellar resolution and isoform-specific subcellular niches as well as the locations of large protein complexes and proteins involved in signalling pathways which play important roles in cancer and metabolism. Finally, we showcase an extensive analysis of the multilocalising proteome identified via both methods.

RobNorm: model-based robust normalization method for labeled quantitative mass spectrometry proteomics data

Bioinformatics ◽

10.1093/bioinformatics/btaa904 ◽

2020 ◽

Author(s):

Meng Wang ◽

Lihua Jiang ◽

Ruiqi Jian ◽

Joanne Y Chan ◽

Qing Liu ◽

...

Keyword(s):

Mass Spectrometry ◽

Protein Expression ◽

Real Data ◽

Tissue Expression ◽

Supplementary Information ◽

Systematic Bias ◽

Proteomics Data ◽

Robust Fitting ◽

Fitting Method ◽

The One

Abstract. Motivation: Data normalization is an important step in processing proteomics data generated in mass spectrometry experiments, which aims to reduce sample-level variation and facilitate comparisons of samples. Previously published methods for normalization primarily depend on the assumption that the distribution of protein expression is similar across all samples. However, this assumption fails when the protein expression data is generated from heterogeneous samples, such as from various tissue types. This led us to develop a novel data-driven method for improved normalization that corrects the systematic bias while maintaining the underlying biological heterogeneity. Results: To robustly correct the systematic bias, we used the density-power-weight method to down-weigh outliers and extended the one-dimensional robust fitting method described in the previous work to our structured data. We then constructed a robustness criterion and developed a new normalization algorithm, called RobNorm. In simulation studies and analysis of real data from the genotype-tissue expression project, we compared and evaluated the performance of RobNorm against other normalization methods. We found that the RobNorm approach exhibits the greatest reduction in systematic bias while maintaining across-tissue variation, especially for datasets from highly heterogeneous samples. Availability and implementation: https://github.com/mwgrassgreen/RobNorm. Supplementary information: Supplementary data are available at Bioinformatics online.
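The idea of down-weighting outliers with density-power weights can be illustrated with a small sketch. This is not the RobNorm algorithm or its code; it only shows the general technique of weighting each observation by a Gaussian density raised to a power gamma, applied to a per-sample shift. The function names, the fixed MAD-based scale, the default gamma of 0.5, and the median protein centers are all assumptions made for illustration.

```python
import math
from statistics import median


def density_power_location(values, gamma=0.5, n_iter=50, tol=1e-9):
    """Robust location estimate with density-power weights (illustrative).

    Each value is weighted by a Gaussian density raised to the power gamma,
    so points far from the current center contribute almost nothing.
    """
    mu = median(values)
    # Assumption: the scale is fixed from the median absolute deviation
    # instead of being re-estimated jointly with the location.
    mad = median(abs(v - mu) for v in values)
    sigma = 1.4826 * mad if mad > 0 else 1.0
    for _ in range(n_iter):
        weights = [math.exp(-gamma * (v - mu) ** 2 / (2 * sigma ** 2))
                   for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def normalize_samples(matrix, gamma=0.5):
    """Remove a per-sample shift from a proteins-by-samples log-abundance table.

    matrix[i][j] is the log abundance of protein i in sample j.
    Returns the corrected table and the estimated shift of each sample.
    """
    n_samples = len(matrix[0])
    # Hypothetical choice: use each protein's median across samples as its center.
    centers = [median(row) for row in matrix]
    shifts = []
    for j in range(n_samples):
        residuals = [row[j] - c for row, c in zip(matrix, centers)]
        shifts.append(density_power_location(residuals, gamma))
    corrected = [[row[j] - shifts[j] for j in range(n_samples)] for row in matrix]
    return corrected, shifts
```

With a table in which one sample is offset by a constant and a single protein in that sample is genuinely different, the estimated shift stays close to the constant offset because the differing protein receives a near-zero weight, so its difference survives the correction.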

Distance metric learning and support vector machines for classification of mass spectrometry proteomics data

International Journal of Knowledge Engineering and Soft Data Paradigms ◽

10.1504/ijkesdp.2009.028815 ◽

2009 ◽

Vol 1 (3) ◽

pp. 216 ◽

Cited By ~ 1

Author(s):

Qingzhong Liu ◽

Mengyu Qiao ◽

Andrew H. Sung

Keyword(s):

Mass Spectrometry ◽

Support Vector Machines ◽

Metric Learning ◽

Support Vector ◽

Distance Metric Learning ◽

Distance Metric ◽

Proteomics Data ◽

Vector Machines

Do we want our data raw? Including binary mass spectrometry data in public proteomics data repositories

PROTEOMICS ◽

10.1002/pmic.200401302 ◽

2005 ◽

Vol 5 (13) ◽

pp. 3501-3505 ◽

Cited By ~ 42

Author(s):

Lennart Martens ◽

Alexey I. Nesvizhskii ◽

Henning Hermjakob ◽

Marcin Adamski ◽

Gilbert S. Omenn ◽

...

Keyword(s):

Mass Spectrometry ◽

Mass Spectrometry Data ◽

Proteomics Data ◽

Data Repositories
