SLM-Transform: A Method for Memory-Efficient Indexing of Spectra for Database Search in LC-MS/MS Proteomics

DOI: 10.1101/531681 | 2019

Author(s): Muhammad Haseeb; Muaaz G. Awan; Alexander S. Cadigan; Fahad Saeed

Keyword(s): Peptide Identification; Transformation Method; Database Search; Constant Time; Theoretical Spectrum; Post Translational Modifications; Stored Information; Indexing Technique; Memory Constraints; Memory Efficient

Abstract: The most commonly used strategy for peptide identification in shotgun LC-MS/MS proteomics involves searching MS/MS data against an in-silico digested protein sequence database. Typically, the digested peptide sequences are indexed in memory to allow faster search times. However, subjecting a database to post-translational modifications (PTMs) during digestion results in an exponential increase in the number of peptides and therefore in memory consumption. This limits the use of existing fragment-ion based open-search algorithms for databases with several PTMs. In this paper, we propose a novel fragment-ion indexing technique which is analogous to suffix array transformation and allows constant-time querying of indexed ions. We extend our transformation method, called SLM-Transform, by constructing ion buckets that allow querying of all indexed ions by mass while storing only information on the distribution of ion frequencies within buckets. The stored information is used with a regression technique to locate the position of ions in constant time. Moreover, the number of theoretical b- and y-ions generated and indexed for each theoretical spectrum is limited. Our results show that SLM-Transform allows indexing of up to 4x more peptides than other leading fragment-ion based database search tools within the same memory constraints. We show that an SLM-Transform based index allows indexing of over 83 million peptides within 26GB RAM as compared to 80GB required by MSFragger. Finally, we show the constant ion retrieval time of the SLM-Transform based index, which allows ultrafast peptide search speeds. Source code will be made available at: https://github.com/pcdslab/slmindex
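
The bucket idea in the abstract can be illustrated with a small sketch. This is not the authors' SLM-Transform code: it assumes a plain counting sort over fixed-width mass buckets and stores one running offset per bucket, so a mass query becomes two array lookups and a slice. The regression step the abstract mentions is not reproduced here, and the names `build_bucket_index`, `query_by_mass`, `resolution`, and `max_mass` are hypothetical.

```python
from itertools import accumulate


def build_bucket_index(ion_masses, resolution=0.01, max_mass=2000.0):
    """Counting-sort ion ids by mass bucket (assumed layout, not SLM-Transform).

    Returns (ion_ids, offsets) where ion_ids is ordered by bucket and
    offsets[b] is the position of the first ion of bucket b.
    Masses above max_mass raise IndexError in this sketch.
    """
    n_buckets = int(round(max_mass / resolution)) + 1
    keys = [int(round(m / resolution)) for m in ion_masses]

    counts = [0] * n_buckets
    for k in keys:
        counts[k] += 1

    # Prefix sums: offsets[b] ions fall in buckets below b; offsets[-1] is the total.
    offsets = [0] + list(accumulate(counts))

    cursor = offsets[:-1].copy()
    ion_ids = [0] * len(ion_masses)
    for ion_id, k in enumerate(keys):
        ion_ids[cursor[k]] = ion_id
        cursor[k] += 1
    return ion_ids, offsets


def query_by_mass(ion_ids, offsets, mass, resolution=0.01, tol_buckets=0):
    """Return the ids of ions whose bucket is within tol_buckets of the query mass."""
    b = int(round(mass / resolution))
    lo = max(b - tol_buckets, 0)
    hi = min(b + tol_buckets, len(offsets) - 2)
    if lo > hi:
        return []
    return ion_ids[offsets[lo]:offsets[hi + 1]]
```

The cost of a query does not depend on how many ions are indexed, only on how many fall inside the requested window, which is the property the abstract refers to as constant-time retrieval.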

Deep Semi-Supervised Learning Improves Universal Peptide Identification of Shotgun Proteomics Data

DOI: 10.1101/2020.11.12.380881 | 2020

Author(s): John T. Halloran; Gregor Urban; David Rocke; Pierre Baldi

Keyword(s): Machine Learning; Deep Learning; Peptide Identification; Shotgun Proteomics; Database Search; Supervised Machine Learning; Superior Performance; Support Vector; Proteomics Data; Learning Classifier

Abstract: Semi-supervised machine learning post-processors critically improve peptide identification of shotgun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, the powerful training capabilities of deep learning models have displayed superior performance to shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.
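
For readers unfamiliar with the q-value thresholds mentioned above, the following is a minimal sketch of how q-values are commonly derived from recalibrated PSM scores. It assumes a target-decoy setup with a simple decoys/targets FDR estimate, which the abstract does not spell out; the function name `q_values` and its inputs are hypothetical and are not taken from Percolator, PeptideProphet, or the authors' code.

```python
def q_values(scores, is_decoy):
    """Assign a q-value to each PSM from its score and decoy label.

    Assumes higher scores are better and estimates FDR at each rank
    as (#decoys so far) / (#targets so far), capped at 1.0.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: the smallest FDR at this rank or at any lower-scoring rank.
    for r in range(len(fdrs) - 2, -1, -1):
        fdrs[r] = min(fdrs[r], fdrs[r + 1])

    q = [0.0] * len(scores)
    for rank, i in enumerate(order):
        q[i] = fdrs[rank]
    return q
```

A post-processor that separates targets from decoys better pushes decoys down the ranking, so more target PSMs fall under a given q-value threshold.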

Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics

Journal of Proteome Research | DOI: 10.1021/acs.jproteome.9b00736 | 2020 | Vol 19 (4) | pp. 1481-1490

Author(s): Pavel Sulimov; Attila Kertész-Farkas

Keyword(s): Peptide Identification; Shotgun Proteomics; Calibration Method; Database Search

JUMP: A Tag-based Database Search Tool for Peptide Identification with High Sensitivity and Accuracy

Molecular & Cellular Proteomics | DOI: 10.1074/mcp.o114.039586 | 2014 | Vol 13 (12) | pp. 3663-3673 | Cited By ~ 48

Author(s): Xusheng Wang; Yuxin Li; Zhiping Wu; Hong Wang; Haiyan Tan; ...

Keyword(s): Peptide Identification; High Sensitivity; Database Search; Search Tool

A Hybrid Method for Peptide Identification Using Integer Linear Optimization, Local Database Search, and Quadrupole Time-of-Flight or OrbiTrap Tandem Mass Spectrometry

Journal of Proteome Research | DOI: 10.1021/pr700577z | 2008 | Vol 7 (4) | pp. 1584-1593 | Cited By ~ 17

Author(s): Peter A. DiMaggio, Jr.; Christodoulos A. Floudas; Bingwen Lu; John R. Yates, III

Keyword(s): Mass Spectrometry; Tandem Mass Spectrometry; Hybrid Method; Linear Optimization; Time Of Flight; Peptide Identification; Database Search; Tandem Mass; Local Database; Integer Linear Optimization

PIPI: PTM-Invariant Peptide Identification Using Coding Method

DOI: 10.1101/055806 | 2016 | Cited By ~ 1

Author(s): Fengchao Yu; Ning Li; Weichuan Yu

Keyword(s): Amino Acids; Dynamic Programming Algorithm; Computational Cost; Peptide Identification; Real Data; Search Space; Database Search; Programming Algorithm; Post Translational Modification; Coding Method

Abstract: In computational proteomics, identification of peptides with an unlimited number of post-translational modification (PTM) types is a challenging task. The computational cost increases exponentially with respect to the number of modifiable amino acids and linearly with respect to the number of potential PTM types at each amino acid. The problem becomes intractable very quickly if we want to enumerate all possible modification patterns. Existing tools (e.g., MS-Alignment, ProteinProspector, and MODa) avoid enumerating modification patterns in a database search by using an alignment-based approach to localize and characterize modified amino acids. However, due to the large search space and the PTM localization issue, the sensitivity of these tools is low. This paper proposes a novel method named PIPI to achieve PTM-invariant peptide identification. PIPI first codes peptide sequences into Boolean vectors and converts experimental spectra into real-valued vectors. Then, it finds the top 10 peptide-coded vectors for each spectrum-coded vector. After that, PIPI uses a dynamic programming algorithm to localize and characterize modified amino acids. Simulations and real data experiments have shown that PIPI outperforms existing tools by identifying more peptide-spectrum matches (PSMs) and reporting fewer false positives. It also runs much faster than existing tools when the database is large.
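
The coding and top-10 retrieval steps described above can be sketched as follows. The abstract does not say how peptides or spectra are coded, so this example makes its own assumptions: each peptide becomes a Boolean vector marking which amino-acid 2-mers it contains, the spectrum vector is taken as given (one real-valued weight per 2-mer), and candidates are ranked by dot product. `code_peptide`, `top_candidates`, and `KMER_INDEX` are hypothetical names, not PIPI's API, and the dynamic programming localization step is not shown.

```python
import heapq
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# One vector position per ordered pair of standard residues (400 in total).
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=2))}


def code_peptide(seq):
    """Boolean vector: True where the 2-mer occurs in seq (assumed coding).

    Non-standard residues raise KeyError in this sketch.
    """
    vec = [False] * len(KMER_INDEX)
    for i in range(len(seq) - 1):
        vec[KMER_INDEX[seq[i:i + 2]]] = True
    return vec


def top_candidates(spectrum_vec, peptides, k=10):
    """Return the k peptides whose coded vectors score highest against the spectrum."""
    def score(seq):
        return sum(w for w, bit in zip(spectrum_vec, code_peptide(seq)) if bit)

    return heapq.nlargest(k, peptides, key=score)
```

Because the coding in this sketch depends only on the unmodified sequence, the candidate shortlist is built without enumerating modification patterns; only the shortlisted peptides would go on to the localization step.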

Peptide Identification by Database Search of Mixture Tandem Mass Spectra

Molecular & Cellular Proteomics | DOI: 10.1074/mcp.m111.010017 | 2011 | Vol 10 (12) | pp. M111.010017 | Cited By ~ 20

Author(s): Jian Wang; Philip E. Bourne; Nuno Bandeira

Keyword(s): Mass Spectra; Peptide Identification; Database Search; Tandem Mass; Tandem Mass Spectra

DIA-Pipe: Identification and Quantification of Post-Translational Modifications using exclusively Data-Independent Acquisition

DOI: 10.1101/141382 | 2017

Author(s): Jesse G. Meyer; Sushanth Mukkamalla; Alexandria K. D’Souza; Alexey I. Nesvizhskii; Bradford W. Gibson; ...

Keyword(s): Peptide Identification; Software Tool; Label Free; Spectral Library; Automated Identification; Post Translational Modifications; Data Independent Acquisition; Sample Amount; Using Data; Identification And Quantification

Abstract: Label-free quantification using data-independent acquisition (DIA) is a robust method for deep and accurate proteome quantification [1,2]. However, when a pre-existing spectral library is lacking, as is often the case with studies of novel post-translational modifications (PTMs), samples are typically analyzed several times: one or more data-dependent acquisitions (DDA) are used to generate a spectral library, followed by DIA for quantification. This type of multi-injection analysis results in significant cost with regard to sample consumption and instrument time for each new PTM study, and may not be possible when sample amount is limiting and/or studies require a large number of biological replicates. Recently developed software (e.g., DIA-Umpire) has enabled combined peptide identification and quantification from a data-independent acquisition without any pre-existing spectral library [3,4]. Still, these tools are designed for protein-level quantification. Here we demonstrate a software tool and workflow that extends DIA-Umpire to allow automated identification and quantification of PTM peptides from DIA. We accomplish this using a custom, open-source graphical user interface, DIA-Pipe (https://github.com/jgmeyerucsd/PIQEDia/releases/tag/v0.1.2) (figure 1a).