SLM-Transform: A Method for Memory-Efficient Indexing of Spectra for Database Search in LC-MS/MS Proteomics

2019
Author(s):  
Muhammad Haseeb ◽  
Muaaz G. Awan ◽  
Alexander S. Cadigan ◽  
Fahad Saeed

Abstract: The most commonly used strategy for peptide identification in shotgun LC-MS/MS proteomics involves searching MS/MS data against an in-silico digested protein sequence database. Typically, the digested peptide sequences are indexed in memory to allow faster search times. However, subjecting a database to post-translational modifications (PTMs) during digestion results in an exponential increase in the number of peptides and therefore in memory consumption. This limits the usage of existing fragment-ion based open-search algorithms for databases with several PTMs. In this paper, we propose a novel fragment-ion indexing technique which is analogous to suffix array transformation and allows constant-time querying of indexed ions. We extend our transformation method, called SLM-Transform, by constructing ion buckets that allow querying of all indexed ions by mass while storing only the distribution of ion frequencies within buckets. The stored information is used with a regression technique to locate the position of ions in constant time. Moreover, the number of theoretical b- and y-ions generated and indexed for each theoretical spectrum is limited. Our results show that SLM-Transform allows indexing of up to 4x as many peptides as other leading fragment-ion based database search tools within the same memory constraints. We show that an SLM-Transform based index allows indexing of over 83 million peptides within 26GB RAM, as compared to 80GB required by MSFragger. Finally, we show the constant ion retrieval time of the SLM-Transform based index, which allows ultrafast peptide search speeds. Source code will be made available at: https://github.com/pcdslab/slmindex
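As a rough illustration of what a bucketed fragment-ion index looks like, the sketch below groups ions by mass bucket using a counting sort with prefix-sum offsets. This is a simplified stand-in, not the SLM-Transform itself: the abstract's regression-based position lookup is not reproduced, and the 0.01 Da bucket width, the `max_mass` limit, and all function names are assumptions made for the example.

```python
from itertools import accumulate

SCALE = 100  # assumed bucket width of 0.01 Da (1 / SCALE)


def bucket_of(mass):
    """Map a fragment mass to an integer bucket number."""
    return round(mass * SCALE)


def build_ion_index(theoretical_spectra, max_mass=5000.0):
    """Build a bucketed ion index from per-peptide lists of fragment masses.

    Returns (offsets, peptide_ids): bucket b occupies
    peptide_ids[offsets[b]:offsets[b + 1]].
    """
    n_buckets = bucket_of(max_mass) + 1
    counts = [0] * n_buckets
    for ions in theoretical_spectra:
        for mass in ions:
            counts[bucket_of(mass)] += 1
    # Prefix sums turn per-bucket counts into start offsets.
    offsets = [0] + list(accumulate(counts))
    peptide_ids = [0] * offsets[-1]
    cursor = offsets[:-1]  # slicing copies, so offsets stays unchanged
    for pid, ions in enumerate(theoretical_spectra):
        for mass in ions:
            b = bucket_of(mass)
            peptide_ids[cursor[b]] = pid
            cursor[b] += 1
    return offsets, peptide_ids


def query(index, mass):
    """Return the ids of all peptides with an ion in the bucket of `mass`."""
    offsets, peptide_ids = index
    b = bucket_of(mass)
    return peptide_ids[offsets[b]:offsets[b + 1]]
```

Locating a bucket costs two array reads regardless of index size. The offsets array grows with the number of buckets, not with the number of ions, which is the kind of per-bucket bookkeeping the abstract describes replacing with a compact frequency summary.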

2020
Author(s):  
John T. Halloran ◽  
Gregor Urban ◽  
David Rocke ◽  
Pierre Baldi

Abstract: Semi-supervised machine learning post-processors critically improve peptide identification of shotgun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, the powerful training capabilities of deep learning models have displayed superior performance to shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.
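The q-value thresholds mentioned above are commonly estimated by target-decoy counting. The sketch below shows that standard estimate on (score, is_decoy) pairs; it is not taken from the paper, the function name is hypothetical, and it assumes a higher score means a better match.

```python
def q_values(psms):
    """Estimate q-values for PSMs given as (score, is_decoy) pairs.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting every PSM down to this score.
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the smallest FDR at this or any more permissive threshold.
    qs = []
    running = float("inf")
    for fdr in reversed(fdrs):
        running = min(running, fdr)
        qs.append(running)
    qs.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]
```

A post-processor that rescores PSMs changes the ranking fed into this calculation, which is how recalibration can raise the number of targets accepted at a fixed q-value.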


2014
Vol 13 (12)
pp. 3663-3673
Author(s):  
Xusheng Wang ◽  
Yuxin Li ◽  
Zhiping Wu ◽  
Hong Wang ◽  
Haiyan Tan ◽  
...  

2016
Author(s):  
Fengchao Yu ◽  
Ning Li ◽  
Weichuan Yu

Abstract: In computational proteomics, identification of peptides with an unlimited number of post-translational modification (PTM) types is a challenging task. The computational cost increases exponentially with respect to the number of modifiable amino acids and linearly with respect to the number of potential PTM types at each amino acid. The problem becomes intractable very quickly if we want to enumerate all possible modification patterns. Existing tools (e.g., MS-Alignment, ProteinProspector, and MODa) avoid enumerating modification patterns in database search by using an alignment-based approach to localize and characterize modified amino acids. However, due to the large search space and the PTM localization issue, the sensitivity of these tools is low. This paper proposes a novel method named PIPI to achieve PTM-invariant peptide identification. PIPI first codes peptide sequences into Boolean vectors and converts experimental spectra into real-valued vectors. Then, it finds the top 10 peptide-coded vectors for each spectrum-coded vector. After that, PIPI uses a dynamic programming algorithm to localize and characterize modified amino acids. Simulations and real data experiments have shown that PIPI outperforms existing tools by identifying more peptide-spectrum matches (PSMs) and reporting fewer false positives. It also runs much faster than existing tools when the database is large.
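The vector-matching step described above can be sketched as follows. The abstract does not say which features PIPI encodes or how it scores candidates, so the feature space here (presence of adjacent amino-acid pairs), the dot-product score, and all names are assumptions chosen only to make the idea concrete; the spectrum vector is taken as given.

```python
import heapq
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical feature space: one position per ordered amino-acid pair.
FEATURES = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def code_peptide(sequence):
    """Code a peptide as a Boolean vector marking which pairs it contains."""
    vector = [False] * len(FEATURES)
    for i in range(len(sequence) - 1):
        vector[FEATURES[sequence[i:i + 2]]] = True
    return vector


def top_candidates(spectrum_vector, coded_peptides, k=10):
    """Return the k peptides whose Boolean vectors best match the spectrum.

    spectrum_vector: real-valued weights, one per feature.
    coded_peptides: dict mapping peptide sequence to its Boolean vector.
    """
    def score(item):
        _, vector = item
        # Dot product of a real-valued vector with a Boolean vector.
        return sum(w for w, bit in zip(spectrum_vector, vector) if bit)

    best = heapq.nlargest(k, coded_peptides.items(), key=score)
    return [sequence for sequence, _ in best]
```

Because the peptide vectors encode sequence content and not modified masses, the candidate list can be found without enumerating modification patterns; localizing the modifications is left to a later step, as in the abstract.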


2011
Vol 10 (12)
pp. M111.010017
Author(s):  
Jian Wang ◽  
Philip E. Bourne ◽  
Nuno Bandeira

2017
Author(s):  
Jesse G. Meyer ◽  
Sushanth Mukkamalla ◽  
Alexandria K. D’Souza ◽  
Alexey I. Nesvizhskii ◽  
Bradford W. Gibson ◽  
...  

Label-free quantification using data-independent acquisition (DIA) is a robust method for deep and accurate proteome quantification [1,2]. However, when lacking a pre-existing spectral library, as is often the case with studies of novel post-translational modifications (PTMs), samples are typically analyzed several times: one or more data-dependent acquisitions (DDA) are used to generate a spectral library, followed by DIA for quantification. This type of multi-injection analysis results in significant cost with regard to sample consumption and instrument time for each new PTM study, and may not be possible when sample amount is limiting and/or studies require a large number of biological replicates. Recently developed software (e.g. DIA-Umpire) has enabled combined peptide identification and quantification from a data-independent acquisition without any pre-existing spectral library [3,4]. Still, these tools are designed for protein-level quantification. Here we demonstrate a software tool and workflow that extends DIA-Umpire to allow automated identification and quantification of PTM peptides from DIA. We accomplish this using a custom, open-source graphical user interface, DIA-Pipe (https://github.com/jgmeyerucsd/PIQEDia/releases/tag/v0.1.2) (figure 1a).


2016
Author(s):  
Qifeng Gan ◽  
Lama Seoud ◽  
Houssem Ben Tahar ◽  
J.M. Pierre Langlois
