A Novel Algorithm for Validating Peptide Identification from a Shotgun Proteomics Search Engine

Ling Jian; Xinnan Niu; Zhonghang Xia; Parimal Samir; Chiranthani Sumanasekera; Zheng Mu; Jennifer L. Jennings; Kristen L. Hoek; Tara Allos; Leigh M. Howard; Kathryn M. Edwards; P. Anthony Weil; Andrew J. Link

doi:10.1021/pr300631t

Journal of Proteome Research ◽

2013 ◽

Vol 12 (3) ◽

pp. 1108-1119 ◽

Cited By ~ 8

Author(s):

Ling Jian ◽

Xinnan Niu ◽

Zhonghang Xia ◽

Parimal Samir ◽

Chiranthani Sumanasekera ◽

...

Keyword(s):

Search Engine ◽

Peptide Identification ◽

Shotgun Proteomics ◽

Novel Algorithm

Peptide identification in “shotgun” proteomics using tandem mass spectrometry: Comparison of search engine algorithms

Journal of Analytical Chemistry ◽

10.1134/s1061934815140075 ◽

2015 ◽

Vol 70 (14) ◽

pp. 1614-1619 ◽

Cited By ~ 2

Author(s):

M. V. Ivanov ◽

L. I. Levitsky ◽

A. A. Lobas ◽

I. A. Tarasova ◽

M. L. Pridatchenko ◽

...

Keyword(s):

Mass Spectrometry ◽

Tandem Mass Spectrometry ◽

Search Engine ◽

Peptide Identification ◽

Shotgun Proteomics ◽

Tandem Mass

Proteogenomics of Malignant Melanoma Cell Lines: The Effect of Stringency of Exome Data Filtering on Variant Peptide Identification in Shotgun Proteomics

Journal of Proteome Research ◽

10.1021/acs.jproteome.7b00841 ◽

2018 ◽

Vol 17 (5) ◽

pp. 1801-1811 ◽

Cited By ~ 8

Author(s):

Anna A. Lobas ◽

Mikhail A. Pyatnitskiy ◽

Alexey L. Chernobrovkin ◽

Irina Y. Ilina ◽

Dmitry S. Karpov ◽

...

Keyword(s):

Malignant Melanoma ◽

Cell Lines ◽

Melanoma Cell ◽

Peptide Identification ◽

Shotgun Proteomics ◽

Data Filtering ◽

Exome Data ◽

Malignant Melanoma Cell ◽

Variant Peptide ◽

Melanoma Cell Lines

Bayesian Nonparametric Model for the Validation of Peptide Identification in Shotgun Proteomics

Molecular & Cellular Proteomics ◽

10.1074/mcp.m700558-mcp200 ◽

2008 ◽

Vol 8 (3) ◽

pp. 547-557 ◽

Cited By ~ 24

Author(s):

Jiyang Zhang ◽

Jie Ma ◽

Lei Dou ◽

Songfeng Wu ◽

Xiaohong Qian ◽

...

Keyword(s):

Peptide Identification ◽

Shotgun Proteomics ◽

Nonparametric Model ◽

Bayesian Nonparametric

Deep Semi-Supervised Learning Improves Universal Peptide Identification of Shotgun Proteomics Data

10.1101/2020.11.12.380881 ◽

2020 ◽

Author(s):

John T. Halloran ◽

Gregor Urban ◽

David Rocke ◽

Pierre Baldi

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Peptide Identification ◽

Shotgun Proteomics ◽

Database Search ◽

Supervised Machine Learning ◽

Superior Performance ◽

Support Vector ◽

Proteomics Data ◽

Learning Classifier

Abstract: Semi-supervised machine learning post-processors critically improve peptide identification of shotgun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, the powerful training capabilities of deep learning models have displayed superior performance to shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.
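
The abstract compares post-processors by how many peptides they identify across q-value thresholds. The sketch below shows one common way such a count can be obtained from target and decoy PSMs. It is a minimal illustration under stated assumptions, not the procedure used in the paper: higher scores are assumed to be better, the FDR at a score cutoff is estimated as decoys accepted divided by targets accepted, and the function names `target_decoy_qvalues` and `count_accepted_targets` are hypothetical.

```python
from typing import List, Sequence


def target_decoy_qvalues(scores: Sequence[float],
                         is_decoy: Sequence[bool]) -> List[float]:
    """Estimate one q-value per PSM from target/decoy labels.

    Assumption: higher score = better match, and the FDR at a cutoff is
    estimated as (#decoys accepted) / (#targets accepted).
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Estimated FDR if the cutoff is placed just below each ranked PSM.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at which this PSM would still be accepted,
    # i.e. the running minimum taken from the worst score upward.
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = min(running_min, 1.0)
    return qvals


def count_accepted_targets(scores: Sequence[float],
                           is_decoy: Sequence[bool],
                           threshold: float = 0.01) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    qvals = target_decoy_qvalues(scores, is_decoy)
    return sum(1 for q, d in zip(qvals, is_decoy) if not d and q <= threshold)


if __name__ == "__main__":
    # Toy data (invented for illustration only).
    toy_scores = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]
    toy_decoy = [False, False, True, False, False, True]
    print(target_decoy_qvalues(toy_scores, toy_decoy))
    print(count_accepted_targets(toy_scores, toy_decoy, threshold=0.01))
```

Under these assumptions, a post-processor would replace the search-engine scores with its recalibrated classifier outputs and the same count would be recomputed at each threshold to compare methods.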

PISV: A novel algorithm for peptide identification using spectrum vector

2012 5th International Conference on BioMedical Engineering and Informatics ◽

10.1109/bmei.2012.6513033 ◽

2012 ◽

Author(s):

Zhenhua Yu ◽

Minghui Wang ◽

Ao Li

Keyword(s):

Peptide Identification ◽

Novel Algorithm

Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics

Journal of Proteome Research ◽

10.1021/acs.jproteome.9b00736 ◽

2020 ◽

Vol 19 (4) ◽

pp. 1481-1490

Author(s):

Pavel Sulimov ◽

Attila Kertész-Farkas

Keyword(s):

Peptide Identification ◽

Shotgun Proteomics ◽

Calibration Method ◽

Database Search

Using cross-correlation normalized for peptide length to optimize peptide identification in shotgun proteomics

Rapid Communications in Mass Spectrometry ◽

10.1002/rcm.2137 ◽

2005 ◽

Vol 19 (20) ◽

pp. 2983-2985 ◽

Cited By ~ 1

Author(s):

Bing Yang ◽

Wantao Ying ◽

Yan Gong ◽

Yangjun Zhang ◽

Yun Cai ◽

...

Keyword(s):

Cross Correlation ◽

Peptide Identification ◽

Shotgun Proteomics ◽

Peptide Length

Support Vector Machine Classification of Probability Models and Peptide Features for Improved Peptide Identification from Shotgun Proteomics

Sixth International Conference on Machine Learning and Applications (ICMLA 2007) ◽

10.1109/icmla.2007.17 ◽

2007 ◽

Cited By ~ 1

Author(s):

Bobbie-Jo M. Webb-Robertson ◽

Christopher S. Oehmen ◽

William R. Cannon

Keyword(s):

Support Vector Machine ◽

Peptide Identification ◽

Shotgun Proteomics ◽

Support Vector ◽

Probability Models ◽

Support Vector Machine Classification

Semi-supervised learning for peptide identification from shotgun proteomics datasets

Nature Methods ◽

10.1038/nmeth1113 ◽

2007 ◽

Vol 4 (11) ◽

pp. 923-925 ◽

Cited By ~ 1082

Author(s):

Lukas Käll ◽

Jesse D Canterbury ◽

Jason Weston ◽

William Stafford Noble ◽

Michael J MacCoss

Keyword(s):

Supervised Learning ◽

Peptide Identification ◽

Shotgun Proteomics

A comprehensive and scalable database search system for metaproteomics

10.1101/053975 ◽

2016 ◽

Author(s):

Sandip Chatterjee ◽

Gregory S. Stupp ◽

Sung Kyu (Robin) Park ◽

Jean-Christophe Ducom ◽

John R. Yates ◽

...

Keyword(s):

Search Engine ◽

Protein Identification ◽

High Throughput Sequencing ◽

Shotgun Proteomics ◽

Identification Accuracy ◽

Sequencing Data ◽

Protein Database ◽

Healthy Human ◽

Genomic Libraries ◽

Sequence Databases

Abstract

Background: Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations.

Results: Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed “Blazmass”) to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy, and allowing for a more in-depth characterization of the functional landscape of the samples.

Conclusions: The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. The protein database, proteomics search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.
