IdentiPy: An Extensible Search Engine for Protein Identification in Shotgun Proteomics

Lev I. Levitsky; Mark V. Ivanov; Anna A. Lobas; Julia A. Bubis; Irina A. Tarasova; Elizaveta M. Solovyeva; Marina L. Pridatchenko; Mikhail V. Gorshkov

doi:10.1021/acs.jproteome.7b00640

A comprehensive and scalable database search system for metaproteomics

doi:10.1101/053975; 2016

Author(s): Sandip Chatterjee; Gregory S. Stupp; Sung Kyu (Robin) Park; Jean-Christophe Ducom; John R. Yates; ...

Keyword(s): Search Engine; Protein Identification; High Throughput Sequencing; Shotgun Proteomics; Identification Accuracy; Sequencing Data; Protein Database; Healthy Human; Genomic Libraries; Sequence Databases

Background: Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations.

Results: Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed “Blazmass”) to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy, and allowing for a more in-depth characterization of the functional landscape of the samples.

Conclusions: The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. The protein database, proteomics search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.
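
The general idea of matching a measured precursor mass against a pre-built peptide index can be illustrated with a toy sketch. This is not the ComPIL or Blazmass implementation; the function names (`tryptic_peptides`, `build_index`, `candidates_for_mass`), the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and the 10 ppm tolerance are assumptions chosen for illustration only.

```python
import bisect

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(protein, min_len=6):
    """Hypothetical helper: cleave after K/R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return [p for p in peptides if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_index(proteins):
    """Return (mass, peptide, protein_name) tuples sorted by mass."""
    entries = [
        (peptide_mass(pep), pep, name)
        for name, seq in proteins.items()
        for pep in tryptic_peptides(seq)
    ]
    return sorted(entries)


def candidates_for_mass(index, precursor_mass, tol_ppm=10.0):
    """Binary-search the sorted index for peptides within the ppm window."""
    tol = precursor_mass * tol_ppm * 1e-6
    masses = [entry[0] for entry in index]
    lo = bisect.bisect_left(masses, precursor_mass - tol)
    hi = bisect.bisect_right(masses, precursor_mass + tol)
    return index[lo:hi]
```

Keeping the index sorted by mass means each query costs a binary search rather than a scan of every peptide, which is one simple reason lookup speed need not grow linearly with database size. A real search engine would also score fragment spectra, handle modifications and missed cleavages, and store the index on disk or in a database.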


Optimizing Shotgun Proteomics Analysis for a Confident Protein Identification and Quantitation in Orphan Plant Species: The Case of Holm Oak (Quercus ilex)

Methods in Molecular Biology - Plant Proteomics; doi:10.1007/978-1-0716-0528-8_12; 2020; pp. 157-168

Author(s): Isabel Gómez-Gálvez; Rosa Sánchez-Lucas; Bonoso San-Eufrasio; Luis Enrique Rodríguez de Francisco; Ana M. Maldonado-Alconada; ...

Keyword(s): Plant Species; Protein Identification; Quercus Ilex; Shotgun Proteomics; Holm Oak; Proteomics Analysis; Protein Identification And Quantitation


Putative Antimicrobial Peptides of the Posterior Salivary Glands from the Cephalopod Octopus vulgaris Revealed by Exploring a Composite Protein Database

Antibiotics; doi:10.3390/antibiotics9110757; 2020; Vol 9 (11); pp. 757; Cited By ~ 1

Author(s): Daniela Almeida; Dany Domínguez-Pérez; Ana Matos; Guillermin Agüero-Chapin; Hugo Osório; ...

Keyword(s): Antimicrobial Peptides; Salivary Glands; Protein Identification; Inflammatory Responses; Shotgun Proteomics; Octopus Vulgaris; Protein Database; Venom Protein; Protein Toxin; Proteomics Approach

Cephalopods, successful predators, can use a mixture of substances to subdue their prey, becoming interesting sources of bioactive compounds. In addition to neurotoxins and enzymes, the presence of antimicrobial compounds has been reported. Recently, the transcriptome and the whole proteome of the Octopus vulgaris salivary apparatus were released, but the role of some compounds—e.g., histones, antimicrobial peptides (AMPs), and toxins—remains unclear. Herein, we profiled the proteome of the posterior salivary glands (PSGs) of O. vulgaris using two sample preparation protocols combined with a shotgun-proteomics approach. Protein identification was performed against a composite database comprising data from the UniProtKB, all transcriptomes available from the cephalopods’ PSGs, and a comprehensive non-redundant AMPs database. Out of the 10,075 proteins clustered in 1868 protein groups, 90 clusters corresponded to venom protein toxin families. Additionally, we detected putative AMPs clustered with histones previously found as abundant proteins in the saliva of O. vulgaris. Some of these histones, such as H2A and H2B, are involved in systemic inflammatory responses and their antimicrobial effects have been demonstrated. These results not only confirm the production of enzymes and toxins by the O. vulgaris PSGs but also suggest their involvement in the first line of defense against microbes.


A Novel Algorithm for Validating Peptide Identification from a Shotgun Proteomics Search Engine

Journal of Proteome Research; doi:10.1021/pr300631t; 2013; Vol 12 (3); pp. 1108-1119; Cited By ~ 8

Author(s): Ling Jian; Xinnan Niu; Zhonghang Xia; Parimal Samir; Chiranthani Sumanasekera; ...

Keyword(s): Search Engine; Peptide Identification; Shotgun Proteomics; Novel Algorithm


An Automated Multidimensional Protein Identification Technology for Shotgun Proteomics

Analytical Chemistry; doi:10.1021/ac010617e; 2001; Vol 73 (23); pp. 5683-5690; Cited By ~ 1217

Author(s): Dirk A. Wolters; Michael P. Washburn; John R. Yates

Keyword(s): Protein Identification; Shotgun Proteomics; Multidimensional Protein Identification Technology


Protein identification in Polistes dominula in Anallergo extract for diagnosis and immunotherapy by shotgun proteomics approach

doi:10.26226/morressier.5acc8ad4d462b8028d89b708; 2018

Author(s): Neri Orsi Battaglini

Keyword(s): Protein Identification; Shotgun Proteomics; Polistes Dominula; Proteomics Approach


Influence of NanoLC Column and Gradient Length as well as MS/MS Frequency and Sample Complexity on Shotgun Protein Identification of Marine Bacteria

Journal of Molecular Microbiology and Biotechnology; doi:10.1159/000478907; 2017; Vol 27 (3); pp. 199-212; Cited By ~ 1

Author(s): Lars Wöhlbrand; Ralf Rabus; Bernd Blasius; Christoph Feenders

Keyword(s): Marine Bacteria; Protein Identification; Shotgun Proteomics; Sample Complexity; Column Length; Peptide Separation; Total Analysis Time; Total Analysis; Nano Liquid Chromatography; Esi Mass Spectrometry

Protein identification by shotgun proteomics, i.e., nano-liquid chromatography (nanoLC) peptide separation online coupled to electrospray ionization (ESI) mass spectrometry (MS)/MS, is the most widely used gel-free approach in proteome research. While the mass spectrometer accounts for mass accuracy and MS/MS frequency, the nanoLC setup and gradient time influence the number of peptides available for MS analysis, which ultimately determine the number of proteins identifiable. Here, we report on the influence of (i) analytical column length (15, 25, or 50 cm) coupled to (ii) the applied gradient length (120, 240, 360, 480, or 600 min), as well as (iii) MS/MS frequency on peptide/protein identification by shotgun proteomics of (iv) 2 marine bacteria. Longer gradients increased the number of peptides/proteins identified as well as the reproducibility of identification. Furthermore, longer analytical columns strictly enlarge the covered proteome complement. Notably, the proteome complement identified with a short column and applying a long gradient is also covered when using longer columns with shorter gradients. Coverage of the proteome complement further increases with higher MS/MS frequency. Compilation of peptide lists of replicate analyses (same gradient length) improves protein identification, while compilation of analyses with different gradient lengths yields a similar or even higher number of proteins using comparable or even less total analysis time.


Comparison of search engine contributions in protein mass fingerprinting for protein identification

Biotechnology and Bioprocess Engineering; doi:10.1007/bf03028637; 2007; Vol 12 (2); pp. 125-130; Cited By ~ 1

Author(s): Won-A Joo; Jeong-Bok Lee; Mira Park; Jae-Won Lee; Hyun-Jung Kim; ...

Keyword(s): Search Engine; Protein Identification; Protein Mass


Preliminary Search Engine for Open Protein Identification

2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies; doi:10.1109/pdcat.2012.112; 2012

Author(s): Wenli Zhang; Hao Chi; Yuanzheng Lu; Yuqing Huang; Xiaofang Zhao; ...

Keyword(s): Search Engine; Protein Identification


Shotgun Proteomics and Biomarker Discovery

Disease Markers; doi:10.1155/2002/505397; 2002; Vol 18 (2); pp. 99-105; Cited By ~ 194

Author(s): W. Hayes McDonald; John R. Yates

Keyword(s): Large Scale; Protein Identification; Biomarker Discovery; Dynamic Range; Shotgun Proteomics; Sequence Information; Protein Biomarkers; Post Translational Modifications; Multidimensional Protein Identification Technology; Shotgun Approach

Coupling large-scale sequencing projects with the amino acid sequence information that can be gleaned from tandem mass spectrometry (MS/MS) has made it much easier to analyze complex mixtures of proteins. The limits of this “shotgun” approach, in which the protein mixture is proteolytically digested before separation, can be further expanded by separating the resulting mixture of peptides prior to MS/MS analysis. Both single dimensional high pressure liquid chromatography (LC) and multidimensional LC (LC/LC) can be directly interfaced with the mass spectrometer to allow for automated collection of tremendous quantities of data. While there is no single technique that addresses all proteomic challenges, the shotgun approaches, especially LC/LC-MS/MS-based techniques such as MudPIT (multidimensional protein identification technology), show advantages over gel-based techniques in speed, sensitivity, scope of analysis, and dynamic range. Advances in the ability to quantitate differences between samples and to detect for an array of post-translational modifications allow for the discovery of classes of protein biomarkers that were previously unassailable.
