Preliminary Search Engine for Open Protein Identification

A comprehensive and scalable database search system for metaproteomics

10.1101/053975 ◽

2016 ◽

Author(s):

Sandip Chatterjee ◽

Gregory S. Stupp ◽

Sung Kyu (Robin) Park ◽

Jean-Christophe Ducom ◽

John R. Yates ◽

...

Keyword(s):

Search Engine ◽

Protein Identification ◽

High Throughput Sequencing ◽

Shotgun Proteomics ◽

Identification Accuracy ◽

Sequencing Data ◽

Protein Database ◽

Healthy Human ◽

Genomic Libraries ◽

Sequence Databases

Abstract

Background: Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations.

Results: Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed “Blazmass”) to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy, and allowing for a more in-depth characterization of the functional landscape of the samples.

Conclusions: The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified.
The protein database, proteomics search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.

Comparison of search engine contributions in protein mass fingerprinting for protein identification

Biotechnology and Bioprocess Engineering ◽

10.1007/bf03028637 ◽

2007 ◽

Vol 12 (2) ◽

pp. 125-130 ◽

Cited By ~ 1

Author(s):

Won-A Joo ◽

Jeong-Bok Lee ◽

Mira Park ◽

Jae-Won Lee ◽

Hyun-Jung Kim ◽

...

Keyword(s):

Search Engine ◽

Protein Identification ◽

Protein Mass

Parallel FPGA Search Engine for Protein Identification

Embedded Multi-Core Systems - Bioinformatics ◽

10.1201/ebk1439814888-c14 ◽

2010 ◽

pp. 313-335

Author(s):

Daniel Coca ◽

Istvan Bogdan ◽

Robert Beynon

Keyword(s):

Search Engine ◽

Protein Identification

pTop 1.0: A High-Accuracy and High-Efficiency Search Engine for Intact Protein Identification

Analytical Chemistry ◽

10.1021/acs.analchem.5b03963 ◽

2016 ◽

Vol 88 (6) ◽

pp. 3082-3090 ◽

Cited By ~ 33

Author(s):

Rui-Xiang Sun ◽

Lan Luo ◽

Long Wu ◽

Rui-Min Wang ◽

Wen-Feng Zeng ◽

...

Keyword(s):

Search Engine ◽

Protein Identification ◽

High Efficiency ◽

High Accuracy ◽

Intact Protein

PepTiger: Search Engine for Error-Tolerant Protein Identification from de Novo Sequences

The Open Spectroscopy Journal ◽

10.2174/1874383800701010001 ◽

2007 ◽

Vol 1 (1) ◽

pp. 1-8

Author(s):

Irina Fedulova ◽

Zheng Ouyang ◽

Charles Buck ◽

Xiang Zhang

Keyword(s):

Search Engine ◽

Protein Identification ◽

De Novo

FPGA Implementation of Database Search Engine for Protein Identification by Peptide Fragment Fingerprinting

Mechatronic Systems and Control (formerly Control and Intelligent Systems) ◽

10.2316/j.2010.216.680-0125 ◽

2010 ◽

Vol 7 (7) ◽

Author(s):

I.A. Bogdán ◽

D. Coca ◽

R. Beynon

Keyword(s):

Search Engine ◽

Protein Identification ◽

Database Search ◽

FPGA Implementation ◽

Peptide Fragment

IdentiPy: An Extensible Search Engine for Protein Identification in Shotgun Proteomics

Journal of Proteome Research ◽

10.1021/acs.jproteome.7b00640 ◽

2018 ◽

Vol 17 (7) ◽

pp. 2249-2255 ◽

Cited By ~ 17

Author(s):

Lev I. Levitsky ◽

Mark V. Ivanov ◽

Anna A. Lobas ◽

Julia A. Bubis ◽

Irina A. Tarasova ◽

...

Keyword(s):

Search Engine ◽

Protein Identification ◽

Shotgun Proteomics

Applications in Forensic Proteomics: Protein Identification and Profiling

10.1021/bk-2019-1339 ◽

2019 ◽

Keyword(s):

Protein Identification

Who owns a personal home page?

Swiss Journal of Psychology ◽

10.1024//1421-0185.62.2.121 ◽

2003 ◽

Vol 62 (2) ◽

pp. 121-129 ◽

Cited By ~ 5

Author(s):

Astrid Schütz ◽

Franz Machilek

Keyword(s):

Search Engine ◽

Home Page ◽

Sampling Strategies ◽

Home Pages ◽

Personal Home Pages ◽

Age And Sex

Research on personal home pages is still rare. Many studies to date are exploratory, and the problem of drawing a sample that reflects the variety of existing home pages has not yet been solved. The present paper discusses sampling strategies and suggests a strategy based on the results retrieved by a search engine. This approach is used to draw a sample of 229 personal home pages that portray private identities. Findings on age and sex of the owners and elements characterizing the sites are reported.

Announcing the "Find A Pro" Search Engine

PsycEXTRA Dataset ◽

10.1037/e508422010-005 ◽

2008 ◽

Keyword(s):

Search Engine
