A sectioning and database enrichment approach for improved peptide spectrum matching in large, genome-guided protein sequence databases

bioRxiv ◽

10.1101/843078 ◽

2019 ◽

Author(s):

Praveen Kumar ◽

James E. Johnson ◽

Caleb Easterly ◽

Subina Mehta ◽

Ray Sajulga ◽

...

Keyword(s):

Mass Spectrometry ◽

Protein Sequence ◽

Sequence Database ◽

Sequencing Data ◽

Proteomics Data ◽

Step Method ◽

Protein Sequence Database ◽

Sectioning Method ◽

Wide Range ◽

Sequence Databases

Abstract: Multi-omics approaches focused on mass spectrometry (MS)-based data, such as metaproteomics, use genomic and/or transcriptomic sequencing data to generate a comprehensive protein sequence database. These databases can be very large, containing millions of sequences, which reduces the sensitivity of matching tandem mass spectrometry (MS/MS) data to sequences to generate peptide spectrum matches (PSMs). Here, we describe a sectioning method for generating a database enriched for the protein sequences that are most likely present in the sample. Our evaluation demonstrates that this method increases the sensitivity of PSM identification while maintaining acceptable false discovery rate (FDR) statistics. Compared with the traditional large-database search, the sectioning method increased true-positive PSM identifications; compared with a previously described two-step method for reducing database size, it reduced false PSM identifications. The sectioning method for large sequence databases enables generation of an enriched protein sequence database and promotes increased sensitivity in identifying PSMs, while maintaining an acceptable and manageable FDR. Furthermore, implementation in the Galaxy platform provides access to a usable and automated workflow for carrying out the method. Our results show the utility of this methodology for a wide range of applications where genome-guided, large sequence databases are required for MS-based proteomics data analysis.
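The abstract does not spell out the mechanics of the sectioning method, so the following is only a minimal sketch of one plausible reading: split the large database into sections, search each section separately, and pool the proteins that receive a match into an enriched database. The function names, the section count, and the substring-based `toy_search` are all hypothetical stand-ins. In particular, `toy_search` is not a real MS/MS search engine, and nothing here reflects the authors' Galaxy workflow.

```python
from typing import Callable, Dict, List, Set

# Hypothetical type: a search takes one database section and the observed
# peptides, and returns the IDs of proteins that received at least one match.
SearchFn = Callable[[Dict[str, str], List[str]], Set[str]]


def split_into_sections(protein_ids: List[str], n_sections: int) -> List[List[str]]:
    """Split IDs into at most n_sections contiguous, roughly equal sections."""
    if n_sections < 1:
        raise ValueError("n_sections must be at least 1")
    size = max(1, -(-len(protein_ids) // n_sections))  # ceiling division
    return [protein_ids[i:i + size] for i in range(0, len(protein_ids), size)]


def toy_search(section: Dict[str, str], peptides: List[str]) -> Set[str]:
    """Assumed stand-in for a search engine: exact substring matching only."""
    return {pid for pid, seq in section.items() if any(p in seq for p in peptides)}


def enrich_database(
    database: Dict[str, str],
    peptides: List[str],
    n_sections: int,
    search: SearchFn = toy_search,
) -> Dict[str, str]:
    """Search each section on its own and pool the matched proteins."""
    enriched: Dict[str, str] = {}
    for section_ids in split_into_sections(list(database), n_sections):
        section = {pid: database[pid] for pid in section_ids}
        for pid in search(section, peptides):
            enriched[pid] = database[pid]
    return enriched


# Toy data, invented for illustration only.
database = {
    "P1": "MKTAYIAKQR",
    "P2": "GAVLIPFMW",
    "P3": "QRSTNDEKH",
    "P4": "MKLVINGKTLK",
}
enriched = enrich_database(database, ["AYIAK", "INGK"], n_sections=2)
print(sorted(enriched))
```

In a real pipeline the enriched database would then be searched once more against the full MS/MS dataset, with FDR estimated on that final search; that step is omitted here because the abstract gives no details about it.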

MScDB: A Mass Spectrometry-centric Protein Sequence Database for Proteomics

Journal of Proteome Research ◽

10.1021/pr400215r ◽

2013 ◽

Vol 12 (6) ◽

pp. 2386-2398 ◽

Cited By ~ 9

Author(s):

Harald Marx ◽

Simone Lemeer ◽

Susan Klaeger ◽

Thomas Rattei ◽

Bernhard Kuster

Keyword(s):

Mass Spectrometry ◽

Protein Sequence ◽

Sequence Database ◽

Protein Sequence Database

Selection of Tree Nut Allergen Peptide Markers: A Need for Improved Protein Sequence Databases

Journal of AOAC International ◽

10.1093/jaoac/102.5.1263 ◽

2019 ◽

Vol 102 (5) ◽

pp. 1263-1270 ◽

Cited By ~ 1

Author(s):

Weili Xiong ◽

Melinda A McFarland ◽

Cary Pirone ◽

Christine H Parker

Keyword(s):

Food Allergen ◽

Protein Sequence ◽

Sequence Information ◽

Sequencing Data ◽

Reference Tree ◽

Candidate Peptide ◽

Tree Nut ◽

Allergen Detection ◽

Sequence Databases ◽

Selection Of

Abstract: Background: To effectively safeguard the food-allergic population and support compliance with food-labeling regulations, the food industry and regulatory agencies require reliable methods for food allergen detection and quantification. MS-based detection of food allergens relies on the systematic identification of robust and selective target peptide markers. The selection of proteotypic peptide markers, however, relies on the availability of high-quality protein sequence information, a bottleneck for the analysis of many plant-based proteomes. Method: In this work, data were compiled for reference tree nut ingredients and evaluated using a parsimony-driven global proteomics workflow. Results: The utility of supplementing existing incomplete protein sequence databases with translated genomic sequencing data was evaluated for English walnut and provided enhanced selection of candidate peptide markers and differentiation between closely related species. Highlights: Future improvements of protein databases and release of genomics-derived sequences are expected to facilitate the development of robust and harmonized LC–tandem MS-based methods for food allergen detection.

[3] PIR-International protein sequence database

Methods in Enzymology - Computer Methods for Macromolecular Sequence Analysis ◽

10.1016/s0076-6879(96)66005-4 ◽

1996 ◽

pp. 41-59 ◽

Cited By ~ 5

Author(s):

David G. George ◽

Lois T. Hunt ◽

Winona C. Barker

Keyword(s):

Protein Sequence ◽

Sequence Database ◽

Protein Sequence Database

Identification of 2-D Gel Proteins at the Femtomole Level by Molecular Mass Searching of Peptide Fragments in a Protein Sequence Database

Techniques in Protein Chemistry ◽

10.1016/b978-0-12-194710-1.50006-0 ◽

1994 ◽

pp. 3-9 ◽

Cited By ~ 3

Author(s):

William J. Henzel ◽

Todd M. Billeci ◽

John T. Stults ◽

Susan C. Wong ◽

Christopher Grimley ◽

...

Keyword(s):

Molecular Mass ◽

Protein Sequence ◽

Sequence Database ◽

Peptide Fragments ◽

Protein Sequence Database

Protein Sequence Database Methods

Genetic Engineering: Principles and Methods ◽

10.1007/978-0-306-48573-2_2 ◽

2004 ◽

pp. 13-17

Author(s):

Maria Jesus Martin ◽

Claire O’Donovan ◽

Rolf Apweiler

Keyword(s):

Protein Sequence ◽

Sequence Database ◽

Protein Sequence Database

The PIR-International Protein Sequence Database

Nucleic Acids Research ◽

10.1093/nar/20.suppl.2023 ◽

1992 ◽

Vol 20 (suppl) ◽

pp. 2023-2026 ◽

Cited By ~ 19

Author(s):

W. C. Barker ◽

D. G. George ◽

H.-W. Mewes ◽

A. Tsugita

Keyword(s):

Protein Sequence ◽

Sequence Database ◽

Protein Sequence Database

Enhanced sequence identification technique for protein sequence database mining with hybrid frequent pattern mining algorithm

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2016.10001625 ◽

2016 ◽

Vol 16 (3) ◽

pp. 205

Author(s):

J. Jeyabharathi ◽

D. Shanthi

Keyword(s):

Protein Sequence ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Database Mining ◽

Sequence Database ◽

Protein Sequence Database ◽

Mining Algorithm ◽

Sequence Identification ◽

Identification Technique

Automatic Synchronizing and self-Updating Protein Sequence Database Management System

10.1240/sav_gbm_2004_h_000701 ◽

2004 ◽

Vol 2004 (Fall) ◽

Author(s):

Andreas Böhm ◽

Albert Sickmann

Keyword(s):

Protein Sequence ◽

Management System ◽

Database Management ◽

Database Management System ◽

Sequence Database ◽

Protein Sequence Database

EMBOPRO—an automatically generated protein sequence database

Bioinformatics ◽

10.1093/bioinformatics/5.1.15 ◽

1989 ◽

Vol 5 (1) ◽

pp. 15-18

Author(s):

R. Stulich ◽

K. Rohde

Keyword(s):

Protein Sequence ◽

Sequence Database ◽

Protein Sequence Database

Enhanced sequence identification technique for protein sequence database mining with hybrid frequent pattern mining algorithm

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2016.080673 ◽

2016 ◽

Vol 16 (3) ◽

pp. 205 ◽

Cited By ~ 1

Author(s):

J. Jeyabharathi ◽

D. Shanthi

Keyword(s):

Protein Sequence ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Database Mining ◽

Sequence Database ◽

Protein Sequence Database ◽

Mining Algorithm ◽

Sequence Identification ◽

Identification Technique
