A sectioning and database enrichment approach for improved peptide spectrum matching in large, genome-guided protein sequence databases

2019 ◽  
Author(s):  
Praveen Kumar ◽  
James E. Johnson ◽  
Caleb Easterly ◽  
Subina Mehta ◽  
Ray Sajulga ◽  
...  

Abstract: Multi-omics approaches focused on mass spectrometry (MS)-based data, such as metaproteomics, use genomic and/or transcriptomic sequencing data to generate a comprehensive protein sequence database. These databases can be very large, containing millions of sequences, which reduces the sensitivity of matching tandem mass spectrometry (MS/MS) data to sequences to generate peptide spectrum matches (PSMs). Here, we describe a sectioning method for generating a database enriched for the protein sequences that are most likely present in the sample. Our evaluation demonstrates that this method increases the sensitivity of PSM identification while maintaining acceptable false discovery rate (FDR) statistics. The sectioning method yielded more true positive PSM identifications than the traditional large-database search, and fewer false PSM identifications than a previously described two-step method for reducing database size. The sectioning method for large sequence databases thus enables generation of an enriched protein sequence database and promotes increased sensitivity in identifying PSMs, while keeping the FDR acceptable and manageable. Furthermore, implementation in the Galaxy platform provides access to a usable and automated workflow for carrying out the method. Our results show the utility of this methodology for a wide range of applications where genome-guided, large sequence databases are required for MS-based proteomics data analysis.
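The general idea of sectioning a large database and keeping only the sequences that receive matches can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' Galaxy workflow: the function names (`split_into_sections`, `toy_search`, `build_enriched_database`) are hypothetical, observed peptides are represented as plain strings, and a substring test stands in for a real MS/MS search engine with FDR control.

```python
from typing import Callable, Dict, List, Set

ProteinDB = Dict[str, str]  # protein id -> amino acid sequence


def split_into_sections(proteins: ProteinDB, n_sections: int) -> List[ProteinDB]:
    """Distribute proteins round-robin into at most n_sections smaller databases."""
    sections: List[ProteinDB] = [dict() for _ in range(n_sections)]
    for i, (pid, seq) in enumerate(proteins.items()):
        sections[i % n_sections][pid] = seq
    return [s for s in sections if s]  # drop empty sections


def toy_search(section: ProteinDB, peptides: List[str]) -> Set[str]:
    """Assumption: stand-in for a real search engine.

    A protein counts as identified if it contains any observed peptide string.
    """
    return {pid for pid, seq in section.items() if any(p in seq for p in peptides)}


def build_enriched_database(
    proteins: ProteinDB,
    peptides: List[str],
    n_sections: int,
    search_fn: Callable[[ProteinDB, List[str]], Set[str]] = toy_search,
) -> ProteinDB:
    """Search each section separately and merge the identified proteins."""
    enriched: ProteinDB = {}
    for section in split_into_sections(proteins, n_sections):
        for pid in search_fn(section, peptides):
            enriched[pid] = proteins[pid]
    return enriched


if __name__ == "__main__":
    db = {
        "P1": "MKTAYIAKQR",
        "P2": "GGGSLLNDK",
        "P3": "AAAYIAKWWW",
        "P4": "QQQQ",
    }
    observed = ["YIAK", "LLNDK"]
    enriched_db = build_enriched_database(db, observed, n_sections=2)
    print(sorted(enriched_db))
```

In a real analysis, the enriched database would then be searched again against the full MS/MS dataset, and `search_fn` would be replaced by a call to an actual search engine.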

2013 ◽  
Vol 12 (6) ◽  
pp. 2386-2398 ◽  
Author(s):  
Harald Marx ◽  
Simone Lemeer ◽  
Susan Klaeger ◽  
Thomas Rattei ◽  
Bernhard Kuster

2019 ◽  
Vol 102 (5) ◽  
pp. 1263-1270 ◽  
Author(s):  
Weili Xiong ◽  
Melinda A McFarland ◽  
Cary Pirone ◽  
Christine H Parker

Abstract Background: To effectively safeguard the food-allergic population and support compliance with food-labeling regulations, the food industry and regulatory agencies require reliable methods for food allergen detection and quantification. MS-based detection of food allergens relies on the systematic identification of robust and selective target peptide markers. The selection of proteotypic peptide markers, however, relies on the availability of high-quality protein sequence information, a bottleneck for the analysis of many plant-based proteomes. Method: In this work, data were compiled for reference tree nut ingredients and evaluated using a parsimony-driven global proteomics workflow. Results: The utility of supplementing existing incomplete protein sequence databases with translated genomic sequencing data was evaluated for English walnut and provided enhanced selection of candidate peptide markers and differentiation between closely related species. Highlights: Future improvements of protein databases and release of genomics-derived sequences are expected to facilitate the development of robust and harmonized LC–tandem MS-based methods for food allergen detection.


Author(s):  
Maria Jesus Martin ◽  
Claire O’Donovan ◽  
Rolf Apweiler

1992 ◽  
Vol 20 (suppl) ◽  
pp. 2023-2026 ◽  
Author(s):  
W. C. Barker ◽  
D. G. George ◽  
H.-W. Mewes ◽  
A. Tsugita
