Survey of the camel urinary proteome by shotgun proteomics using a multiple database search strategy

Abdulqader A. Alhaider; Nervana Bayoumy; Evelyn Argo; Abdel G. M. A. Gader; David A. Stead

doi:10.1002/pmic.201100631

Deep Semi-Supervised Learning Improves Universal Peptide Identification of Shotgun Proteomics Data

10.1101/2020.11.12.380881 ◽

2020 ◽

Author(s):

John T. Halloran ◽

Gregor Urban ◽

David Rocke ◽

Pierre Baldi

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Peptide Identification ◽

Shotgun Proteomics ◽

Database Search ◽

Supervised Machine Learning ◽

Superior Performance ◽

Support Vector ◽

Proteomics Data ◽

Learning Classifier

Abstract: Semi-supervised machine learning post-processors critically improve peptide identification of shotgun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, deep learning models, with their powerful training capabilities, have outperformed shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.
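
The abstract measures success as the number of identified peptides across q-value thresholds. As a minimal sketch (not the code of Percolator, PeptideProphet, or the authors), q-values can be derived from target and decoy PSM scores as below. It assumes each PSM is a hypothetical `(score, is_decoy)` pair with higher scores better, and it uses the simple decoys-over-targets FDR estimate; the names `qvalues` and `count_accepted` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical PSM representation: (score, is_decoy); higher scores are better.
PSM = Tuple[float, bool]


def qvalues(psms: List[PSM]) -> List[float]:
    """Return a q-value for each PSM, in input order.

    Uses decoys / targets as the FDR estimate at each score threshold;
    tied scores are broken by sort order (an assumption of this sketch).
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the acceptance threshold sat at this PSM's score.
        fdr[i] = min(1.0, decoys / max(targets, 1))
    # A q-value is the smallest FDR at which the PSM is still accepted,
    # so take a running minimum from the lowest-scoring PSM upward.
    q = [0.0] * len(psms)
    running_min = 1.0
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


def count_accepted(psms: List[PSM], threshold: float) -> int:
    """Number of target PSMs whose q-value is at or below `threshold`."""
    return sum(
        1
        for (_, is_decoy), qi in zip(psms, qvalues(psms))
        if not is_decoy and qi <= threshold
    )


psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False), (5.0, True)]
for threshold in (0.01, 0.25):
    print(threshold, count_accepted(psms, threshold))  # "0.01 2", then "0.25 4"
```

Under this scheme, a post-processor that rescores PSMs more accurately would tend to push more targets above the decoys and so raise `count_accepted` at a fixed threshold.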

Efficient identification of patients eligible for clinical studies using case-based reasoning on Scottish Health Research Register (SHARE).

10.21203/rs.2.18160/v1 ◽

2019 ◽

Author(s):

Wen Shi ◽

Tom Kelsey ◽

Frank Sullivan

Keyword(s):

Health Research ◽

Clinical Studies ◽

Search Strategy ◽

Database Search ◽

Prediction Score ◽

Free Text ◽

Case Based Reasoning ◽

Study Participation ◽

Research Register ◽

Case Based

Abstract Background: Trials often struggle to achieve their target sample size; only half do so. Some researchers have turned to Electronic Health Records (EHRs) as a more efficient means of recruitment. The Scottish Health Research Register (SHARE) obtained patients’ consent for their EHRs to be used as a search base from which researchers can find potential participants. However, because EHR data are not complete, sufficient, or accurate, a database search strategy may not produce the best case-finding result. The current study aims to evaluate the performance of a case-based reasoning method in identifying participants for population-based clinical studies recruiting through SHARE, and to assess how its resulting cohort differs from the original cohort derived from searching EHRs. Methods: A case-based reasoning framework was applied to 119 participants in nine projects using two-fold cross-validation, with records from a further 86,292 individuals used for testing. A prediction score for study participation was derived from the diagnosis, procedure, pharmaceutical prescription, and laboratory test result attributes of each participant. Evaluation was conducted by calculating the area under the ROC curve (ROC AUC) and information retrieval metrics for the test set ranked by prediction score. We compared the most likely participants identified by searching a database with those ranked highest by our model. Results: The average ROC AUC across the nine projects was 81%, indicating strong predictive ability for these data. However, the derived ranking lists showed lower predictive performance: only 21% of the persons ranked in the top 50 positions were the same as those identified by searching databases. Conclusions: Case-based reasoning may be more effective than a database search strategy for identifying participants for clinical studies using population EHRs.
The lower performance of ranking lists derived from case-based reasoning means that patients identified as highly suitable for study participation may still not be recruited. This suggests that further study is needed on improving the collection and curation of population EHRs, such as using free-text data to better define the characteristics of people more likely to be recruited.
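
The evaluation combines a whole-ranking measure (ROC AUC) with a check of how many of the model's top 50 people match the database-search results. The sketch below illustrates both under stated assumptions: `roc_auc` uses the pairwise (Mann-Whitney) definition of AUC, and `top_k_overlap` is a hypothetical reading of the "21% of the top 50" figure, since the abstract does not spell out the exact information retrieval metrics used.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """ROC AUC as the probability that a randomly chosen positive (e.g. an
    actual participant) outscores a randomly chosen negative; ties count 0.5.

    The pairwise loop is O(P * N), which is acceptable for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_k_overlap(model_ranking: Sequence[str], search_ranking: Sequence[str], k: int = 50) -> float:
    """Fraction of the model's top-k people who also appear in the top-k
    returned by the database search (hypothetical overlap measure)."""
    top_model = set(model_ranking[:k])
    if not top_model:
        return 0.0
    return len(top_model & set(search_ranking[:k])) / len(top_model)


print(roc_auc([0.9, 0.8, 0.3, 0.2], [True, False, True, False]))  # 0.75
print(top_k_overlap(["a", "b", "c", "d"], ["b", "x", "a", "y"], k=2))  # 0.5
```

Because AUC averages over every positive/negative pair while the overlap looks only at the head of the ranking, a model can score well on the first and still agree poorly on the second, which is consistent with the 81% versus 21% contrast reported above.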

Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics

Journal of Proteome Research ◽

10.1021/acs.jproteome.9b00736 ◽

2020 ◽

Vol 19 (4) ◽

pp. 1481-1490

Author(s):

Pavel Sulimov ◽

Attila Kertész-Farkas

Keyword(s):

Peptide Identification ◽

Shotgun Proteomics ◽

Calibration Method ◽

Database Search

A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides

Nature Biotechnology ◽

10.1038/nbt.3267 ◽

2015 ◽

Vol 33 (7) ◽

pp. 743-749 ◽

Cited By ~ 223

Author(s):

Joel M Chick ◽

Deepak Kolippakkam ◽

David P Nusinow ◽

Bo Zhai ◽

Ramin Rad ◽

...

Keyword(s):

Shotgun Proteomics ◽

Database Search ◽

Modified Peptides

Combining high resolution and exact calibration to boost statistical power: A well-calibrated score function for high-resolution MS2 data

10.1101/290858 ◽

2018 ◽

Author(s):

Andy Lin ◽

J. Jeffry Howbert ◽

William Stafford Noble

Keyword(s):

Mass Spectrometry ◽

High Resolution ◽

Statistical Power ◽

State Of The Art ◽

Score Function ◽

Shotgun Proteomics ◽

Database Search ◽

Mass Spectrometry Data ◽

P Value ◽

Score Functions

Abstract: To achieve accurate assignment of peptide sequences to observed fragmentation spectra, a shotgun proteomics database search tool must make good use of the very high resolution information produced by state-of-the-art mass spectrometers. However, making use of this information while also ensuring that the search engine’s scores are well calibrated (i.e., that the score assigned to one spectrum can be meaningfully compared to the score assigned to a different spectrum) has proven to be challenging. Here, we describe a database search score function, the “residue evidence” (res-ev) score, that achieves both of these goals simultaneously. We also demonstrate how to combine calibrated res-ev scores with calibrated XCorr scores to produce a “combined p-value” score function. We provide a benchmark consisting of four mass spectrometry data sets, which we use to compare the combined p-value to the score functions used by several existing search engines. Our results suggest that the combined p-value achieves state-of-the-art performance, generally outperforming MS Amanda and Morpheus and performing comparably to MS-GF+. The res-ev and combined p-value score functions are freely available as part of the Tide search engine in the Crux mass spectrometry toolkit (http://crux.ms).
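
The abstract states that calibrated res-ev and XCorr p-values are merged into a single "combined p-value" but does not give the rule. One standard way to merge independent p-values is Fisher's method, sketched below. Treat it as an illustration that may differ from what Tide actually implements; it also assumes the inputs are independent, which two scores computed for the same PSM need not be. The function name `fisher_combine` is hypothetical.

```python
import math
from typing import Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    Under the null, T = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom, whose survival function has the closed form
    exp(-T/2) * sum_{i=0}^{k-1} (T/2)^i / i!.
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    half_t = -sum(math.log(p) for p in pvalues)  # T / 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half_t / i  # builds (T/2)^i / i! incrementally
        total += term
    return math.exp(-half_t) * total


# Combining a hypothetical res-ev p-value with a hypothetical XCorr p-value.
print(fisher_combine([0.01, 0.02]))
```

For two p-values the closed form reduces to x(1 - ln x) with x = p1 * p2, so `fisher_combine([0.01, 0.02])` gives roughly 0.0019: smaller than either input's Fisher-adjusted value would suggest on its own, but larger than the raw product 0.0002.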

An averaging strategy to reduce variability in target-decoy estimates of false discovery rate

10.1101/440594 ◽

2018 ◽

Cited By ~ 1

Author(s):

Uri Keich ◽

Kaipo Tamura ◽

William Stafford Noble

Keyword(s):

False Discovery Rate ◽

Statistical Power ◽

Shotgun Proteomics ◽

Database Search ◽

Proteomics Data ◽

Decoy Database ◽

Software Toolkit ◽

True Proportion ◽

False Discovery ◽

False Discoveries

Abstract: Decoy database search with target-decoy competition (TDC) provides an intuitive, easy-to-implement method for estimating the false discovery rate (FDR) associated with spectrum identifications from shotgun proteomics data. However, the procedure can yield different results for a fixed dataset analyzed with different decoy databases, and this decoy-induced variability is particularly problematic for smaller FDR thresholds, datasets, or databases. In such cases, the nominal FDR might be 1% but the true proportion of false discoveries might be 10%. The averaged TDC (aTDC) protocol combats this problem by exploiting multiple independently shuffled decoy databases to provide an FDR estimate with reduced variability. We provide a tutorial introduction to aTDC, describe an improved variant of the protocol that offers increased statistical power, and discuss how to deploy aTDC in practice using the Crux software toolkit.
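
Target-decoy competition, and the decoy-induced variability that averaging addresses, can be illustrated with a small sketch. `tdc_discoveries` applies TDC with the commonly used (decoys + 1) / targets FDR estimate; `averaged_discoveries` merely averages discovery counts over several decoy databases to show why multiple decoys stabilize the result. It is a simplified stand-in, not the aTDC procedure described in the paper or its Crux implementation, and all names and toy scores are hypothetical.

```python
from typing import Dict, List, Tuple


def tdc_discoveries(target: Dict[str, float], decoy: Dict[str, float], alpha: float) -> int:
    """Count target discoveries at estimated FDR <= alpha using
    target-decoy competition (TDC).

    `target` and `decoy` map a spectrum id to its best-scoring match in the
    target and decoy database (higher is better). Ties go to the decoy here,
    a conservative choice made for this sketch.
    """
    winners: List[Tuple[float, bool]] = []  # (score, is_decoy)
    for spectrum, t_score in target.items():
        d_score = decoy.get(spectrum, float("-inf"))
        # Competition: each spectrum keeps only its better-scoring match.
        winners.append((t_score, False) if t_score > d_score else (d_score, True))
    winners.sort(key=lambda w: w[0], reverse=True)

    best = targets = decoys = 0
    for _, is_decoy in winners:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # (decoys + 1) / targets estimates the FDR among accepted targets.
        if targets and (decoys + 1) / targets <= alpha:
            best = targets  # deepest threshold so far that meets alpha
    return best


def averaged_discoveries(target: Dict[str, float], decoys: List[Dict[str, float]], alpha: float) -> float:
    """Average the TDC discovery count over several decoy databases
    (a simplified illustration of averaging, not the full aTDC protocol)."""
    counts = [tdc_discoveries(target, d, alpha) for d in decoys]
    return sum(counts) / len(counts)


# Toy scores for five spectra searched against one target and two decoy databases.
target = {"s1": 10.0, "s2": 9.0, "s3": 8.0, "s4": 7.0, "s5": 2.0}
decoy_a = {"s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 1.0, "s5": 3.0}
decoy_b = {"s1": 11.0, "s2": 1.0, "s3": 1.0, "s4": 1.0, "s5": 1.0}
print(tdc_discoveries(target, decoy_a, 0.4))  # 4
print(tdc_discoveries(target, decoy_b, 0.4))  # 0
print(averaged_discoveries(target, [decoy_a, decoy_b], 0.4))  # 2.0
```

With only five spectra, swapping one decoy database for another moves the count from 4 to 0 at the same threshold, which is the kind of decoy-induced variability the abstract describes for small datasets.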

Limitation of predictive 2-D liquid chromatography in reducing the database search space in shotgun proteomics: In silico studies

Journal of Separation Science ◽

10.1002/jssc.201100798 ◽

2012 ◽

Vol 35 (14) ◽

pp. 1771-1778 ◽

Cited By ~ 4

Author(s):

Eugene Moskovets ◽

Anton A. Goloborodko ◽

Alexander V. Gorshkov ◽

Mikhail V. Gorshkov

Keyword(s):

Liquid Chromatography ◽

In Silico ◽

Shotgun Proteomics ◽

Search Space ◽

Database Search ◽

In Silico Studies

An efficient database search strategy for audio fingerprinting

2002 IEEE Workshop on Multimedia Signal Processing. ◽

10.1109/mmsp.2002.1203276 ◽

2004 ◽

Cited By ~ 8

Author(s):

J. Haitsma ◽

T. Kalker ◽

J. Oostveen

Keyword(s):

Search Strategy ◽

Database Search ◽

Audio Fingerprinting

Efficient identification of patients eligible for clinical studies using case-based reasoning on Scottish Health Research Register (SHARE).

10.21203/rs.2.18160/v2 ◽

2020 ◽

Author(s):

Wen Shi ◽

Tom Kelsey ◽

Frank Sullivan

Keyword(s):

Health Research ◽

Clinical Studies ◽

Search Strategy ◽

Database Search ◽

Prediction Score ◽

Free Text ◽

Case Based Reasoning ◽

Study Participation ◽

Research Register ◽

Case Based

Abstract Background: Trials often struggle to achieve their target sample size; only half do so. Some researchers have turned to Electronic Health Records (EHRs) as a more efficient means of recruitment. The Scottish Health Research Register (SHARE) obtained patients’ consent for their EHRs to be used as a search base from which researchers can find potential participants. However, because EHR data are not complete, sufficient, or accurate, a database search strategy may not produce the best case-finding result. The current study aims to evaluate the performance of a case-based reasoning method in identifying participants for population-based clinical studies recruiting through SHARE, and to assess how its resulting cohort differs from the original cohort derived from searching EHRs. Methods: A case-based reasoning framework was applied to 119 participants in nine projects using two-fold cross-validation, with records from a further 86,292 individuals used for testing. A prediction score for study participation was derived from the diagnosis, procedure, pharmaceutical prescription, and laboratory test result attributes of each participant. Evaluation was conducted by calculating the area under the ROC curve (ROC AUC) and information retrieval metrics for the test set ranked by prediction score. We compared the most likely participants identified by searching a database with those ranked highest by our model. Results: The average ROC AUC across the nine projects was 81%, indicating strong predictive ability for these data. However, the derived ranking lists showed lower predictive performance: only 21% of the persons ranked in the top 50 positions were the same as those identified by searching databases. Conclusions: Case-based reasoning may be more effective than a database search strategy for identifying participants for clinical studies using population EHRs.
The lower performance of ranking lists derived from case-based reasoning means that patients identified as highly suitable for study participation may still not be recruited. This suggests that further study is needed on improving the collection and curation of population EHRs, such as using free-text data to aid reliable identification of people more likely to be recruited to clinical trials.

Surgical and interventional radiological management of adult epistaxis: systematic review

The Journal of Laryngology & Otology ◽

10.1017/s0022215117002079 ◽

2017 ◽

Vol 131 (12) ◽

pp. 1108-1130 ◽

Cited By ~ 3

Author(s):

C Swords ◽

A Patel ◽

M E Smith ◽

R J Williams ◽

I Kuhn ◽

...

Keyword(s):

Systematic Review ◽

Adverse Effect ◽

Interventional Radiology ◽

Search Strategy ◽

Database Search ◽

Economic Effects ◽

Success Rates ◽

Review Of The Literature ◽

Artery Ligation ◽

Sphenopalatine Artery

Abstract Background: There is variation in the use of surgery and interventional radiological techniques in the management of epistaxis. This review evaluates the effectiveness of surgical artery ligation compared with direct treatments (nasal packing, cautery), and that of embolisation compared with direct treatments and surgery. Method: A systematic review of the literature was performed using a standardised published methodology and a custom database search strategy. Results: Thirty-seven studies were identified relating to surgery, and 34 articles relating to interventional radiology. For patients with refractory epistaxis, endoscopic sphenopalatine artery ligation had the most favourable adverse effect profile and success rate compared with other forms of surgical artery ligation. Endoscopic sphenopalatine artery ligation and embolisation had similar success rates (73–100 per cent and 75–92 per cent, respectively), although embolisation was associated with more serious adverse effects (risk of stroke, 1.1–1.5 per cent). No articles directly compared the two techniques. Conclusion: Trials comparing endoscopic sphenopalatine artery ligation with embolisation are required to better evaluate the clinical and economic effects of intervention in epistaxis.
