A protein standard that emulates homology for the characterization of protein inference algorithms

2017 ◽  
Author(s):  
Matthew The ◽  
Fredrik Edfors ◽  
Yasset Perez-Riverol ◽  
Samuel H. Payne ◽  
Michael R. Hoopmann ◽  
...  

Abstract: A natural way to benchmark the performance of an analytical experimental setup is to use samples of known content and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides, not the actual proteins themselves. Because some proteins share proteolytic peptides, more than one set of proteins could account for a given set of peptides, so mechanisms are needed that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins deliberately selected to produce tryptic peptides that are unique to a single protein. Such samples therefore do not expose any of the complications of protein inference. For a realistic benchmark of protein inference procedures, there is consequently a need for samples of known content in which the present proteins share peptides with known absent proteins. Here, we present such a standard, based on E. coli-expressed human protein fragments. To illustrate its usage, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate error estimates than methods that include information from shared peptides, while still giving reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without shared tryptic peptides can give a false sense of accuracy for many protein inference methods.
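The contrast between inference that excludes shared peptides and inference that includes them can be sketched with a toy peptide-to-protein map. The mapping, protein names, and helper functions below are illustrative assumptions, not the benchmark's implementation:

```python
# Illustrative sketch (toy data, not the authors' code): compare protein
# inference using only peptides unique to one protein against inference
# that also credits shared peptides.

# Detected peptides and the proteins they could originate from.
peptide_to_proteins = {
    "PEPTIDEA": {"P1"},          # unique to P1
    "PEPTIDEB": {"P1", "P2"},    # shared: P2 has no unique evidence
    "PEPTIDEC": {"P3"},          # unique to P3
}

def infer_unique_only(pep_map):
    """Report only proteins supported by at least one unique peptide."""
    return {next(iter(prots)) for prots in pep_map.values() if len(prots) == 1}

def infer_with_shared(pep_map):
    """Report every protein matched by any detected peptide."""
    proteins = set()
    for prots in pep_map.values():
        proteins |= prots
    return proteins

print(sorted(infer_unique_only(peptide_to_proteins)))   # ['P1', 'P3']
print(sorted(infer_with_shared(peptide_to_proteins)))   # ['P1', 'P2', 'P3']
```

In this toy case the shared-peptide method reports P2 even though every peptide attributable to P2 is equally explained by P1, which mirrors how shared peptides can inflate apparent accuracy on standards without homology.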

2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Jesse G. Meyer

In the postgenome era, biologists have sought to measure the complete complement of proteins, termed the proteome. Currently, the most effective way to measure the proteome is shotgun, or bottom-up, proteomics, in which the proteome is digested into peptides that are identified and then mapped back to proteins by protein inference. Despite continuous improvements to all steps of the shotgun proteomics workflow, observed proteome coverage is often low, and some proteins are identified by a single peptide sequence. Complete proteome sequence coverage would allow comprehensive characterization of RNA splicing variants and all posttranslational modifications, which would drastically improve the accuracy of biological models. There are many reasons for the sequence coverage deficit, but ultimately peptide length determines sequence observability: peptides that are too short are lost because they match many protein sequences and their true origin is ambiguous, while the maximum observable peptide length is determined by several analytical challenges. This paper explores computationally how the peptide lengths produced by several common proteome digestion methods limit observable proteome coverage. Iterative proteome cleavage strategies are also explored. These simulations reveal that proteome coverage can be maximized by an iterative digestion protocol involving multiple proteases and chemical cleavages, theoretically allowing 92.9% proteome coverage.
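The kind of simulation described can be sketched as an in-silico digest under assumed parameters: a simplified trypsin rule that cleaves after every K/R (ignoring the proline exception), an assumed observable length window of 7–35 residues, and a toy sequence. None of these choices are taken from the paper itself:

```python
# Minimal in-silico digestion sketch (assumed parameters, toy sequence):
# theoretical coverage = fraction of residues landing in peptides whose
# length falls inside the observable window.
import re

def digest(seq):
    """Cleave after every K or R (simplified trypsin rule, no proline rule)."""
    return [p for p in re.split(r"(?<=[KR])", seq) if p]

def theoretical_coverage(seq, min_len=7, max_len=35):
    """Fraction of residues falling in peptides of observable length."""
    observable = sum(len(p) for p in digest(seq) if min_len <= len(p) <= max_len)
    return observable / len(seq)

protein = "MKWVTFISLLLLFSSAYSRGVFRRDTHK"  # toy 28-residue sequence
print(digest(protein))                     # ['MK', 'WVTFISLLLLFSSAYSR', 'GVFR', 'R', 'DTHK']
print(round(theoretical_coverage(protein), 3))  # 0.607
```

Only one of the five tryptic peptides falls in the observable window here, so theoretical coverage is 17/28; running the same tally with additional proteases on the residues still uncovered is the essence of the iterative strategies the paper evaluates.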


2018 ◽  
Author(s):  
Julian Uszkoreit ◽  
Yasset Perez-Riverol ◽  
Britta Eggers ◽  
Katrin Marcus ◽  
Martin Eisenacher

Abstract: Proteomics using LC-MS/MS has become one of the main methods to analyze the proteins in biological samples at high throughput. But existing mass spectrometry instruments are still limited with respect to resolution and measurable mass ranges, which is one of the main reasons why shotgun proteomics is the predominant approach. Here, proteins are digested, so that peptides are identified and quantified instead. While often neglected, the important step of protein inference must be conducted to go from the identified peptides back to the actual proteins in the original sample.

In this work, we highlight some of the previously published and newly added features of the tool PIA – Protein Inference Algorithms, which helps the user with the protein inference of measured samples. We also highlight the importance of using PSI standard file formats, as PIA is currently the only software supporting all available standards used for spectrum identification and protein inference. Additionally, we briefly describe the benefits of working with workflow environments for proteomics analyses and show the new features of the PIA nodes for the KNIME Analytics Platform. Finally, we benchmark PIA against a recently published dataset for isoform detection.

PIA is open source and available for download on GitHub (https://github.com/mpc-bioinformatics/pia) or directly via the community extensions inside the KNIME Analytics Platform.


2018 ◽  
Vol 17 (5) ◽  
pp. 1879-1886 ◽  
Author(s):  
Matthew The ◽  
Fredrik Edfors ◽  
Yasset Perez-Riverol ◽  
Samuel H. Payne ◽  
Michael R. Hoopmann ◽  
...  

2019 ◽  
Author(s):  
Priya Prakash ◽  
Travis Lantz ◽  
Krupal P. Jethava ◽  
Gaurav Chopra

Amyloid plaques found in the brains of Alzheimer's disease (AD) patients consist primarily of amyloid beta 1-42 (Aβ42). Commercially, Aβ42 is synthesized using peptide synthesizers. We describe a robust methodology for expression of recombinant human Aβ(M1-42) in Rosetta(DE3)pLysS and BL21(DE3)pLysS competent E. coli with refined and rapid analytical purification techniques. The peptide is isolated and purified from the transformed cells using an optimized reverse-phase HPLC protocol with commonly available C18 columns, yielding high amounts of peptide (~15-20 mg per 1 L culture) in a short time. The recombinant Aβ(M1-42) forms characteristic aggregates similar to those of synthetic Aβ42, as verified by western blots and atomic force microscopy, warranting future biological use. Our rapid, refined, and robust technique for purifying human Aβ(M1-42) can be used to produce chemical probes for several downstream in vitro and in vivo assays to facilitate AD research.


2018 ◽  
Vol 34 (3) ◽  
pp. 267-278
Author(s):  
Ashraf A. Abd El-Tawab ◽  
Mohamed G. Aggour ◽  
Fatma I. El- Hofy ◽  
Marwa M. Y. El- Mesalami

Microbiology ◽  
2006 ◽  
Vol 152 (7) ◽  
pp. 2129-2135 ◽  
Author(s):  
Taku Oshima ◽  
Francis Biville

Functional characterization of unknown genes is currently a major task in biology. The search for gene function involves a combination of various in silico, in vitro and in vivo approaches. Available knowledge from the study of more than 21 LysR-type regulators in Escherichia coli has facilitated the classification of new members of the family. From sequence similarities and its location on the E. coli chromosome, it is suggested that ygiP encodes a LysR-type regulator controlling the expression of a neighbouring operon; this operon encodes the two subunits of tartrate dehydratase (TtdA, TtdB) and YgjE, an integral inner-membrane protein possibly involved in tartrate uptake. Expression of tartrate dehydratase, which converts tartrate to oxaloacetate, is required for anaerobic growth on glycerol as carbon source in the presence of tartrate. Here, it has been demonstrated that disruption of ygiP, ttdA or ygjE abolishes tartrate-dependent anaerobic growth on glycerol. It has also been shown that tartrate-dependent induction of the ttdA-ttdB-ygjE operon requires a functional YgiP.


Author(s):  
Fatma Ben Abid ◽  
Clement K. M. Tsui ◽  
Yohei Doi ◽  
Anand Deshmukh ◽  
Christi L. McElheny ◽  
...  

Abstract: One hundred forty-nine carbapenem-resistant Enterobacterales from clinical samples obtained between April 2014 and November 2017 were subjected to whole genome sequencing and multi-locus sequence typing. Klebsiella pneumoniae (81, 54.4%) and Escherichia coli (38, 25.5%) were the most common species. Genes encoding metallo-β-lactamases were detected in 68 (45.8%) isolates, and OXA-48-like enzymes in 60 (40.3%). blaNDM-1 (45; 30.2%) and blaOXA-48 (29; 19.5%) were the most frequent. KPC-encoding genes were identified in 5 (3.6%) isolates. The most common sequence types were E. coli ST410 (8; 21.1%) and ST38 (7; 18.4%), and K. pneumoniae ST147 (13; 16%) and ST231 (7; 8.6%).


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinyu Li ◽  
Wei Zhang ◽  
Jianming Zhang ◽  
Guang Li

Abstract

Background: Given expression data, gene regulatory network (GRN) inference approaches try to determine regulatory relations. However, current inference methods ignore the inherent topological characteristics of GRNs to some extent, leading to structures that lack clear biological explanation. To increase the biophysical meaning of inferred networks, this study performed data-driven module detection before network inference. Gene modules were identified by decomposition-based methods.

Results: ICA-decomposition-based module detection methods have been used to detect functional modules directly from transcriptomic data. Experiments on time-series expression, curated and scRNA-seq datasets suggested advantages of the proposed ModularBoost method over established methods, especially in efficiency and accuracy. For scRNA-seq datasets, the ModularBoost method outperformed other candidate inference algorithms.

Conclusions: As a complicated task, GRN inference can be decomposed into several tasks of reduced complexity. Using identified gene modules as topological constraints, the initial inference problem can be accomplished by inferring intra-modular and inter-modular interactions respectively. Experimental outcomes suggest that the proposed ModularBoost method can improve the accuracy and efficiency of inference algorithms by introducing topological constraints.
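The module-constrained idea can be sketched as follows. This is a hedged illustration, not the ModularBoost algorithm: it uses absolute Pearson correlation as a stand-in edge score, and the expression matrix and module assignment are toy data; the point is only that restricting candidate pairs to a module structure shrinks the search space:

```python
# Toy sketch of module-constrained network inference: score only gene
# pairs permitted by the module assignment, keep pairs above a threshold.
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 50))                 # 5 genes x 50 samples
expr[1] = expr[0] + 0.1 * rng.normal(size=50)   # gene 1 tracks gene 0

modules = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}        # gene -> module id

def infer_edges(expr, modules, threshold=0.8):
    """Score intra-module pairs only; return pairs above the threshold."""
    corr = np.corrcoef(expr)
    n = expr.shape[0]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if modules[i] == modules[j] and abs(corr[i, j]) >= threshold:
                edges.append((i, j))
    return edges

print(infer_edges(expr, modules))
```

With 5 genes there are 10 unordered pairs but only 4 intra-modular candidates; inter-modular interactions would then be inferred in a separate, smaller pass, which is the complexity reduction the abstract describes.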

