Machine learning applied to predicting microorganism growth temperatures and enzyme catalytic optima

2019 ◽  
Author(s):  
Gang Li ◽  
Kersten S. Rabe ◽  
Jens Nielsen ◽  
Martin K. M. Engqvist

Abstract Enzymes that catalyze chemical reactions at high temperatures are used for industrial biocatalysis, applications in molecular biology, and as highly evolvable starting points for protein engineering. The optimal growth temperature (OGT) of organisms is commonly used to estimate the stability of enzymes encoded in their genomes, but the number of experimentally determined OGT values is limited, particularly for thermophilic organisms. Here, we report on the development of a machine learning model that can accurately predict OGT for bacteria, archaea and microbial eukaryotes directly from their proteome-wide 2-mer amino acid composition. The trained model is made freely available for re-use. In a subsequent step we use OGT data in combination with the amino acid composition of individual enzymes to develop a second machine learning model for prediction of enzyme catalytic temperature optima (Topt). The resulting model generates enzyme Topt estimates that are far superior to using OGT alone. Finally, we predict Topt for 6.5 million enzymes, covering 4,447 enzyme classes, and make the resulting dataset available for researchers. This work enables simple and rapid identification of enzymes that are potentially functional at extreme temperatures.
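The proteome-wide 2-mer (dipeptide) composition mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `dipeptide_composition`, the restriction to the 20 standard residues, and the choice to skip pairs containing any other character are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; other symbols (e.g. X, B, Z) are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Fixed ordering of all 400 possible dipeptides, used as the feature order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequences):
    """Return 400 relative dipeptide frequencies pooled over all sequences.

    Hypothetical helper: pairs are counted with a sliding window of width 2
    inside each protein, never across protein boundaries.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if pair[0] in AMINO_ACIDS and pair[1] in AMINO_ACIDS:
                counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy "proteome" of two made-up sequences (not real data).
features = dipeptide_composition(["MKK", "AX"])
```

The resulting 400-value vector is the kind of input a regression model could be trained on to predict OGT; which regressor the study actually used is not stated in the abstract.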

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10381
Author(s):  
Rohit Nandakumar ◽  
Valentin Dinu

Throughout the history of drug discovery, an enzymatic-based approach for identifying new drug molecules has been primarily utilized. Recently, protein–protein interfaces that can be disrupted to identify small molecules that could be viable targets for certain diseases, such as cancer and the human immunodeficiency virus, have been identified. Existing studies computationally identify hotspots on these interfaces, with most models attaining accuracies of ~70%. Many studies do not effectively integrate information relating to amino acid chains and other structural information relating to the complex. Herein, (1) a machine learning model has been created and (2) its ability to integrate multiple features, such as those associated with amino-acid chains, has been evaluated to enhance the ability to predict protein–protein interface hotspots. Virtual drug screening analysis of a set of hotspots determined on the EphB2-ephrinB2 complex has also been performed. The predictive capabilities of this model offer an AUROC of 0.842, sensitivity/recall of 0.833, and specificity of 0.850. Virtual screening of a set of hotspots identified by the machine learning model developed in this study has identified potential medications to treat diseases caused by the overexpression of the EphB2-ephrinB2 complex, including prostate, gastric, colorectal and melanoma cancers which are linked to EphB2 mutations. The efficacy of this model has been demonstrated through its successful ability to predict drug-disease associations previously identified in literature, including cimetidine, idarubicin, pralatrexate for these conditions. In addition, nadolol, a beta blocker, has also been identified in this study to bind to the EphB2-ephrinB2 complex, and the possibility of this drug treating multiple cancers is still relatively unexplored.
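For readers unfamiliar with the sensitivity and specificity figures quoted in this abstract, the sketch below shows how both are derived from binary predictions. The labels are toy values invented for illustration, not data from the study, and `sensitivity_specificity` is a hypothetical helper name.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels coded 0/1.

    sensitivity = TP / (TP + FN), also called recall
    specificity = TN / (TN + FP)
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy labels: 1 = hotspot residue, 0 = non-hotspot (invented for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```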


2020 ◽  
Vol 63 (7-8) ◽  
pp. 728-741 ◽  
Author(s):  
Karun K. Rao ◽  
Quan K. Do ◽  
Khoa Pham ◽  
Debtanu Maiti ◽  
Lars C. Grabow

2020 ◽  
Author(s):  
Ali Ghulam ◽  
XiuJuan Lei ◽  
Yuchen Zhang ◽  
Zhenqiang Wu

Abstract Pathway-specific protein domains (PSPDs) are important tools in examining drug growth as they provide a fast, reliable, and inexpensive way of estimating complex new molecular targets in specific diseases. The protein architecture prevents the formation of a direct correlation between signal transduction behavior and cellular structure. Accordingly, protein–tissue factor pathway inhibitor 2 isotypes 1 precursors have been used to encode peptide sequence information into specific feature structures. The measurable structure-activity classification model obtained by machine learning technology can predict pathway-specific protein interactions and new signaling peptides. We introduce deep neural network (DNN)-based PSPDs, abbreviated as DNNPSPDs, as the first pathway-specific protein domain that is built based on five extant models, namely, the AAindex, pseudo-amino acid composition, amino acid composition, composition mood of pseudoamino acids, and dipeptide composition (DPC). A total of 900 proteins with undetermined roles collected from the PDB database are tested to evaluate the predictive power of this model. Various combinations of the available feature selection technologies are also used to process a hybrid function space. DNNPSPDs predicts PSPDs by using features that are automatically learned from primary protein sequences. The sequences of pathway-associated proteins are sequentially fed into and decoded in neural network layers. Several classifications are also employed. DNNPSPDs achieves a prediction accuracy of 0.957 with a Matthews correlation coefficient (MCC) of 91.86% when using DPC, and the second-best configuration achieves a prediction accuracy of 0.936 with an MCC of 88.02%. In terms of ROC–AUC, DNNPSPDs achieves a value of 0.982, which is larger than that of the other machine learning classifiers. A study using an alternative dataset reveals that our primary pathways, as pathway-specific protein domains, have accurate and reliable associations, thereby proving the viability of the proposed DNNPSPDs.
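The Matthews correlation coefficient reported in this abstract is computed from the four cells of a binary confusion matrix. The sketch below uses the standard definition; the counts are toy numbers, not results from the study, and `matthews_corrcoef_from_counts` is a hypothetical helper name. The function returns a value in the range -1 to 1, whereas the abstract quotes MCC as a percentage.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention for
    degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy confusion-matrix counts (invented for illustration).
mcc = matthews_corrcoef_from_counts(tp=3, tn=3, fp=1, fn=1)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for imbalanced classification problems.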


2013 ◽  
Vol 20 (2) ◽  
pp. 180-186 ◽  
Author(s):  
Maede Khosravian ◽  
Fateme Kazemi Faramarzi ◽  
Majid Mohammad Beigi ◽  
Mandana Behbahani ◽  
Hassan Mohabatkar

2015 ◽  
Vol 9 (1) ◽  
pp. 13-19 ◽  
Author(s):  
Maryam Zare ◽  
Hassan Mohabatkar ◽  
Fateme Kazemi Faramarzi ◽  
Majid Mohammad Beigi ◽  
Mandana Behbahani

Traditional antiviral therapies are expensive, of limited availability, and cause several side effects. Currently, designing antiviral peptides is very important because these peptides interfere with a key stage of the virus life cycle. Most antiviral peptides are derived from viral proteins, for example peptides derived from the HIV-1 capsid protein. Because of the importance of these peptides, in this study the concept of pseudo-amino acid composition (PseAAC) and machine learning methods are used to classify or identify antiviral peptides.


AoB Plants ◽  
2019 ◽  
Vol 12 (3) ◽  
Author(s):  
Sitanshu S Sahu ◽  
Cristian D Loaiza ◽  
Rakesh Kaundal

Abstract The subcellular localization of proteins is very important for characterizing their function in a cell. Accurate computational prediction of subcellular locations has been an active area of interest. Most of the work has focused on single localization prediction. Only a few studies have addressed multi-target localization, and these have not achieved good accuracy so far; in plant sciences, very limited work has been done. Here we report the development of a novel tool, Plant-mSubP, which is based on integrated machine learning approaches to efficiently predict the subcellular localizations in plant proteomes. The proposed approach predicts with high accuracy 11 single localizations and three dual locations of the plant cell. Several features based on the composition and physicochemical properties of a protein, such as amino acid composition, pseudo amino acid composition, auto-correlation descriptors, quasi-sequence-order descriptors and hybrid features, are used to represent the protein. The performance of the proposed method has been assessed on a training set as well as an independent test set. Using the hybrid feature of the pseudo amino acid composition, N-Center-C terminal amino acid composition and the dipeptide composition (PseAAC-NCC-DIPEP), an overall accuracy of 81.97 %, 84.75 % and 87.88 % is achieved on the training data set of proteins containing the single-label, single- and dual-label combined, and dual-label proteins, respectively. When tested on the independent data, an accuracy of 64.36 %, 64.84 % and 81.08 % is achieved on the single-label, single- and dual-label, and dual-label proteins, respectively. The prediction models have been implemented on a web server available at http://bioinfo.usu.edu/Plant-mSubP/. The results indicate that the proposed approach is comparable to the existing methods in single localization prediction and outperforms all other existing tools when compared for dual-label proteins. The prediction tool will be a useful resource for better annotation of various plant proteomes.
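The N-Center-C terminal amino acid composition named in this abstract can be sketched as three separate composition vectors, one per region of the sequence, concatenated into a single feature vector. The abstract does not say how the regions are delimited, so the `terminal_len` parameter, its default of 25 residues, and the fallback for short sequences are assumptions made purely for illustration; `ncc_composition` is a hypothetical name, not the tool's API.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Relative frequency of each of the 20 standard residues in a segment."""
    counts = [segment.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def ncc_composition(seq, terminal_len=25):
    """Concatenate N-terminal, center and C-terminal compositions (60 values).

    Assumption: each terminal region has `terminal_len` residues; sequences
    too short for two such regions are split into rough thirds instead.
    """
    seq = seq.upper()
    if len(seq) < 2 * terminal_len:
        terminal_len = len(seq) // 3
    n_term = seq[:terminal_len]
    center = seq[terminal_len:len(seq) - terminal_len]
    c_term = seq[len(seq) - terminal_len:]
    return composition(n_term) + composition(center) + composition(c_term)


# Toy sequence (not a real protein): three residues per region.
vector = ncc_composition("AAACCCGGG", terminal_len=3)
```

Splitting the sequence this way keeps positional signal, such as N-terminal targeting peptides, that a single whole-sequence composition would average away.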


1972 ◽  
Vol 130 (1) ◽  
pp. 95-101 ◽  
Author(s):  
A. H. Gordon ◽  
P. J. Dykes

1. Improved resolution of mixtures of α1-globulins was obtained by the use of isoelectric focusing. 2. Because material recovered after isoelectric focusing in polyacrylamide gels behaved in a manner which suggested interaction with components derived from the gel, isoelectric focusing when used for preparative purposes was done in a matrix of Sephadex G-75. 3. By this means material from the individual bands formed by isoelectric focusing in 6 M urea could be isolated. The stability of these substances was examined by further isoelectric focusing. 4. Analysis of material that had been shown to be homogeneous by isoelectric focusing in the absence of urea and of that from several individual bands derived from the same sample by isoelectric focusing in 6 M urea showed different proportions of sialic acid but no change in amino acid composition. 5. In the presence of 6 M urea the isoelectric points found were increased by 0.14–0.25 pH unit. After removal of most of the sialic acid with neuraminidase the increase was 0.36–0.72 pH unit. After treatment with 0.025 M H2SO4 at 80°C for 1 h, which removed all the sialic acid, the increase was 0.40–0.87 pH unit. 6. Because removal of all the sialic acid did not decrease the number of bands formed by isoelectric focusing the observed heterogeneity could not be caused entirely by the presence of various proportions of sialic acid.

