Machine learning applied to predicting microorganism growth temperatures and enzyme catalytic optima

Mapping Intimacies ◽

10.1101/522342 ◽

2019 ◽

Cited By ~ 1

Author(s):

Gang Li ◽

Kersten S. Rabe ◽

Jens Nielsen ◽

Martin K. M. Engqvist

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Optimal Growth ◽

Learning Model ◽

Rapid Identification ◽

Machine Learning Model ◽

Subsequent Step ◽

The Stability

Abstract Enzymes that catalyze chemical reactions at high temperatures are used for industrial biocatalysis, applications in molecular biology, and as highly evolvable starting points for protein engineering. The optimal growth temperature (OGT) of organisms is commonly used to estimate the stability of enzymes encoded in their genomes, but the number of experimentally determined OGT values is limited, particularly for thermophilic organisms. Here, we report on the development of a machine learning model that can accurately predict OGT for bacteria, archaea and microbial eukaryotes directly from their proteome-wide 2-mer amino acid composition. The trained model is made freely available for re-use. In a subsequent step we use OGT data in combination with the amino acid composition of individual enzymes to develop a second machine learning model for prediction of enzyme catalytic temperature optima (Topt). The resulting model generates enzyme Topt estimates that are far superior to using OGT alone. Finally, we predict Topt for 6.5 million enzymes, covering 4,447 enzyme classes, and make the resulting dataset available for researchers. This work enables simple and rapid identification of enzymes that are potentially functional at extreme temperatures.
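
To make the "proteome-wide 2-mer amino acid composition" feature concrete, here is a minimal sketch of how such a vector could be computed. The function name, the pooling of counts over all proteins, and the skipping of nonstandard residues are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; all 400 ordered pairs form the feature space.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequences):
    """Return relative frequencies of the 400 dipeptides pooled over all sequences.

    Assumption: overlapping 2-mers containing a nonstandard residue are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if pair[0] in AMINO_ACIDS and pair[1] in AMINO_ACIDS:
                counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy "proteome" of two short peptides (hypothetical data).
features = dipeptide_composition(["MKV", "KVA"])
```

A vector like `features` would then serve as the input to a regression model trained against known OGT values.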

Developing a machine learning model to identify protein–protein interaction hotspots to facilitate drug discovery

PeerJ ◽

10.7717/peerj.10381 ◽

2020 ◽

Vol 8 ◽

pp. e10381

Author(s):

Rohit Nandakumar ◽

Valentin Dinu

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Drug Discovery ◽

Structural Information ◽

Learning Model ◽

Protein Protein Interaction ◽

Drug Molecules ◽

Machine Learning Model ◽

Disease Associations ◽

History Of

Throughout the history of drug discovery, an enzymatic-based approach for identifying new drug molecules has been primarily utilized. Recently, protein–protein interfaces that can be disrupted by small molecules have been identified as viable targets for certain diseases, such as cancer and the human immunodeficiency virus. Existing studies computationally identify hotspots on these interfaces, with most models attaining accuracies of ~70%. Many studies do not effectively integrate information relating to amino acid chains and other structural information relating to the complex. Herein, (1) a machine learning model has been created and (2) its ability to integrate multiple features, such as those associated with amino-acid chains, has been evaluated to enhance the ability to predict protein–protein interface hotspots. Virtual drug screening analysis of a set of hotspots determined on the EphB2-ephrinB2 complex has also been performed. The predictive capabilities of this model offer an AUROC of 0.842, sensitivity/recall of 0.833, and specificity of 0.850. Virtual screening of a set of hotspots identified by the machine learning model developed in this study has identified potential medications to treat diseases caused by the overexpression of the EphB2-ephrinB2 complex, including prostate, gastric, colorectal and melanoma cancers, which are linked to EphB2 mutations. The efficacy of this model has been demonstrated through its successful ability to predict drug-disease associations previously identified in literature, including cimetidine, idarubicin, and pralatrexate for these conditions. In addition, nadolol, a beta blocker, has also been identified in this study to bind to the EphB2-ephrinB2 complex, and the possibility of this drug treating multiple cancers is still relatively unexplored.
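
For readers unfamiliar with the reported metrics, the sketch below shows how sensitivity (recall) and specificity are derived from binary predictions. The labels are made-up toy data and the function name is hypothetical; nothing here reproduces the study's dataset or results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = hotspot."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # fraction of true hotspots that were found
    specificity = tn / (tn + fp)  # fraction of non-hotspots correctly rejected
    return sensitivity, specificity


# Toy example: 10 hotspots (8 found) and 10 non-hotspots (9 rejected).
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1
sens, spec = sensitivity_specificity(y_true, y_pred)
```

AUROC, by contrast, is computed from ranked prediction scores across all thresholds, so it cannot be recovered from a single pair of sensitivity and specificity values.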

Extendable Machine Learning Model for the Stability of Single Atom Alloys

Topics in Catalysis ◽

10.1007/s11244-020-01267-2 ◽

2020 ◽

Vol 63 (7-8) ◽

pp. 728-741 ◽

Cited By ~ 2

Author(s):

Karun K. Rao ◽

Quan K. Do ◽

Khoa Pham ◽

Debtanu Maiti ◽

Lars C. Grabow

Keyword(s):

Machine Learning ◽

Learning Model ◽

Single Atom ◽

Machine Learning Model ◽

The Stability ◽

Single Atom Alloys

Prediction of metalloproteinase family based on the concept of Chou’s pseudo amino acid composition using a machine learning approach

Journal of Structural and Functional Genomics ◽

10.1007/s10969-011-9120-4 ◽

2011 ◽

Vol 12 (4) ◽

pp. 191-197 ◽

Cited By ~ 75

Author(s):

Majid Mohammad Beigi ◽

Mohaddeseh Behjati ◽

Hassan Mohabatkar

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Learning Approach ◽

Pseudo Amino Acid Composition ◽

Machine Learning Approach ◽

Family Based

Pathway-Specific Protein Domains (PSPD) Discrimination by Using a Hybrid Feature Space Based on Deep Neural Network (DNN)

10.21203/rs.3.rs-70425/v1 ◽

2020 ◽

Author(s):

Ali Ghulam ◽

XiuJuan Lei ◽

Yuchen Zhang ◽

Zhenqiang Wu

Keyword(s):

Neural Network ◽

Machine Learning ◽

Amino Acid ◽

Amino Acid Composition ◽

Correlation Coefficient ◽

Acid Composition ◽

Deep Neural Network ◽

Protein Domains ◽

Specific Protein ◽

Matthews Correlation Coefficient

Abstract Pathway-specific protein domains (PSPDs) are important tools in examining drug growth as they provide a fast, reliable, and inexpensive way of estimating complex new molecular targets in specific diseases. The protein architecture prevents the formation of a direct correlation between signal transduction behavior and cellular structure. Accordingly, protein–tissue factor pathway inhibitor 2 isotypes 1 precursors have been used to encode peptide sequence information into specific feature structures. The measurable structure-activity classification model obtained by machine learning technology can predict pathway-specific protein interactions and new signaling peptides. We introduce deep neural network (DNN)-based PSPDs, abbreviated as DNNPSPDs, as the first pathway-specific protein domain that is built based on five extant models, namely, the AAindex, pseudo-amino acid composition, amino acid composition, composition mood of pseudoamino acids, and dipeptide composition (DPC). A total of 900 proteins with undetermined roles collected from the PDB database are tested to evaluate the predictive power of this model. Various combinations of the available feature selection technologies are also used to process a hybrid function space. DNNPSPDs predicts PSPDs by using features that are automatically learned from primary protein sequences. The sequences of pathway-associated proteins are sequentially fed into and decoded in neural network layers. Several classifications are also employed. DNNPSPDs achieves a prediction accuracy of 0.957 with a Matthews correlation coefficient (MCC) of 91.86% when using DPC, and the second-best result is a prediction accuracy of 0.936 with an MCC of 88.02%. In terms of ROC–AUC, DNNPSPDs achieves a value of 0.982, which is larger than that of the other machine learning classifiers. A study using an alternative dataset reveals that our primary pathways, as pathway-specific protein domains, have accurate and reliable associations, thereby proving the viability of the proposed DNNPSPDs.
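
The Matthews correlation coefficient quoted in this abstract is computed from the four confusion-matrix counts as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch follows; the counts are invented for illustration, and returning 0.0 when the denominator vanishes is a common convention assumed here, not something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when any marginal total is zero
    return numerator / denominator


# Hypothetical counts, not taken from the paper.
mcc = matthews_corrcoef(tp=8, tn=9, fp=1, fn=2)
mcc_percent = 100 * mcc  # the abstract reports MCC on a percentage scale
```

MCC ranges from −1 to 1, so a value reported as 91.86% corresponds to 0.9186 on the usual scale.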

Predicting Antibacterial Peptides by the Concept of Chou's Pseudo-amino Acid Composition and Machine Learning Methods

Protein and Peptide Letters ◽

10.2174/092986613804725307 ◽

2013 ◽

Vol 20 (2) ◽

pp. 180-186 ◽

Cited By ~ 76

Author(s):

Maede Khosravian ◽

Fateme Kazemi Faramarzi ◽

Majid Mohammad Beigi ◽

Mandana Behbahani ◽

Hassan Mohabatkar

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Antibacterial Peptides ◽

Pseudo Amino Acid Composition ◽

Learning Methods ◽

Machine Learning Methods

Pathogenicity Prediction of Single Amino Acid Variants with Machine Learning Model Based on Protein Structural Energies

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3139048 ◽

2021 ◽

pp. 1-1

Author(s):

Tzu-Hsuan Wu ◽

Peng-Chan Lin ◽

Hsin-Hung Chou ◽

Meng-Ru Shen ◽

Sun-Yuan Hsieh

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Learning Model ◽

Single Amino Acid ◽

Model Based ◽

Pathogenicity Prediction ◽

Machine Learning Model ◽

Amino Acid Variants

Using Chou’s Pseudo Amino Acid Composition and Machine Learning Method to Predict the Antiviral Peptides

The Open Bioinformatics Journal ◽

10.2174/1875036201509010013 ◽

2015 ◽

Vol 9 (1) ◽

pp. 13-19 ◽

Cited By ~ 7

Author(s):

Maryam Zare ◽

Hassan Mohabatkar ◽

Fateme Kazemi Faramarzi ◽

Majid Mohammad Beigi ◽

Mandana Behbahani

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Pseudo Amino Acid Composition ◽

Antiviral Therapies ◽

Machine Learning Methods ◽

Virus Life Cycle ◽

Antiviral Peptides ◽

HIV-1

Traditional antiviral therapies are expensive, of limited availability, and cause several side effects. Designing antiviral peptides is currently very important because these peptides interfere with a key stage of the virus life cycle. Most antiviral peptides are derived from viral proteins, for example peptides derived from the HIV-1 capsid protein. Because of the importance of these peptides, in this study the concept of pseudo-amino acid composition (PseAAC) and machine learning methods are used to classify or identify antiviral peptides.

Using Chou’s Five-steps Rule to Classify and Predict Glutathione S-transferases with Different Machine Learning Algorithms and Pseudo Amino Acid Composition

International Journal of Peptide Research and Therapeutics ◽

10.1007/s10989-020-10087-7 ◽

2020 ◽

Author(s):

Hassan Mohabatkar ◽

Samira Ebrahimi ◽

Mohammad Moradi

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Pseudo Amino Acid Composition ◽

Glutathione S-Transferases

Plant-mSubP: a computational framework for the prediction of single- and multi-target protein subcellular localization using integrated machine-learning approaches

AoB Plants ◽

10.1093/aobpla/plz068 ◽

2019 ◽

Vol 12 (3) ◽

Author(s):

Sitanshu S Sahu ◽

Cristian D Loaiza ◽

Rakesh Kaundal

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Subcellular Localization ◽

Amino Acid Composition ◽

Acid Composition ◽

Learning Approaches ◽

Pseudo Amino Acid Composition ◽

Hybrid Features ◽

Dual Label ◽

Localization Prediction

Abstract The subcellular localization of proteins is very important for characterizing their function in a cell. Accurate prediction of the subcellular locations in a computational paradigm has been an active area of interest. Most of the work has been focused on single localization prediction. Only a few studies have discussed multi-target localization, and they have not achieved good accuracy so far; in plant sciences, very limited work has been done. Here we report the development of a novel tool, Plant-mSubP, which is based on integrated machine learning approaches to efficiently predict the subcellular localizations in plant proteomes. The proposed approach predicts with high accuracy 11 single localizations and three dual locations of the plant cell. Several hybrid features based on composition and physicochemical properties of a protein such as amino acid composition, pseudo amino acid composition, auto-correlation descriptors, quasi-sequence-order descriptors and hybrid features are used to represent the protein. The performance of the proposed method has been assessed through a training set as well as an independent test set. Using the hybrid feature of the pseudo amino acid composition, N-Center-C terminal amino acid composition and the dipeptide composition (PseAAC-NCC-DIPEP), an overall accuracy of 81.97%, 84.75% and 87.88% is achieved on the training data set of proteins containing the single-label, single- and dual-label combined, and dual-label proteins, respectively. When tested on the independent data, an accuracy of 64.36%, 64.84% and 81.08% is achieved on the single-label, single- and dual-label, and dual-label proteins, respectively. The prediction models have been implemented on a web server available at http://bioinfo.usu.edu/Plant-mSubP/. The results indicate that the proposed approach is comparable to the existing methods in single localization prediction and outperforms all other existing tools when compared for dual-label proteins. The prediction tool will be a useful resource for better annotation of various plant proteomes.
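
As an illustration of the "N-Center-C terminal amino acid composition" idea, the sketch below computes a separate 20-value composition for the N-terminal, central, and C-terminal parts of a sequence and concatenates them. The segment length (`terminal_len`), the function names, and the handling of short sequences are assumptions made for this example; the abstract does not specify them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(segment):
    """Relative frequency of each of the 20 standard amino acids in a segment."""
    residues = [r for r in segment.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def ncc_composition(seq, terminal_len=25):
    """Concatenate N-terminal, center, and C-terminal compositions (60 values).

    Assumption: terminal_len is a positive, hypothetical segment length. For
    sequences shorter than 2 * terminal_len the center is empty (all zeros)
    and the two terminal windows overlap.
    """
    n_term = seq[:terminal_len]
    center = seq[terminal_len:-terminal_len]
    c_term = seq[-terminal_len:]
    return aa_composition(n_term) + aa_composition(center) + aa_composition(c_term)


# Toy sequence with three clearly distinct regions (hypothetical data).
ncc = ncc_composition("AAAA" + "CCCC" + "DDDD", terminal_len=4)
```

A hybrid feature in the spirit of PseAAC-NCC-DIPEP would concatenate a vector like `ncc` with the other descriptor vectors before training a classifier.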

α1-Acute-phase globulins of rats. Microheterogeneity after isoelectric focusing

Biochemical Journal ◽

10.1042/bj1300095 ◽

1972 ◽

Vol 130 (1) ◽

pp. 95-101 ◽

Cited By ~ 9

Author(s):

A. H. Gordon ◽

P. J. Dykes

Keyword(s):

Sialic Acid ◽

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Isoelectric Focusing ◽

Isoelectric Points ◽

Individual Bands ◽

The Stability ◽

The Individual ◽

Gel Isoelectric Focusing

1. Improved resolution of mixtures of α1-globulins was obtained by the use of isoelectric focusing. 2. Because material recovered after isoelectric focusing in polyacrylamide gels behaved in a manner which suggested interaction with components derived from the gel, isoelectric focusing when used for preparative purposes was done in a matrix of Sephadex G-75. 3. By this means material from the individual bands formed by isoelectric focusing in 6 M urea could be isolated. The stability of these substances was examined by further isoelectric focusing. 4. Analysis of material that had been shown to be homogeneous by isoelectric focusing in the absence of urea and of that from several individual bands derived from the same sample by isoelectric focusing in 6 M urea showed different proportions of sialic acid but no change in amino acid composition. 5. In the presence of 6 M urea the isoelectric points found were increased by 0.14–0.25 pH unit. After removal of most of the sialic acid with neuraminidase the increase was 0.36–0.72 pH unit. After treatment with 0.025 M H2SO4 at 80°C for 1 h, which removed all the sialic acid, the increase was 0.40–0.87 pH unit. 6. Because removal of all the sialic acid did not decrease the number of bands formed by isoelectric focusing, the observed heterogeneity could not be caused entirely by the presence of various proportions of sialic acid.
