Ens-PPI: A Novel Ensemble Classifier for Predicting the Interactions of Proteins Using Autocovariance Transformation from PSSM

BioMed Research International ◽

10.1155/2016/4563524 ◽

2016 ◽

Vol 2016 ◽

pp. 1-8 ◽

Cited By ~ 13

Author(s):

Zhen-Guo Gao ◽

Lei Wang ◽

Shi-Xiong Xia ◽

Zhu-Hong You ◽

Xin Yan ◽

...

Keyword(s):

Machine Learning ◽

Protein Interactions ◽

Biological Activities ◽

Ensemble Classifier ◽

Prediction Performance ◽

Protein Protein Interactions ◽

Rotation Forest ◽

Proposed Model ◽

Protein Amino Acids ◽

Scoring Matrix

Protein-Protein Interactions (PPIs) play vital roles in most biological activities. Although the development of high-throughput biological technologies has generated considerable PPI data for various organisms, many problems are still far from being solved. A number of computational methods based on machine learning have been developed to facilitate the identification of novel PPIs. In this study, a novel predictor was designed using the Rotation Forest (RF) algorithm combined with Autocovariance (AC) features extracted from the Position-Specific Scoring Matrix (PSSM). More specifically, the PSSMs are generated from protein amino acid sequences. Then, an effective sequence-based feature representation, Autocovariance, is employed to extract features from the PSSMs. Finally, the RF model is used as a classifier to distinguish between interacting and noninteracting protein pairs. The proposed method achieves promising prediction performance on the PPIs of Yeast, H. pylori, and independent datasets. The good results show that the proposed model is suitable for PPI prediction and could also provide a useful supplementary tool for solving other bioinformatics problems.
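
The feature-extraction step described in this abstract can be illustrated with a minimal sketch of one common autocovariance definition applied to a PSSM. The abstract does not give the formula, so the normalization by (L - lag), the function name `autocovariance_features`, and the `max_lag` parameter are assumptions made for illustration and not the authors' implementation.

```python
from typing import List


def autocovariance_features(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Return AC features for a PSSM given as L rows (residues) by C columns.

    Assumed definition, for lag g and column j:
        AC(g, j) = 1/(L - g) * sum_i (P[i][j] - mean_j) * (P[i + g][j] - mean_j)
    The output has max_lag * C values, ordered by lag and then by column.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("PSSM must have at least one row")
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < number of rows")

    n_cols = len(pssm[0])
    # Column means over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            # Covariance between positions i and i + lag within column j.
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features
```

Because the output length depends only on `max_lag` and the number of columns, proteins of different lengths map to vectors of the same size. A pair of proteins could then be represented by concatenating the two vectors before classification, although that pairing step is also an assumption here.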

Systematic auditing is essential to debiasing machine learning in biology

10.1101/2020.05.08.085183 ◽

2020 ◽

Cited By ~ 1

Author(s):

Fatma-Elzahraa Eid ◽

Haitham Elmarakeby ◽

Yujia Alina Chan ◽

Nadine Fornelos Martins ◽

Mahmoud ElHefnawi ◽

...

Keyword(s):

Machine Learning ◽

Protein Interactions ◽

Drug Target ◽

Peptide Binding ◽

Life Sciences ◽

Prediction Performance ◽

Biological Data ◽

Training Data ◽

Protein Protein Interactions ◽

Interest Prediction

Representational biases that are common in biological data can inflate prediction performance and confound our understanding of how and what machine learning (ML) models learn from large complicated datasets. However, auditing for these biases is not a common practice in ML in the life sciences. Here, we devise a systematic auditing framework and harness it to audit three different ML applications of significant therapeutic interest: prediction frameworks of protein-protein interactions, drug-target bioactivity, and MHC-peptide binding. Through this, we identify unrecognized biases that hinder the ML process and result in low model generalizability. Ultimately, we show that, when there is insufficient signal in the training data, ML models are likely to learn primarily from representational biases.

Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier

Journal of Theoretical Biology ◽

10.1016/j.jtbi.2017.01.003 ◽

2017 ◽

Vol 418 ◽

pp. 105-110 ◽

Cited By ~ 32

Author(s):

Lei Wang ◽

Zhu-Hong You ◽

Shi-Xiong Xia ◽

Feng Liu ◽

Xing Chen ◽

...

Keyword(s):

Protein Interactions ◽

Prediction Accuracy ◽

Ensemble Classifier ◽

Position Specific Scoring Matrix ◽

Evolutionary Information ◽

Protein Protein Interactions ◽

Scoring Matrix

Sequence-Based Prediction of Plant Protein-Protein Interactions by Combining Discrete Sine Transformation With Rotation Forest

Evolutionary Bioinformatics ◽

10.1177/11769343211050067 ◽

2021 ◽

Vol 17 ◽

pp. 117693432110500

Author(s):

Jie Pan ◽

Li-Ping Li ◽

Chang-Qing Yu ◽

Zhu-Hong You ◽

Yong-Jian Guan ◽

...

Keyword(s):

Protein Interactions ◽

Extraction Methods ◽

Plant Protein ◽

Evolutionary Information ◽

Protein Protein Interactions ◽

Prediction Ability ◽

Rotation Forest ◽

Learning Classifier ◽

High Prediction ◽

Scoring Matrix

Protein-protein interactions (PPIs) in plants are essential for understanding the regulation of biological processes. Although high-throughput technologies have been widely used to identify PPIs, they are usually laborious, expensive, and suffer from high false-positive rates. Therefore, it is imperative to develop novel computational approaches as a supplementary tool to detect PPIs in plants. In this work, we present a method, namely DST-RoF, to identify PPIs in plants by combining an ensemble learning classifier, Rotation Forest (RoF), with discrete sine transformation (DST). Specifically, each plant protein sequence is first converted into a Position-Specific Scoring Matrix (PSSM). Then, the discrete sine transformation is employed to extract effective features that capture the evolutionary information of proteins. Finally, these features are fed into the RoF classifier for training and prediction. On the plant datasets Arabidopsis, Rice, and Maize, DST-RoF yielded high prediction accuracies of 82.95%, 88.82%, and 93.70%, respectively. To further evaluate the prediction ability of our approach, we compared it with 4 state-of-the-art classifiers and 3 different feature extraction methods. Comprehensive experimental results indicated that our method is feasible and robust for predicting potentially interacting plant protein pairs.
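
As a rough illustration of the DST step, the sketch below applies a type-I discrete sine transform to each PSSM column and keeps the first few coefficients. The abstract does not say which DST variant is used or how coefficients are selected, so the DST-I formula, the per-column application, and the names `dst_type1`, `dst_features`, and `n_coeffs` are all assumptions rather than the DST-RoF implementation.

```python
import math
from typing import List


def dst_type1(signal: List[float]) -> List[float]:
    """Unnormalized DST-I: X[k] = sum_i x[i] * sin(pi * (i+1) * (k+1) / (N+1))."""
    n = len(signal)
    return [
        sum(
            x * math.sin(math.pi * (i + 1) * (k + 1) / (n + 1))
            for i, x in enumerate(signal)
        )
        for k in range(n)
    ]


def dst_features(pssm: List[List[float]], n_coeffs: int) -> List[float]:
    """Keep the first n_coeffs DST-I coefficients of every PSSM column.

    Keeping low-order coefficients is an assumed selection rule; it yields a
    fixed-length vector (n_coeffs * columns) for proteins of any length.
    """
    if not pssm:
        raise ValueError("PSSM must have at least one row")
    if not 1 <= n_coeffs <= len(pssm):
        raise ValueError("n_coeffs must be between 1 and the number of rows")

    features: List[float] = []
    for j in range(len(pssm[0])):
        column = [row[j] for row in pssm]
        features.extend(dst_type1(column)[:n_coeffs])
    return features
```

The direct summation costs O(N^2) per column, which is acceptable for a sketch; a fast transform from a numerical library would be the usual choice for full datasets.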

Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches

Current Drug Metabolism ◽

10.2174/1389200219666180829121038 ◽

2019 ◽

Vol 20 (3) ◽

pp. 177-184 ◽

Cited By ~ 16

Author(s):

Nantao Zheng ◽

Kairou Wang ◽

Weihua Zhan ◽

Lei Deng

Keyword(s):

Machine Learning ◽

Computational Methods ◽

Protein Interactions ◽

Prediction Models ◽

Learning Algorithms ◽

Biological Data ◽

Machine Learning Algorithms ◽

Host Protein ◽

Protein Protein Interactions ◽

Protein Motifs

Background: Targeting critical virus-host Protein-Protein Interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. Methods: In this review, a variety of computational methods for virus-host PPI prediction have been surveyed. These methods are categorized based on the features they utilize and the machine learning algorithms they employ, including classical and novel methods. Results: We describe the pivotal and representative features extracted from relevant sources of biological data, mainly including sequence signatures, known domain interactions, protein motifs, and protein structure information. We focus on state-of-the-art machine learning algorithms that are used to build binary prediction models for the classification of virus-host protein pairs and discuss their abilities, weaknesses, and future directions. Conclusion: The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is still considerable room for improvement.

Distinct p53 acetylation cassettes differentially influence gene-expression patterns and cell fate

The Journal of Cell Biology ◽

10.1083/jcb.200512059 ◽

2006 ◽

Vol 173 (4) ◽

pp. 533-544 ◽

Cited By ~ 171

Author(s):

Chad D. Knights ◽

Jason Catania ◽

Simone Di Giovanni ◽

Selen Muratoglu ◽

Ricardo Perez ◽

...

Keyword(s):

Gene Expression ◽

Cell Fate ◽

Protein Interactions ◽

Posttranslational Modifications ◽

Biological Activities ◽

Expression Profiles ◽

Expression Patterns ◽

Gene Expression Profiles ◽

P53 Gene ◽

Protein Protein Interactions

The activity of the p53 gene product is regulated by a plethora of posttranslational modifications. An open question is whether such posttranslational changes act redundantly or dependently upon one another. We show that a functional interference between specific acetylated and phosphorylated residues of p53 influences cell fate. Acetylation of lysine 320 (K320) prevents phosphorylation of crucial serines in the NH2-terminal region of p53; only allows activation of genes containing high-affinity p53 binding sites, such as p21/WAF; and promotes cell survival after DNA damage. In contrast, acetylation of K373 leads to hyperphosphorylation of p53 NH2-terminal residues and enhances the interaction with promoters for which p53 possesses low DNA binding affinity, such as those contained in proapoptotic genes, leading to cell death. Further, acetylation of each of these two lysine clusters differentially regulates the interaction of p53 with coactivators and corepressors and produces distinct gene-expression profiles. By analogy with the “histone code” hypothesis, we propose that the multiple biological activities of p53 are orchestrated and deciphered by different “p53 cassettes,” each containing combination patterns of posttranslational modifications and protein–protein interactions.

A Novel Phosphorylation Site-Kinase Network-Based Method for the Accurate Prediction of Kinase-Substrate Relationships

BioMed Research International ◽

10.1155/2017/1826496 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Minghui Wang ◽

Tao Wang ◽

Binghua Wang ◽

Yu Liu ◽

Ao Li

Keyword(s):

Protein Interactions ◽

Phosphorylation Site ◽

Prediction Performance ◽

Accurate Prediction ◽

Phosphorylation Sites ◽

Protein Protein Interactions ◽

Biological Mechanisms ◽

Kinase Substrate ◽

Prediction Tools ◽

Local Sequence

Protein phosphorylation is catalyzed by kinases which regulate many aspects that control death, movement, and cell growth. Identification of the phosphorylation site-specific kinase-substrate relationships (ssKSRs) is important for understanding cellular dynamics and provides a fundamental basis for further disease-related research and drug design. Although several computational methods have been developed, most of them mainly use the local sequence of phosphorylation sites and protein-protein interactions (PPIs) to construct the prediction model. Because phosphorylation is a very complicated process that is usually involved in various biological mechanisms, this information is not sufficient for accurate prediction. In this study, we propose a new and powerful computational approach named KSRPred for ssKSR prediction, by introducing novel phosphorylation site-kinase network (pSKN) profiles that can efficiently incorporate the relationships between various protein kinases and phosphorylation sites. The experimental results show that the pSKN profiles can efficiently improve the prediction performance in collaboration with local sequence and PPI information. Furthermore, we compare our method with the existing ssKSR prediction tools, and the results demonstrate that KSRPred can significantly improve the prediction performance.

Evaluation of Machine Learning Algorithms on Protein-Protein Interactions

Advances in Intelligent Systems and Computing - Man-Machine Interactions 3 ◽

10.1007/978-3-319-02309-0_22 ◽

2014 ◽

pp. 211-218 ◽

Cited By ~ 1

Author(s):

Indrajit Saha ◽

Tomas Klingström ◽

Simon Forsberg ◽

Johan Wikander ◽

Julian Zubek ◽

...

Keyword(s):

Machine Learning ◽

Protein Interactions ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Protein Protein Interactions

Issues in performance evaluation for host–pathogen protein interaction prediction

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016500116 ◽

2016 ◽

Vol 14 (03) ◽

pp. 1650011 ◽

Cited By ~ 9

Author(s):

Wajid Arshad Abbasi ◽

Fayyaz Ul Amir Afsar Minhas

Keyword(s):

Machine Learning ◽

Protein Interactions ◽

Cross Validation ◽

Protein Protein Interactions ◽

Evaluation Scheme ◽

Host Pathogen ◽

Pathogen Protein ◽

Protein Interaction Prediction ◽

Underlying Mechanisms ◽

Fold Cross Validation

The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein–protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine learning based methods for predicting host–pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between its performance and the performance of an alternative evaluation scheme called leave one pathogen protein out (LOPO) cross-validation. LOPO is more effective in modeling the real world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics such as areas under the precision-recall or receiver operating characteristic curves are not intuitive to biologists and propose simpler and more directly interpretable metrics for this purpose.
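
The difference between K-fold and LOPO evaluation comes down to how examples are grouped into folds. The sketch below shows one way a leave-one-pathogen-protein-out split could be generated: every pair involving a given pathogen protein is held out together, so the model never sees that protein during training. The function name `lopo_splits` and the input layout (one pathogen identifier per host-pathogen pair) are hypothetical and not taken from the paper.

```python
from typing import Iterator, List, Sequence, Tuple


def lopo_splits(
    pathogen_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_protein, train_indices, test_indices) for each fold.

    pathogen_ids[i] is the pathogen protein of the i-th host-pathogen pair.
    Each fold holds out all pairs of exactly one pathogen protein.
    """
    for held_out in sorted(set(pathogen_ids)):
        test = [i for i, p in enumerate(pathogen_ids) if p == held_out]
        train = [i for i, p in enumerate(pathogen_ids) if p != held_out]
        yield held_out, train, test


# Toy usage: five pairs involving three pathogen proteins.
example_ids = ["A", "A", "B", "C", "B"]
for protein, train_idx, test_idx in lopo_splits(example_ids):
    # No pair of the held-out protein appears in the training indices.
    assert all(example_ids[i] != protein for i in train_idx)
```

In a plain K-fold split, pairs of the same pathogen protein can land in both the training and test folds, which is the scenario the abstract argues does not reflect prediction for proteins with no known interactions.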

APEX2S: A two‐layer machine learning model for discovery of host‐pathogen protein‐protein interactions on cloud‐based multiomics data

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.5846 ◽

2020 ◽

Vol 32 (23) ◽

Author(s):

Huaming Chen ◽

Jun Shen ◽

Lei Wang ◽

Chi‐Hung Chi

Keyword(s):

Machine Learning ◽

Protein Interactions ◽

Learning Model ◽

Protein Protein Interactions ◽

Machine Learning Model ◽

Host Pathogen ◽

Pathogen Protein

Botnet Forensic Analysis Using Machine Learning

Security and Communication Networks ◽

10.1155/2020/9302318 ◽

2020 ◽

Vol 2020 ◽

pp. 1-9

Author(s):

Anchit Bijalwan

Keyword(s):

Machine Learning ◽

Ensemble Classifier ◽

Forensic Analysis ◽

Modus Operandi ◽

Botnet Detection ◽

Quality Of Results ◽

Proposed Model ◽

Improve Accuracy ◽

Rapid Pace

Botnet forensic analysis helps in understanding the nature of attacks and the modus operandi used by the attackers. Botnet attacks are difficult to trace because of their rapid pace, epidemic nature, and smaller size. Machine learning works as a panacea for botnet attack related issues. It not only facilitates detection but also helps in preventing bot attacks. The proposed inquisition model aims to improve the quality of results through comprehensive botnet detection and forensic analysis. The model was applied with eight different combinations of ensemble classifier techniques to detect botnet evidence. The study also compares the ensemble-based classifiers with a single classifier using different parameters. The results show that the proposed model can improve accuracy over a single classifier.
