Prediction of protein-protein interactions from amino acid sequences using extreme learning machine combined with auto covariance descriptor

Author(s): Zhu-Hong You, Liping Li, Zhen Ji, Min Li, Sen Guo
2021
Author(s): JinXuan Zhai, Ji-Yong An

Abstract

Background: Protein–protein interactions (PPIs) are involved in many cellular processes and play a key role inside cells. Predicting PPIs is an important step toward understanding many bioinformatics functions and applications, such as predicting protein functions, gene-disease associations and disease-drug associations. High-throughput methods are expensive and time-consuming, and developing efficient and accurate computational methods for predicting PPIs remains a challenging task.

Results: In this study, a novel computational approach named WELM-SURF was developed to predict PPIs. The proposed method uses a Position-Specific Scoring Matrix (PSSM) to capture protein evolutionary information and employs Speeded-Up Robust Features (SURF) to extract key features from the PSSM of each protein sequence. The Weighted Extreme Learning Machine (WELM) has a short training time and classifies efficiently by optimizing the loss function of the weight matrix, so a WELM classifier was used for classification. Cross-validation results show that WELM-SURF achieves average accuracies of 97.36% and 95.12% on the yeast and human datasets, respectively. The prediction ability of WELM-SURF was also compared with that of ELM-SURF, SVM-SURF and other existing approaches. The comparison further indicates that WELM-SURF performs better than the other methods.

Conclusion: The experimental results show that the WELM-SURF method is useful for predicting PPIs and can also be applied to other bioinformatics studies of proteins.
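To make the classification step more concrete, here is a minimal sketch of a weighted extreme learning machine in Python with NumPy. It assumes the commonly described formulation (random, untrained hidden layer; output weights from a class-weighted ridge solution) and is not taken from the paper. The names `train_welm` and `predict_welm`, the tanh activation, the inverse-class-frequency weights and the defaults `n_hidden=50` and `C=1.0` are all illustrative assumptions, and the PSSM and SURF feature-extraction steps are not reproduced.

```python
import numpy as np

def train_welm(X, y, n_hidden=50, C=1.0, seed=0):
    """Fit a weighted ELM for binary integer labels y in {0, 1} (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Random input weights and biases stay fixed; only the output weights are solved for.
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W_in + b)            # hidden-layer outputs, shape (n, n_hidden)
    # Assumed weighting scheme: inverse class frequency, so the minority class counts more.
    counts = np.bincount(y, minlength=2)
    w = 1.0 / counts[y]
    T = np.where(y == 1, 1.0, -1.0)      # targets encoded as +1 / -1
    # Weighted ridge solution: beta = (I/C + H^T W H)^-1 H^T W T
    HtW = H.T * w                        # equivalent to H.T @ diag(w)
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return W_in, b, beta

def predict_welm(model, X):
    """Return 0/1 predictions from the sign of the network output."""
    W_in, b, beta = model
    return (np.tanh(X @ W_in + b) @ beta > 0).astype(int)
```

Because only `beta` is computed, and in closed form, training reduces to one linear solve, which is consistent with the short training time the abstract attributes to WELM.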


Author(s): Yuan-Miao Gui, Ru-Jing Wang, Xue Wang, Yuan-Yuan Wei

Protein–protein interactions (PPIs) help to elucidate the molecular mechanisms of life activities and play a role in promoting disease treatment and new drug development. With the advent of the proteomics era, a number of PPI prediction methods have emerged. However, the performance of these methods still needs to be improved. To this end, we used the dropout method to reduce over-fitting in deep neural networks (DNNs) and combined it with three feature extraction methods, conjoint triad (CT), auto covariance (AC) and local descriptor (LD), to build DNN models based on amino acid sequences. The results showed that the accuracy of the CT, AC and LD models increased from 97.11% to 98.12%, 96.84% to 98.17%, and 95.30% to 95.60%, respectively. The loss values of the CT, AC and LD models decreased from 27.47% to 14.96%, 65.91% to 17.82% and 36.23% to 15.34%, respectively. These experimental results show that dropout can improve the performance of the DNN models. The results can provide a resource for scholars in future studies involving the prediction of PPIs. The experimental code is available at https://github.com/smalltalkman/hppi-tensorflow .
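As an illustration of one of the feature extraction methods named above, the following Python sketch computes an auto covariance (AC) descriptor for a single property. It assumes the usual definition, AC(lag) = 1/(n - lag) * sum over i of (x_i - mean)(x_{i+lag} - mean), where x_i is a physicochemical value of residue i. The function name `auto_covariance`, the choice of the Kyte-Doolittle hydropathy scale and the default `max_lag=3` are assumptions made for this example. Published AC descriptors typically use several normalized properties and larger lags, which this sketch does not reproduce.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in property scale.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def auto_covariance(seq, max_lag=3, scale=HYDROPATHY):
    """Return [AC(1), ..., AC(max_lag)] for one property along the sequence."""
    values = [scale[aa] for aa in seq]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    # For each lag, average the products of centered values that are `lag` residues apart.
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]
```

The output length depends only on `max_lag`, not on the sequence length, so proteins of different lengths map to fixed-size vectors. A protein pair could then be represented by concatenating the two vectors before feeding them to a classifier.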


Author(s): Hitoshi Koyano, Morihiro Hayashida, Tatsuya Akutsu

Numbers and numerical vectors account for a large portion of data. However, recently, the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem for a consensus sequence of strings demonstrated by one of the authors and co-workers in a previous study. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein–protein interactions using amino acid sequences and classifying RNAs by the secondary structure using nucleotide sequences.
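The conventional approach that this abstract contrasts itself with can be sketched with a k-mer spectrum kernel, one simple kind of string kernel. This is an illustration of that baseline only, not of the authors' string-space classifier. The function names `kmer_counts` and `spectrum_kernel` and the default `k=3` are assumptions chosen for the example.

```python
from collections import Counter

def kmer_counts(s, k=3):
    """Map a string to its vector of k-mer counts (the implicit feature vector)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of strings a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Only k-mers present in both strings contribute to the inner product.
    return sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())
```

The mapping is not one-to-one: with `k=1`, the distinct strings `"AB"` and `"BA"` produce identical count vectors, which is the kind of information loss the abstract refers to. A kernel matrix built from `spectrum_kernel` could be passed to a support vector machine that accepts precomputed kernels.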

