Maximum margin classifier working in a set of strings

Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences ◽

10.1098/rspa.2015.0551 ◽

2016 ◽

Vol 472 (2187) ◽

pp. 20150551 ◽

Cited By ~ 1

Author(s):

Hitoshi Koyano ◽

Morihiro Hayashida ◽

Tatsuya Akutsu

Keyword(s):

Probability Theory ◽

Protein Interactions ◽

Consensus Sequence ◽

Classification Problem ◽

Amino Acid Sequences ◽

Support Vector ◽

Generalization Error ◽

Protein Protein Interactions ◽

String Kernels ◽

Learning Machine

Numbers and numerical vectors account for a large portion of data. However, recently, the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem for a consensus sequence of strings demonstrated by one of the authors and co-workers in a previous study. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein–protein interactions using amino acid sequences and classifying RNAs by the secondary structure using nucleotide sequences.
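
The information loss mentioned in the abstract can be illustrated with a small sketch. The k-mer spectrum map below is one common string-kernel feature map, chosen here as an assumption for illustration; the abstract does not name a specific kernel, and the function names are hypothetical. Two different strings receive identical count vectors, so the original string cannot be recovered from its vector.

```python
from collections import Counter


def spectrum(s: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s: str, t: str, k: int) -> int:
    """Inner product of the k-mer count vectors of s and t."""
    cs, ct = spectrum(s, k), spectrum(t, k)
    # A Counter returns 0 for k-mers it has not seen.
    return sum(count * ct[kmer] for kmer, count in cs.items())


# Two distinct strings with the same 2-mer counts (aa: 1, ab: 2, ba: 1).
a, b = "aabab", "abaab"
same_vector = spectrum(a, 2) == spectrum(b, 2)
```

Here `same_vector` is true although `a` and `b` differ, so any classifier that sees only the count vectors cannot tell the two strings apart.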

An Efficient Feature Extraction Technique Based on Local Coding PSSM and Multifeatures Fusion for Predicting Protein-Protein Interactions

Evolutionary Bioinformatics ◽

10.1177/1176934319879920 ◽

2019 ◽

Vol 15 ◽

pp. 117693431987992 ◽

Cited By ~ 1

Author(s):

Ji-Yong An ◽

Yong Zhou ◽

Yu-Jun Zhao ◽

Zi-Ji Yan

Keyword(s):

Feature Extraction ◽

Protein Interactions ◽

Functional Organization ◽

Extraction Methods ◽

Amino Acid Sequences ◽

Evolutionary Information ◽

Support Vector ◽

Svm Classifier ◽

Protein Protein Interactions ◽

Local Coding

Background: Increasing evidence has indicated that protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of a cell. Thus, continuing to uncover potential PPIs is an important topic in the biomedical domain. Although various feature extraction methods combined with machine learning approaches have enhanced the prediction of PPIs, there remains room for improvement by developing novel and effective feature extraction methods and classifiers to identify PPIs. Method: In this study, we proposed a sequence-based feature extraction method called LCPSSMMF, which combined a local coding position-specific scoring matrix (PSSM) with multifeatures fusion. First, we used a novel local coding method based on the PSSM to build a new PSSM (CPSSM); the advantage of this method is that it incorporates global and local feature extraction, which can account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. Second, we adopted 2 different feature extraction methods (Local Average Group [LAG] and Bigram Probability [BP]) to capture multiple key features by exploiting the evolutionary information embedded in the CPSSM. Finally, feature vectors were obtained by multifeatures fusion. Result: To evaluate the performance of the proposed feature extraction approach, we employed a support vector machine (SVM) as the prediction classifier and applied this method to yeast and human PPI datasets. The prediction accuracies of LCPSSMMF were 93.43% and 90.41% on the yeast and human datasets, respectively. Moreover, we compared the proposed method with previous sequence-based approaches on the yeast dataset by using the same SVM classifier. The experimental results indicated that the performance of LCPSSMMF significantly exceeded that of several other state-of-the-art methods.
Conclusion: The results show that the LCPSSMMF approach can capture more local and global discriminatory information than almost all previous methods and can function remarkably well in identifying PPIs. To facilitate extensive research in future proteomics studies, we developed an LCPSSMMFSVM server, which is freely available for academic use at http://219.219.62.123:8888/LCPSSMMFSVM .
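
As a rough illustration of the bigram step, the sketch below computes one common formulation of bigram features from a PSSM, in which entry (m, n) sums the product of column m at position i and column n at position i + 1. Whether LCPSSMMF uses exactly this formula is an assumption, since the abstract gives no equations; the function name is hypothetical and any normalization is omitted.

```python
from typing import List

Matrix = List[List[float]]


def bigram_features(pssm: Matrix) -> Matrix:
    """Return B with B[m][n] = sum over i of pssm[i][m] * pssm[i + 1][n].

    Rows of `pssm` are sequence positions; columns are residue types.
    """
    if not pssm:
        return []
    width = len(pssm[0])
    bigram = [[0.0] * width for _ in range(width)]
    # Pair every position with the position that follows it.
    for row, next_row in zip(pssm, pssm[1:]):
        for m in range(width):
            for n in range(width):
                bigram[m][n] += row[m] * next_row[n]
    return bigram


# Toy 3-position, 2-column matrix (a real PSSM has 20 columns).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
features = bigram_features(toy)
```

With 20 amino acid columns the result is a 20 x 20 matrix, which can be flattened into a 400-dimensional feature vector regardless of sequence length.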

Protein-Protein Interaction Prediction using PCA and SVR-PHCS

The Open Bioinformatics Journal ◽

10.2174/1875036201509010001 ◽

2015 ◽

Vol 9 (1) ◽

pp. 1-12

Author(s):

Saeideh Mahmoudian ◽

Abdulaziz Yousef ◽

Nasrollah Moghadam Charkari

Keyword(s):

Protein Interactions ◽

Principal Component ◽

Classification Problem ◽

Support Vector ◽

Protein Protein Interactions ◽

Classification Problems ◽

Machine Learning Classification ◽

Regression Methods ◽

Protein Protein Interaction ◽

Noise Data

Protein-Protein Interactions (PPIs) play a key role in many biological systems. Thus, identifying PPIs is critical for understanding cellular processes. Many experimental techniques have been applied to identify PPIs, but the data extracted using these techniques are incomplete and noisy. In this regard, a number of computational methods, including machine learning classification techniques, have been developed to reduce the noise in the data and predict new PPIs. Because using regression methods to solve classification problems has given good results in other applications, this paper applies a regression view to the PPI prediction classification problem: a new approach is proposed using Principal Component Analysis (PCA) and Support Vector Regression (SVR), the latter improved by a new Parallel Hierarchical Cube Search (PHCS) method. First, the PCA algorithm is used to select an optimal subset of features, which reduces processing time and lessens the effect of noise. Then, the PPIs are predicted using SVR. To improve the performance of SVR, the new PHCS method is applied to select appropriate values of the SVR parameters. The proposed method obtains a classification accuracy of 74.505% on the KUPS (The University of Kansas Proteomics Service) dataset, which outperforms the other methods.

Prediction of protein-protein interactions from amino acid sequences using extreme learning machine combined with auto covariance descriptor

2013 IEEE Workshop on Memetic Computing (MC) ◽

10.1109/mc.2013.6608211 ◽

2013 ◽

Cited By ~ 5

Author(s):

Zhu-Hong You ◽

Liping Li ◽

Zhen Ji ◽

Min Li ◽

Sen Guo

Keyword(s):

Amino Acid ◽

Extreme Learning Machine ◽

Protein Interactions ◽

Amino Acid Sequences ◽

Protein Protein Interactions ◽

Covariance Descriptor ◽

Learning Machine ◽

Auto Covariance

Large-Scale Protein-Protein Interactions Detection by Integrating Big Biosensing Data with Computational Model

BioMed Research International ◽

10.1155/2014/598129 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 30

Author(s):

Zhu-Hong You ◽

Shuai Li ◽

Xin Gao ◽

Xin Luo ◽

Zhen Ji

Keyword(s):

Computational Model ◽

High Throughput ◽

Protein Interactions ◽

Large Scale ◽

False Negative ◽

Support Vector ◽

Data Detection ◽

Protein Protein Interactions ◽

Protein Interaction Dataset ◽

Learning Machine

Protein-protein interactions are the basis of biological functions, and studying these interactions on a molecular level is of crucial importance for understanding the functionality of a living cell. During the past decade, biosensors have emerged as an important tool for the high-throughput identification of proteins and their interactions. However, the high-throughput experimental methods for identifying PPIs are both time-consuming and expensive. Moreover, high-throughput PPI data are often associated with high false-positive and high false-negative rates. To address these problems, we propose a method for PPI detection that integrates biosensor-based PPI data with a novel computational model. This method is based on the extreme learning machine algorithm combined with a novel protein sequence descriptor. When applied to a large-scale human protein interaction dataset, the proposed method achieved 84.8% prediction accuracy with 84.08% sensitivity at a specificity of 85.53%. We conducted more extensive experiments to compare the proposed method with a state-of-the-art technique, the support vector machine. The results demonstrate that our approach is very promising for detecting new PPIs and that it can be a helpful supplement to biosensor-based PPI data detection.
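
The three reported measures are related through the confusion matrix, as the following minimal sketch shows. The counts are hypothetical values chosen for illustration, not data from the paper, and the function name is an assumption.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    positives = tp + fn  # truly interacting pairs
    negatives = tn + fp  # truly non-interacting pairs
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,  # true-positive rate
        "specificity": tn / negatives,  # true-negative rate
    }


# Hypothetical counts for a test set of 100 positives and 100 negatives.
metrics = classification_metrics(tp=80, fn=20, tn=90, fp=10)
```

When the positive and negative sets have the same size, accuracy is the mean of sensitivity and specificity. If the dataset in the abstract were balanced in this way (an assumption; the abstract does not say), the reported figures would be consistent, since (84.08 + 85.53) / 2 = 84.805.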

Prediction of Disease Comorbidity Using HeteSim Scores based on Multiple Heterogeneous Networks

Current Gene Therapy ◽

10.2174/1566523219666190917155959 ◽

2019 ◽

Vol 19 (4) ◽

pp. 232-241 ◽

Cited By ~ 5

Author(s):

Xuegong Chen ◽

Wanwan Shi ◽

Lei Deng

Keyword(s):

Protein Interactions ◽

Experimental Studies ◽

Treatment Strategies ◽

Computational Method ◽

Biological Information ◽

Support Vector ◽

Protein Protein Interactions ◽

Efficient Treatment ◽

Disease Associations ◽

Previous State

Background: Accumulating experimental studies have indicated that disease comorbidity causes additional pain to patients and leads to the failure of standard treatments compared to patients who have a single disease. Therefore, accurate prediction of potential comorbidity is essential to design more efficient treatment strategies. However, only a few disease comorbidities have been discovered in the clinic. Objective: In this work, we propose PCHS, an effective computational method for predicting disease comorbidity. Materials and Methods: We utilized the HeteSim measure to calculate the relatedness score for different disease pairs in the global heterogeneous network, which integrates six networks based on biological information, including disease-disease associations, drug-drug interactions, protein-protein interactions and associations among them. We built the prediction model using the Support Vector Machine (SVM) based on the HeteSim scores. Results and Conclusion: The results showed that PCHS performed significantly better than previous state-of-the-art approaches and achieved an AUC score of 0.90 in 10-fold cross-validation. Furthermore, some of our predictions have been verified in the literature, indicating the effectiveness of our method.
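
For readers unfamiliar with the AUC figure quoted above, the sketch below computes it directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score. The scores are made up for illustration and the function name is hypothetical; the paper's own evaluation code is not described in the abstract.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up classifier scores for comorbid (positive) and other (negative) pairs.
example_auc = auc([0.9, 0.8], [0.7, 0.85])
```

In this example three of the four pairs are ordered correctly, so `example_auc` is 0.75. The quadratic loop is fine for a sketch; a rank-based computation would be preferable for large test sets.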

Prediction of Protein-Protein Interactions Based on Molecular Interface Features and the Support Vector Machine

Current Bioinformatics ◽

10.2174/1574893611308010003 ◽

2013 ◽

Vol 8 (1) ◽

pp. 3-8 ◽

Cited By ~ 1

Author(s):

Weiqiang Zhou ◽

Hong Yan ◽

Xiaodan Fan ◽

Quan Hao

Keyword(s):

Support Vector Machine ◽

Protein Interactions ◽

Support Vector ◽

Protein Protein Interactions

INFERRING PROTEIN-PROTEIN INTERACTIONS FROM MESSENGER RNA EXPRESSION PROFILES WITH SVM

Journal of Biological Systems ◽

10.1142/s0218339005001525 ◽

2005 ◽

Vol 13 (03) ◽

pp. 287-298 ◽

Cited By ~ 1

Author(s):

JUN CAI ◽

YING HUANG ◽

LIANG JI ◽

YANDA LI

Keyword(s):

High Throughput ◽

Protein Interactions ◽

Messenger Rna ◽

Expression Profiles ◽

Support Vector ◽

Svm Classifier ◽

Good Prediction ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

High Throughput Experiments

In post-genomic biology, researchers in proteomics focus their attention on the networks of protein interactions that control the lives of cells and organisms. Protein-protein interactions play a useful role in dynamic cellular machinery. In this paper, we developed a method to infer protein-protein interactions based on the theory of the support vector machine (SVM). For a given pair of proteins, a new strategy of calculating the cross-correlation function of their mRNA expression profiles was used to encode the SVM input vectors. We compared the performance with that of other methods of inferring protein-protein interactions. The results suggested that, under five-fold cross-validation, our SVM model achieved good prediction performance. This shows that expression profiles at the transcription level, as well as sequence content, can be used to distinguish physical or functional interactions of proteins. Lastly, we applied our SVM classifier to evaluate the data quality of interaction data sets from four high-throughput experiments. The results show that high-throughput experiments sacrifice some accuracy in determining interactions because of the limitations of experimental technologies.
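
The encoding idea can be sketched as follows: the normalized cross-correlation of two expression profiles is evaluated at several lags, and the resulting values form a fixed-length vector for the protein pair. This is a generic cross-correlation under stated assumptions (equal-length profiles, normalization by the zero-lag energy of both profiles); the exact encoding used in the paper is not given in the abstract, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def cross_correlation(x: Sequence[float], y: Sequence[float], max_lag: int) -> List[float]:
    """Normalized cross-correlation of x and y for lags -max_lag..max_lag."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    scale = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if scale == 0.0:
        raise ValueError("profiles must not be constant")
    values = []
    for lag in range(-max_lag, max_lag + 1):
        # Only time points where both shifted indices are valid contribute.
        total = sum(dx[t] * dy[t + lag] for t in range(n) if 0 <= t + lag < n)
        values.append(total / scale)
    return values


# Toy expression profiles for two genes over four time points.
features = cross_correlation([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0], max_lag=1)
```

The middle entry of `features` is the zero-lag value, which equals the Pearson correlation of the two profiles; the list as a whole could then be passed to an SVM as the feature vector for the pair.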

Prediction of Protein-Protein Interactions between HIV-1 and Human using Support Vector Machine Combined with Multivariate Mutual Information

2020 3rd International Conference on Biomedical Engineering (IBIOMED) ◽

10.1109/ibiomed50285.2020.9487598 ◽

2020 ◽

Author(s):

Mohamad Irlin Sunggawa ◽

Alhadi Bustamam ◽

Devvi Sarwinda ◽

Patuan Pangihutan Tampubolon ◽

Wibowo Mangunwardoyo

Keyword(s):

Support Vector Machine ◽

Mutual Information ◽

Protein Interactions ◽

Support Vector ◽

Protein Protein Interactions ◽

Multivariate Mutual Information ◽

Hiv 1

Using discriminative vector machine model with 2DPCA to predict interactions among proteins

BMC Bioinformatics ◽

10.1186/s12859-019-3268-5 ◽

2019 ◽

Vol 20 (S25) ◽

Cited By ~ 1

Author(s):

Zhengwei Li ◽

Ru Nie ◽

Zhuhong You ◽

Chen Cao ◽

Jiashu Li

Keyword(s):

Protein Interactions ◽

False Positive Rate ◽

Principal Component ◽

Amino Acid Sequences ◽

Support Vector ◽

Machine Model ◽

Discriminative Feature ◽

Benchmark Datasets ◽

Low Efficiency ◽

H Pylori

Background: Interactions among proteins play crucial roles in most cellular processes. Despite the enormous effort put into identifying protein-protein interactions (PPIs) in a large number of organisms, existing firsthand biological experimental methods suffer from high cost, low efficiency, and high false-positive rates. The application of in silico methods opens new doors for predicting interactions among proteins and has attracted a great deal of attention in recent decades. Results: Here we present a novel computational model that combines our proposed Discriminative Vector Machine (DVM) model with a 2-Dimensional Principal Component Analysis (2DPCA) descriptor to identify candidate PPIs based only on protein sequences. More specifically, the 2DPCA descriptor is employed to capture discriminative feature information from the Position-Specific Scoring Matrix (PSSM) of amino acid sequences generated by the PSI-BLAST tool. Then, a robust and powerful DVM classifier is employed to infer PPIs. When applied to the two gold-standard benchmark datasets of Yeast and H. pylori, our model obtained mean prediction accuracies as high as 97.06% and 92.89%, respectively, a noticeable improvement over some state-of-the-art methods. Moreover, we constructed a Support Vector Machine (SVM) based predictive model and compared it with our model on the Human benchmark dataset. In addition, to further demonstrate the predictive reliability of our proposed method, we carried out extensive experiments on identifying cross-species PPIs on five other species datasets. Conclusions: All the experimental results indicate that our method is very effective for identifying potential PPIs and could serve as a practical approach to aid biological experiments in proteomics research.

Bcl-2 and FKBP12 bind to IP3 and ryanodine receptors at overlapping sites: the complexity of protein–protein interactions for channel regulation

Biochemical Society Transactions ◽

10.1042/bst20140298 ◽

2015 ◽

Vol 43 (3) ◽

pp. 396-404 ◽

Cited By ~ 10

Author(s):

Tim Vervliet ◽

Jan B. Parys ◽

Geert Bultynck

Keyword(s):

Protein Interactions ◽

Hippocampal Neurons ◽

Binding Protein ◽

Ryanodine Receptors ◽

B Cell Lymphoma ◽

Experimental Models ◽

Amino Acid Sequences ◽

Protein Protein Interactions ◽

Fk506 Binding Protein ◽

Fk506 Binding Proteins

The 12- and 12.6-kDa FK506-binding proteins, FKBP12 (12-kDa FK506-binding protein) and FKBP12.6 (12.6-kDa FK506-binding protein), have been implicated in the binding to and the regulation of ryanodine receptors (RyRs) and inositol 1,4,5-trisphosphate receptors (IP3Rs), both tetrameric intracellular Ca2+-release channels. Whereas the amino acid sequences responsible for FKBP12 binding to RyRs are conserved in IP3Rs, FKBP12 binding to IP3Rs has been questioned and could not be observed in various experimental models. Nevertheless, conservation of these residues in the different IP3R isoforms and during evolution suggested that they could harbour an important regulatory site critical for IP3R-channel function. Recently, it has become clear that in IP3Rs, this site is targeted by B-cell lymphoma 2 (Bcl-2) via its Bcl-2 homology (BH)4 domain, thereby dampening IP3R-mediated Ca2+ flux and preventing pro-apoptotic Ca2+ signalling. Conversely, the presence of the corresponding site in RyRs implied that Bcl-2 proteins could associate with and regulate RyR channels. Recently, endogenous RyR–Bcl-2 complexes have been identified in primary hippocampal neurons. As with IP3Rs, binding of Bcl-2 to RyRs also involved its BH4 domain and suppressed RyR-mediated Ca2+ release. We therefore propose that the originally identified FKBP12-binding site in IP3Rs is a region critical for controlling IP3R-mediated Ca2+ flux by recruiting Bcl-2 rather than FKBP12. Although we hypothesize that anti-apoptotic Bcl-2 proteins, but not FKBP12, are the main physiological inhibitors of IP3Rs, we cannot exclude that Bcl-2 could help engage FKBP12 (or other FKBP isoforms) to the IP3R, potentially via calcineurin.
