Protein Remote Homology Detection Based on an Ensemble Learning Approach

2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Junjie Chen ◽  
Bingquan Liu ◽  
Dong Huang

Protein remote homology detection is one of the central problems in bioinformatics. Although several computational methods have been proposed, the problem is still far from solved. This paper proposes SVM-Ensemble, an ensemble classifier for protein remote homology detection that uses a weighted voting strategy. SVM-Ensemble combines three basic classifiers built on different feature spaces: Kmer, ACC, and SC-PseAAC. These features describe proteins from different perspectives, incorporating both the sequence composition and the sequence-order information along the protein sequences. Experimental results on a widely used benchmark dataset showed that SVM-Ensemble clearly improves predictive performance for protein remote homology detection. Moreover, it achieved the best performance and outperformed other state-of-the-art methods.
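The two ideas named in this abstract, k-mer composition features and weighted voting over several classifiers, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `kmer_features`, `weighted_vote`, and `predict` are hypothetical, as are the example weights and the zero decision threshold, and the abstract does not say how SVM-Ensemble derives its weights. The scores stand in for the decision values that three already-trained SVMs might return.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmer_features(sequence, k=2):
    """Return normalized k-mer frequencies (length 20**k) for a protein sequence."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows with non-standard residues
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def weighted_vote(scores, weights):
    """Combine per-classifier decision scores into a weighted average."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per classifier score")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def predict(scores, weights, threshold=0.0):
    """Return +1 (homolog) or -1 (non-homolog); the threshold is an assumption."""
    return 1 if weighted_vote(scores, weights) > threshold else -1


# Hypothetical decision values from three base classifiers
# (for example, one each for Kmer, ACC, and SC-PseAAC features).
example_scores = [1.0, -1.0, 0.5]
example_weights = [0.5, 0.25, 0.25]
label = predict(example_scores, example_weights)
```

With these illustrative weights, the classifier with the largest weight dominates the vote, so a single dissenting base classifier does not flip the ensemble decision.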

Author(s):  
N. Srinivasan ◽  
G. Agarwal ◽  
R. M. Bhaskara ◽  
R. Gadkari ◽  
O. Krishnadev ◽  
...  

In the post-genomic era, biological databases are growing at a tremendous rate. Despite the rapid accumulation of biological information, the functions and other biological properties of many putative gene products of various organisms remain unknown or obscure. This paper examines how strategic integration of large biological databases and combinations of various kinds of biological information help address some of the fundamental questions on protein structure, function and interactions. New developments in function recognition by remote homology detection and strategic use of sequence databases aid recognition of the functions of newly discovered proteins. Knowledge of 3-D structures, and combined use of sequences and 3-D structures of homologous protein domains, greatly expand the ability to detect remote homology. The authors also demonstrate how combined consideration of the functions of individual domains of multi-domain proteins helps in recognizing gross biological attributes. The paper also discusses a few cases in which combining disparate biological datasets or disparate kinds of biological information yields new insights about protein-protein interactions across a host and a pathogen. Finally, the authors discuss how combining low-resolution structural data on gigantic multi-component assemblies, obtained from cryoEM studies, with atomic-level 3-D structures of the components is effective in inferring finer features of the assembly.


Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1143
Author(s):  
Zhenwu Wang ◽  
Tielin Wang ◽  
Benting Wan ◽  
Mengjie Han

Multi-label classification (MLC) is a supervised learning problem where an object is naturally associated with multiple concepts because it can be described from various dimensions. How to exploit the resulting label correlations is the key issue in MLC problems. The classifier chain (CC) is a well-known MLC approach that can learn complex coupling relationships between labels. CC suffers from two obvious drawbacks: (1) the label ordering is decided at random, although it usually has a strong effect on predictive performance; (2) all the labels are inserted into the chain, although some of them may carry irrelevant information that discriminates against the others. In this work, we propose a partial classifier chain method with feature selection (PCC-FS) that exploits the correlation between the label and feature spaces and thus addresses both problems simultaneously. In the PCC-FS algorithm, feature selection is performed by learning the covariance between the feature set and the label set, thus eliminating irrelevant features that can diminish classification performance. Couplings in the label set are extracted, and the coupled labels of each label are inserted simultaneously into the chain structure to carry out training and prediction. Experimental results on five metrics demonstrate that, in comparison with eight state-of-the-art MLC algorithms, the proposed method significantly improves on existing multi-label classification methods.
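The plain classifier chain that this abstract takes as its starting point can be sketched as follows. This is the baseline CC idea only, not PCC-FS: it includes no feature selection and no label-coupling extraction. The class names `NearestCentroid` and `ClassifierChain` and the fixed `order` argument are hypothetical, and the nearest-centroid base learner is a stand-in chosen only to keep the example dependency-free.

```python
class NearestCentroid:
    """Tiny stand-in base learner: predicts the class with the closest mean vector."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))
        # sorted() makes tie-breaking deterministic
        return min(sorted(self.centroids), key=lambda lab: sq_dist(self.centroids[lab]))


class ClassifierChain:
    """One binary model per label; each model also sees the labels earlier in the chain."""

    def __init__(self, order):
        self.order = list(order)  # label indices in chain order
        self.models = []

    def fit(self, X, Y):
        self.models = []
        for pos, label in enumerate(self.order):
            earlier = self.order[:pos]
            # Training uses the true values of the earlier labels as extra features.
            augmented = [list(x) + [y[j] for j in earlier] for x, y in zip(X, Y)]
            target = [y[label] for y in Y]
            self.models.append(NearestCentroid().fit(augmented, target))
        return self

    def predict_one(self, x):
        chain_input = list(x)
        predicted = {}
        for label, model in zip(self.order, self.models):
            value = model.predict_one(chain_input)
            predicted[label] = value
            chain_input.append(value)  # prediction feeds the next link
        return [predicted[j] for j in sorted(predicted)]


# Toy data: one feature, two perfectly correlated labels.
X = [[0.0], [0.1], [0.9], [1.0]]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]
chain = ClassifierChain(order=[0, 1]).fit(X, Y)
```

Because each link consumes the predictions of the links before it, changing `order` changes what every later model sees, which is why the abstract singles out random label ordering as a drawback.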

