Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization

Life ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 293
Author(s):  
Warin Wattanapornprom ◽  
Chinae Thammarongtham ◽  
Apiradee Hongsthong ◽  
Supatcha Lertampaiporn

The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for protein subcellular localization prediction in plants based on multiple classifiers, to improve prediction results in terms of both accuracy and reliability. The prediction of plant protein subcellular localization is challenging because the underlying problem is not only multiclass but also multilabel. Generally, plant proteins can be found in 10–14 locations or compartments. The number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than that in others (vacuole, peroxisome, Golgi, and cell wall), so the data are usually imbalanced. We therefore propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suited to each type of protein localization, forming a total of 479 features. Feature selection methods were then used to reduce these features to a smaller, informative subset, which was used to train three different individual models. The outputs of the three classifiers were combined by average voting to return the final probability prediction. Based on the voting probability, the method can predict both single- and multilabel subcellular localizations. On the testing dataset, the proposed ensemble method achieved an overall accuracy of 84.58% for 11 compartments.
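The abstract combines three classifiers by average voting and then reads single- or multilabel predictions off the voting probability, but it does not state the decision rule. The sketch below is a minimal illustration under stated assumptions: each classifier is assumed to output one probability per compartment, the 0.5 threshold is hypothetical, and the fallback to the top-scoring compartment (so every protein receives at least one label) is our own choice, not necessarily the authors'.

```python
from typing import Dict, List


def average_vote(prob_lists: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-compartment probabilities produced by several classifiers."""
    compartments = prob_lists[0].keys()
    n = len(prob_lists)
    return {c: sum(p[c] for p in prob_lists) / n for c in compartments}


def predict_locations(avg: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every compartment whose averaged probability reaches the
    (hypothetical) threshold, most probable first. If none does, fall back
    to the single most probable compartment (a single-label prediction)."""
    chosen = [c for c, p in avg.items() if p >= threshold]
    if not chosen:
        chosen = [max(avg, key=avg.get)]
    return sorted(chosen, key=lambda c: -avg[c])


# Usage with three made-up classifier outputs for one protein.
clf_a = {"nucleus": 0.9, "cytoplasm": 0.7, "mitochondrion": 0.1}
clf_b = {"nucleus": 0.8, "cytoplasm": 0.5, "mitochondrion": 0.2}
clf_c = {"nucleus": 0.7, "cytoplasm": 0.6, "mitochondrion": 0.3}
averaged = average_vote([clf_a, clf_b, clf_c])
print(predict_locations(averaged))  # a multilabel prediction: nucleus and cytoplasm
```

Averaging probabilities (soft voting) rather than counting hard votes keeps a graded score per compartment, which is what makes a threshold-based multilabel decision possible at all.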

2020 ◽  
Author(s):  
Qi Zhang ◽  
Shan Li ◽  
Bin Yu ◽  
Yang Li ◽  
Yandan Zhang ◽  
...  

Proteins play a significant part in life processes such as cell growth, development, and reproduction. Exploring protein subcellular localization (SCL) is a direct way to better understand the function of proteins in cells. Studies have found that a growing number of proteins localize to multiple subcellular locations; these are called multi-label proteins. They play a key role not only in cellular activities but also in medicine and drug development. This article presents a new prediction model, MpsLDA-ProSVM, to predict the SCL of multi-label proteins. First, the physicochemical, evolutionary, sequence and annotation information of protein sequences is fused. Then, a weighted multi-label linear discriminant analysis framework based on entropy weight form (wMLDAe) is used, for the first time, to refine and purify the features and reduce the difficulty of learning. Finally, the optimal feature subset is input into the multi-label learning with label-specific features (LIFT) and multi-label k-nearest neighbor (ML-KNN) algorithms to obtain a synthetic ranking of relevant labels, and the Prediction and Relevance Ordering based SVM (ProSVM) classifier is then used to predict the SCLs. This method can rank and classify related labels at the same time, which greatly improves the efficiency of the model. Under the jackknife test, the overall actual accuracy (OAA) values on the virus, plant, Gram-positive bacteria and Gram-negative bacteria datasets are 98.06%, 98.97%, 99.81% and 98.49%, respectively, which are 0.56%-9.16%, 5.37%-30.87%, 3.51%-6.91% and 3.99%-8.59% higher than those of other advanced methods. The source codes and datasets are available at https://github.com/QUST-AIBBDRC/MpsLDA-ProSVM/.
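ML-KNN is one of the multi-label learners named in the abstract. The following is a minimal pure-Python sketch of the standard ML-KNN decision rule (a smoothed prior per label, plus likelihoods of how many of the k nearest neighbours carry that label), assuming Euclidean distance on already-extracted feature vectors and smoothing s = 1. It does not reproduce the wMLDAe, LIFT or ProSVM stages, and the class and helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _knn(x: Vector, data: List[Vector], k: int, skip: int = -1) -> List[int]:
    """Indices of the k rows of `data` nearest to x (Euclidean), optionally skipping one index."""
    dists = sorted((math.dist(x, row), i) for i, row in enumerate(data) if i != skip)
    return [i for _, i in dists[:k]]


class MLKNN:
    """Hypothetical minimal ML-KNN: one binary MAP decision per label."""

    def __init__(self, k: int = 3, s: float = 1.0):
        self.k, self.s = k, s

    def fit(self, X: List[Vector], Y: List[List[int]]) -> "MLKNN":
        self.X, self.Y = X, Y
        m, q, k, s = len(X), len(Y[0]), self.k, self.s
        # Smoothed prior probability that label j is present.
        self.prior1 = [(s + sum(y[j] for y in Y)) / (2 * s + m) for j in range(q)]
        # c1[j][d] / c0[j][d]: training proteins with / without label j whose
        # k nearest neighbours (excluding themselves) contain d proteins with label j.
        c1 = [[0] * (k + 1) for _ in range(q)]
        c0 = [[0] * (k + 1) for _ in range(q)]
        for i in range(m):
            nbrs = _knn(X[i], X, k, skip=i)
            for j in range(q):
                d = sum(Y[n][j] for n in nbrs)
                if Y[i][j]:
                    c1[j][d] += 1
                else:
                    c0[j][d] += 1
        # Smoothed likelihoods P(d | label present) and P(d | label absent).
        self.lik1 = [[(s + c1[j][d]) / (s * (k + 1) + sum(c1[j])) for d in range(k + 1)]
                     for j in range(q)]
        self.lik0 = [[(s + c0[j][d]) / (s * (k + 1) + sum(c0[j])) for d in range(k + 1)]
                     for j in range(q)]
        return self

    def predict(self, x: Vector) -> List[int]:
        nbrs = _knn(x, self.X, self.k)
        out = []
        for j in range(len(self.prior1)):
            d = sum(self.Y[n][j] for n in nbrs)
            p1 = self.prior1[j] * self.lik1[j][d]
            p0 = (1 - self.prior1[j]) * self.lik0[j][d]
            out.append(1 if p1 > p0 else 0)
        return out


# Usage on toy 2-D features; label columns could stand for [nucleus, cytoplasm].
X_train = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
Y_train = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
model = MLKNN(k=2).fit(X_train, Y_train)
print(model.predict([0.5, 0.5]), model.predict([10.5, 10.5]))
```

Because each label gets its own yes/no decision, a protein whose neighbours carry several labels can be assigned several locations, which is what makes ML-KNN suitable for multi-label SCL data.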


2020 ◽  
Vol 19 (7) ◽  
pp. 1076-1087 ◽  
Author(s):  
Georg H. H. Borner

Protein subcellular localization is an essential and highly regulated determinant of protein function. Major advances in mass spectrometry and imaging have allowed the development of powerful spatial proteomics approaches for determining protein localization at the whole-cell scale. Here, a brief overview of current methods is presented, followed by a detailed discussion of organellar mapping through proteomic profiling. This relatively simple yet flexible approach is rapidly gaining popularity because of its ability to capture the localizations of thousands of proteins in a single experiment. It can be used to generate high-resolution cell maps and as a tool for monitoring protein localization dynamics. This review highlights the strengths and limitations of the approach and provides guidance on designing and interpreting profiling experiments.


2013 ◽  
Vol 41 (W1) ◽  
pp. W441-W447 ◽  
Author(s):  
Shengnan Tang ◽  
Tonghua Li ◽  
Peisheng Cong ◽  
Wenwei Xiong ◽  
Zhiheng Wang ◽  
...  

2019 ◽  
pp. 311-327
Author(s):  
Hanhan Cong ◽  
Hong Liu ◽  
Yuehui Chen ◽  
Yaou Zhao ◽  
Lei Wang

Each part of the internal structure of a cell, commonly referred to as a subcellular compartment, is highly ordered and interconnected and has unique functions. Experiments show that mis-delivery of proteins to their corresponding subcellular compartments can cause human disease. Studies of protein localization can therefore clarify pathogenesis and help find treatments. Because protein subcellular localization holds such an important position in biology, research in this area is extremely active. Most existing protein subcellular localization methods are better suited to single-site localization. This paper proposes an algorithm based on a deep convolutional neural network that is suited to multi-site protein subcellular localization, and implements it on a human protein database to verify and analyze its performance. To further improve the classification results, the algorithm was combined with ensemble learning and feature fusion. Experiments indicate that the proposed algorithm is effective for multi-site protein subcellular localization, with an overall correct classification rate of 59.13%, which is higher than that of SAE, SVM and RF. The proposed algorithm is more uniform and less affected by the number of samples. When the data samples differ, the classification results are affected to some extent, but the overall classification remains good. In addition, ensemble learning and feature fusion are effective for improving the classification results.
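The abstract does not describe how the network's output layer handles multiple sites. A common way to make a neural classifier multi-site is to replace a softmax over locations with one independent sigmoid per location and to train with binary cross-entropy. The sketch below shows only that output head in plain Python, assuming precomputed logits and a hypothetical 0.5 threshold; it is not the authors' CNN, ensemble, or feature-fusion scheme.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multisite_predict(logits: List[float], threshold: float = 0.5) -> List[int]:
    """One independent sigmoid per location, so several locations can be 'on' at once."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def multilabel_bce(logits: List[float], targets: List[int]) -> float:
    """Mean binary cross-entropy over locations, computed from logits in the
    stable form max(z, 0) - z*t + log(1 + exp(-|z|))."""
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Usage: logits for three hypothetical locations of one protein.
print(multisite_predict([2.0, -1.0, 0.3]))  # two of the three locations are predicted
```

With a softmax the location scores compete and sum to one, which pushes the model toward a single site; independent sigmoids let each location be accepted or rejected on its own.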


Plasmid ◽  
2019 ◽  
Vol 105 ◽  
pp. 102436 ◽  
Author(s):  
François Berthold ◽  
David Roujol ◽  
Caroline Hemmer ◽  
Elisabeth Jamet ◽  
Christophe Ritzenthaler ◽  
...  

Life ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 347
Author(s):  
Ravindra Kumar ◽  
Sandeep Kumar Dhanda

Proteins are made up of long chains of amino acids and perform a variety of functions in different organisms. The activity of a protein is determined by the nucleotide sequence of its gene and by its 3D structure. In addition, proteins must be targeted to their specific locations or compartments to perform their functions. The challenge of computationally predicting protein subcellular localization has been addressed by various in silico methods. In this review, we survey progress in this field and offer a bird's-eye view consisting of a comprehensive listing of tools, the types of input features explored, the machine learning approaches employed, and the evaluation metrics applied. We hope the review will be useful for researchers working on protein localization prediction.
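The review catalogues the evaluation metrics used by localization predictors without defining them here. As a hedged illustration, the sketch below computes one-vs-rest sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC) for a single compartment from confusion-matrix counts. These are widely used measures; whether the review lists exactly this set is an assumption, and the function name is hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """One-vs-rest metrics for a single compartment from confusion-matrix counts."""
    sn = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)     # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"sensitivity": sn, "specificity": sp, "accuracy": acc, "mcc": mcc}


# Usage: made-up counts for one compartment, e.g. nucleus vs. all other locations.
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

MCC is often preferred over plain accuracy for small compartments such as the peroxisome, because it stays informative when negatives vastly outnumber positives.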
