A high-precision hybrid algorithm for predicting eukaryotic protein subcellular localization

Mapping Intimacies ◽

10.1101/620179 ◽

2019 ◽

Author(s):

Dahan Zhang ◽

Haiyun Huang ◽

Xiaogang Bai ◽

Xiaodong Fang ◽

Yi Zhang

Keyword(s):

Subcellular Localization ◽

Subcellular Location ◽

Supplementary Information ◽

Local Alignment ◽

Protein Subcellular Localization ◽

Eukaryotic Protein ◽

Link Type ◽

Fisher Discriminant ◽

Average Accuracy ◽

Search Tool

Abstract. Motivation: Subcellular location plays an essential role in protein synthesis, transport, and secretion, so determining it is an important step in understanding the mechanisms of trait-related proteins. Homology-based methods generally give reliable results when E-values are small. For proteins that do not share significant homologous domains with known proteins, we must resort to pattern recognition algorithms (SVM, Fisher discriminant, KNN, random forest, etc.), which seldom give satisfactory results. Results: Here, a novel hybrid method, “Basic Local Alignment Search Tool+Smith-Waterman+Needleman-Wunsch” or BLAST+SWNW, is obtained by integrating a Basic Local Alignment Search Tool (BLAST) run under a loosened E-value threshold with the Smith-Waterman (SW) and Needleman-Wunsch (NW) algorithms, and is used to predict protein subcellular localization in eukaryotes. When tested on Dataset I and Dataset II, BLAST+SWNW showed average accuracies of 97.18% and 99.60%, respectively, surpassing the performance of other algorithms in predicting eukaryotic protein subcellular localization. Availability and Implementation: BLAST+SWNW is an open source collaborative initiative available in the GitHub repository (https://github.com/ZHANGDAHAN/BLAST-SWNW-for-SLP or http://202.206.64.158:80/link/72016CAC26E4298B3B7E0EAF42288935). Supplementary Information: Supplementary data are available at PLOS Computational Biology online.
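
The two alignment algorithms named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the scoring parameters (`match`, `mismatch`, `gap`), the linear gap penalty, and the helper `transfer_label` are assumptions chosen for illustration, and the BLAST pre-filtering step is omitted.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (linear gap penalty, assumed)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def sw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score: like NW, but cells are floored at 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


def transfer_label(query, references, score=sw_score):
    """Hypothetical helper: return the label of the best-scoring reference.

    `references` is a list of (sequence, label) pairs.
    """
    _, best_label = max(references, key=lambda ref: score(query, ref[0]))
    return best_label
```

A real predictor would use a substitution matrix such as BLOSUM62 and affine gap penalties rather than the simple identity scoring assumed here.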

Integrating Second-order Moving Average and Over-sampling Algorithm to Predict Apoptosis Protein Subcellular Localization

Current Bioinformatics ◽

10.2174/1574893614666190902155811 ◽

2020 ◽

Vol 15 (6) ◽

pp. 517-527

Author(s):

Yunyun Liang ◽

Shengli Zhang

Keyword(s):

Subcellular Localization ◽

Moving Average ◽

Subcellular Location ◽

Second Order ◽

Test Method ◽

Support Vector ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Apoptosis Protein ◽

Leibler Divergence

Background: Apoptosis proteins play a key role in the development and homeostasis of the organism, and are important for understanding the mechanisms of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is a meaningful task. Methods: In this study, we predict apoptosis protein subcellular location by using a PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence, and over-sampling algorithms. The model is named SOMAPKLNMF-OS and is constructed on the ZD98, ZW225 and CL317 benchmark datasets. A support vector machine is then adopted as the classifier, and the bias-free jackknife test method is used to evaluate the accuracy. Results: Our prediction system achieves favorable overall accuracy on the three datasets and outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for the identification of apoptosis protein subcellular localization.
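
The jackknife test mentioned in this abstract is leave-one-out evaluation: each sample is held out in turn and predicted by a model trained on all the others. A minimal sketch follows. The 1-nearest-neighbour classifier is a stand-in assumed for brevity (the abstract uses a support vector machine), and the function names are hypothetical.

```python
def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier (assumption): label of the closest training vector."""
    def sqdist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    best = min(range(len(train_x)), key=lambda k: sqdist(train_x[k], query))
    return train_y[best]


def jackknife_accuracy(samples, labels, fit_predict=nearest_neighbour):
    """Leave-one-out accuracy: hold out each sample once, train on the rest."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)
```

Because every sample is tested exactly once and the split involves no random choice, the result is deterministic for a given dataset and classifier.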

TSSub: eukaryotic protein subcellular localization by extracting features from profiles

Bioinformatics ◽

10.1093/bioinformatics/btl180 ◽

2006 ◽

Vol 22 (14) ◽

pp. 1784-1785 ◽

Cited By ~ 15

Author(s):

J. Guo ◽

Y. Lin

Keyword(s):

Subcellular Localization ◽

Protein Subcellular Localization ◽

Eukaryotic Protein

PSORTm: a bacterial and archaeal protein subcellular localization prediction tool for metagenomics data

Bioinformatics ◽

10.1093/bioinformatics/btaa136 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3043-3048 ◽

Cited By ~ 2

Author(s):

Michael A Peabody ◽

Wing Yin Venus Lau ◽

Gemma R Hoad ◽

Baofeng Jia ◽

Finlay Maguire ◽

...

Keyword(s):

Water Quality ◽

Subcellular Localization ◽

Microbial Communities ◽

Cell Envelope ◽

Supplementary Information ◽

Protein Subcellular Localization ◽

Metagenomic Sequence ◽

Archaeal Protein ◽

Wide Range ◽

Metagenomics Data

Abstract. Motivation: Many methods for microbial protein subcellular localization (SCL) prediction exist; however, none is readily available for analysis of metagenomic sequence data, despite growing interest from researchers studying microbial communities in humans, agri-food relevant organisms and other environments (e.g. for identification of cell-surface biomarkers for rapid protein-based diagnostic tests). We also wished to identify new markers of water quality from freshwater samples collected from pristine versus pollution-impacted watersheds. Results: We report PSORTm, the first bioinformatics tool designed for prediction of diverse bacterial and archaeal protein SCL from metagenomics data. PSORTm incorporates components of PSORTb, one of the most precise and widely used protein SCL predictors, with an automated classification by cell envelope. An evaluation using 5-fold cross-validation with in silico-fragmented sequences with known localization showed that PSORTm maintains PSORTb’s high precision, while sensitivity increases proportionately with metagenomic sequence fragment length. PSORTm’s read-based analysis was similar to PSORTb-based analysis of metagenome-assembled genomes (MAGs); however, the latter requires non-trivial manual classification of each MAG by cell envelope, and cannot make use of unassembled sequences. Analysis of the watershed samples revealed the importance of normalization and identified potential biomarkers of water quality. This method should be useful for examining a wide range of microbial communities, including human microbiomes, and other microbiomes of medical, environmental or industrial importance. Availability and implementation: Documentation, source code and docker containers are available for running PSORTm locally at https://www.psort.org/psortm/ (freely available, open-source software under GNU General Public License Version 3). Supplementary information: Supplementary data are available at Bioinformatics online.
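
The evaluation described in this abstract (5-fold cross-validation on in silico-fragmented sequences) can be sketched roughly as below. The fragment length, the step size, and the round-robin fold assignment are all assumptions made for illustration. The abstract does not specify them, and the function names are hypothetical.

```python
def fragment(sequence, length, step):
    """Cut a sequence into fixed-length windows (length and step are assumed)."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]


def kfold_indices(n, k=5):
    """Assign item indices 0..n-1 to k folds in round-robin order."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds


def train_test_splits(n, k=5):
    """Yield (train, test) index lists, using each fold once as the test set."""
    folds = kfold_indices(n, k)
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]
```

One design point worth considering in such a setup is to split by source protein before fragmenting, so that fragments of the same protein never appear in both the training and the test fold.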

ImPLoc: a multi-instance deep learning model for the prediction of protein subcellular localization based on immunohistochemistry images

Bioinformatics ◽

10.1093/bioinformatics/btz909 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2244-2250 ◽

Cited By ~ 5

Author(s):

Wei Long ◽

Yang Yang ◽

Hong-Bin Shen

Keyword(s):

Subcellular Localization ◽

Tissue Level ◽

Image Features ◽

Supplementary Information ◽

Protein Distribution ◽

Protein Subcellular Localization ◽

Significance Level ◽

Protein Functions ◽

Human Protein Atlas ◽

Cancer Tissues

Abstract. Motivation: The tissue atlas of the Human Protein Atlas (HPA) houses immunohistochemistry (IHC) images visualizing protein distribution from the tissue level down to the cell level, which provide an important resource for studying the human spatial proteome. In particular, the protein subcellular localization patterns revealed by these images are helpful for understanding protein functions, and differential localization analysis across normal and cancer tissues can lead to new cancer biomarkers. However, computational tools for processing images in this database are highly underdeveloped. The recognition of localization patterns suffers from variation in image quality and the difficulty of detecting microscopic targets. Results: We propose a deep multi-instance multi-label model, ImPLoc, to predict subcellular locations from IHC images. In this model, we employ a deep convolutional neural network-based feature extractor to represent image features, and design a multi-head self-attention encoder to aggregate multiple feature vectors for subsequent prediction. We construct a benchmark dataset of 1186 proteins including 7855 images from HPA and 6 subcellular locations. The experimental results show that ImPLoc achieves a significant improvement in prediction accuracy compared with current computational methods. We further apply ImPLoc to a test set of 889 proteins with images from both normal and cancer tissues, and obtain 8 differentially localized proteins at a significance level of 0.05. Availability and implementation: https://github.com/yl2019lw/ImPloc. Supplementary information: Supplementary data are available at Bioinformatics online.
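
The idea of aggregating several per-image feature vectors with self-attention can be illustrated with a heavily simplified sketch. It uses a single head with no learned query, key or value projections, followed by mean pooling. It is not the ImPLoc architecture, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections (assumed)."""
    dim = len(vectors[0])
    attended = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        attended.append([sum(w * v[c] for w, v in zip(weights, vectors))
                         for c in range(dim)])
    return attended


def aggregate(vectors):
    """Pool a bag of instance vectors into one vector: attention, then mean."""
    attended = self_attention(vectors)
    count = len(attended)
    return [sum(v[c] for v in attended) / count for c in range(len(attended[0]))]
```

The output has the same dimensionality whatever the number of input images, which is what lets a multi-instance model handle proteins with different numbers of images.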

Improved Prediction of Eukaryotic Protein Subcellular Localization Using Particle Swarm Optimization of Multiple Classifiers

2017 21st International Computer Science and Engineering Conference (ICSEC) ◽

10.1109/icsec.2017.8443775 ◽

2017 ◽

Author(s):

Sirapop Nuannimnoi ◽

Supatcha Lertampaiporn ◽

Chinae Thammarongtham

Keyword(s):

Particle Swarm Optimization ◽

Subcellular Localization ◽

Particle Swarm ◽

Protein Subcellular Localization ◽

Swarm Optimization ◽

Eukaryotic Protein ◽

Multiple Classifiers

Eukaryotic Protein Subcellular Localization Based on Local Pairwise Profile Alignment SVM

2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing ◽

10.1109/mlsp.2006.275581 ◽

2006 ◽

Cited By ~ 7

Author(s):

Jian Guo ◽

Man-wai Mak ◽

Sun-yuan Kung

Keyword(s):

Subcellular Localization ◽

Protein Subcellular Localization ◽

Eukaryotic Protein ◽

Profile Alignment

Using motifs in the prediction of eukaryotic protein subcellular localization

2005 IEEE Engineering in Medicine and Biology 27th Annual Conference ◽

10.1109/iembs.2005.1617055 ◽

2005 ◽

Author(s):

D. Xie ◽

Ao Li ◽

Xiaojun Lin ◽

Minghui Wang ◽

Zhaohui Jiang ◽

...

Keyword(s):

Subcellular Localization ◽

Protein Subcellular Localization ◽

Eukaryotic Protein

MIC_Locator: a novel image-based protein subcellular location multi-label prediction model based on multi-scale monogenic signal representation and intensity encoding strategy

BMC Bioinformatics ◽

10.1186/s12859-019-3136-3 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Fan Yang ◽

Yang Liu ◽

Yanbin Wang ◽

Zhijian Yin ◽

Zhen Yang

Keyword(s):

Subcellular Localization ◽

Prediction Model ◽

Subcellular Location ◽

Protein Subcellular Localization ◽

Monogenic Signal ◽

Protein Subcellular Location ◽

Intensity Coding ◽

Coding Strategy ◽

The Right ◽

Frequency Feature

Abstract. Background: Protein subcellular localization plays a crucial role in understanding cell function. Proteins need to be in the right place at the right time, and to combine with the corresponding molecules, to fulfill their functions. Furthermore, prediction of protein subcellular location should not only play a guiding role in drug design and development, owing to potential molecular targets, but also play an essential role in genome annotation. Current image-based protein subcellular localization has three common drawbacks: obsolete datasets whose label information has not been updated, stereotypical feature descriptors in the spatial domain or on grey levels, and the limited capacity of single-function prediction algorithms, which handle only single-label databases. Results: In this paper, a novel human protein subcellular localization prediction model, MIC_Locator, is proposed. Firstly, the latest datasets are collected and collated as our benchmark dataset, instead of obsolete data, for training the prediction model. Secondly, Fourier transformation, Riesz transformation, a Log-Gabor filter and an intensity coding strategy are employed to obtain frequency features based on three components of the monogenic signal at different frequency scales. Thirdly, a chained prediction model is proposed to handle multi-label instead of single-label datasets. The experimental results showed that MIC_Locator can achieve 60.56% subset accuracy and outperform the majority of existing prediction models, and that the frequency features and intensity coding strategy help improve classification accuracy. Conclusions: Our results demonstrate that frequency features are more beneficial for improving model performance than features extracted from the spatial domain, and that the MIC_Locator proposed in this paper can speed up validation of protein annotation, knowledge of protein function and proteomics research.
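
Subset accuracy, the metric reported in this abstract, is the strictest common multi-label measure: a sample counts as correct only when its predicted label set matches the true label set exactly. A minimal sketch, with a hypothetical function name and made-up example labels:

```python
def subset_accuracy(true_sets, pred_sets):
    """Fraction of samples whose predicted label set equals the true set exactly."""
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return exact / len(true_sets)


# Illustrative labels only; the second sample is missing one true label,
# so it counts as wrong even though it is partly right.
truth = [{"nucleus"}, {"cytosol", "nucleus"}]
predicted = [{"nucleus"}, {"cytosol"}]
print(subset_accuracy(truth, predicted))
```

Because partially correct predictions earn no credit, subset accuracy is typically much lower than per-label accuracy on the same predictions.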

Protein subcellular localization based on deep image features and criterion learning strategy

Briefings in Bioinformatics ◽

10.1093/bib/bbaa313 ◽

2020 ◽

Author(s):

Ran Su ◽

Linlin He ◽

Tianling Liu ◽

Xiaofeng Liu ◽

Leyi Wei

Keyword(s):

Neural Networks ◽

Subcellular Localization ◽

Learning Strategy ◽

Subcellular Location ◽

Image Features ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Protein Functions ◽

Deep Image ◽

Criterion Learning

Abstract. The spatial distribution of the proteome at subcellular levels provides clues to protein functions, and is thus important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at subcellular levels. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Multi-label prediction in particular is a challenging task. Here we developed a criterion learning strategy to exploit label–attribute relevancy and label–label relevancy. A criterion used to determine the final label set was obtained automatically during the learning procedure. We identified an optimal CNN architecture that gave the best results. In addition, experiments show that, compared with hand-crafted features, the deep features give more accurate predictions with fewer features. The implementation of the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.

Automated classification of protein subcellular localization in immunohistochemistry images to reveal biomarkers in colon cancer

BMC Bioinformatics ◽

10.1186/s12859-020-03731-y ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Zhen-Zhen Xue ◽

Yanxia Wu ◽

Qing-Zu Gao ◽

Liang Zhao ◽

Ying-Ying Xu

Keyword(s):

Colon Cancer ◽

Subcellular Localization ◽

Subcellular Location ◽

Human Colon ◽

Automated Classification ◽

Protein Subcellular Localization ◽

Protein Biomarkers ◽

Image Patches ◽

Protein Subcellular Locations

Abstract. Background: Protein biomarkers play important roles in cancer diagnosis. Much effort has gone into measuring abnormal expression intensity in biological samples to identify cancer types and stages. However, changes in the subcellular location of proteins, which are also critical for understanding and detecting diseases, have rarely been studied. Results: In this work, we developed a machine learning model to classify protein subcellular locations based on immunohistochemistry images of human colon tissues, and validated the ability of the model to detect subcellular location changes of biomarker proteins related to colon cancer. The model uses representative image patches as inputs, and integrates feature engineering and deep learning methods. It achieves 92.69% accuracy in classification of new proteins. Two validation datasets of colon cancer biomarkers, derived from the published literature and the Human Protein Atlas database respectively, are employed. In these datasets, 81.82% and 65.66% of the biomarker proteins, respectively, are identified as changing location. Conclusions: Our results demonstrate that using image patches and combining predefined and deep features can improve the performance of protein subcellular localization, and that our model can effectively detect biomarkers based on protein subcellular translocations. This study is anticipated to be useful in annotating unknown subcellular localization for proteins and discovering new potential location biomarkers.
