Data Mining Methods for a Systematics of Protein Subcellular Location

The Model for the Analysis of Pharmacoepidemiological Data Based on Data Mining Methods

The Journal of scientific articles Health and Education millennium, Vol 19 (10), pp. 289-298, 2017
DOI: 10.26787/nydha-2226-7425-2017-19-10-289-298

Author(s): I.M. Burykin, G.N. Aleeva, R.Kh. Khafizianova, ...

Keyword(s): Data Mining, Mining Methods

Advances in the Prediction of Protein Subcellular Locations with Machine Learning

Current Bioinformatics, Vol 14 (5), pp. 406-421, 2019
DOI: 10.2174/1574893614666181217145156
Cited By ~ 3

Author(s): Ting-He Zhang, Shao-Wu Zhang

Keyword(s): Machine Learning, Feature Fusion, Protein Sequences, Subcellular Location, Automated Analysis, Cellular Level, Machine Learning Algorithms, Feature Representation, Protein Subcellular Location, Protein Subcellular Locations

Background: Revealing the subcellular location of a newly discovered protein can provide insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive, so computational methods that identify them efficiently and effectively are highly desirable. In particular, the rapidly increasing number of protein sequences entering genome databases calls for automated analysis methods. Methods: In this review, we describe recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) construction of protein subcellular location benchmark datasets, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, and v) web servers. Result & Conclusion: Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. The first is the selection of novel and effective features (e.g., statistical, physicochemical, and evolutionary) from protein sequences and structures. The second is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.
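
The review above lists assessment metrics among its topics without naming them. As an illustration only, the sketch below computes overall accuracy together with per-location sensitivity and Matthews correlation coefficient (MCC), scoring each location one-vs-rest. The choice of these metrics and the function name `location_metrics` are assumptions, not definitions taken from the review.

```python
import math


def location_metrics(y_true, y_pred):
    """Hypothetical helper: overall accuracy plus per-location sensitivity
    and MCC, each location scored one-vs-rest against all others."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    # Overall accuracy: fraction of proteins assigned to the correct location.
    overall = sum(t == p for t, p in pairs) / n
    per_loc = {}
    for loc in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == loc and p == loc for t, p in pairs)
        fn = sum(t == loc and p != loc for t, p in pairs)
        fp = sum(t != loc and p == loc for t, p in pairs)
        tn = n - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        # MCC is defined as 0 when any marginal count is zero.
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0
        per_loc[loc] = {"sensitivity": sensitivity, "mcc": mcc}
    return overall, per_loc
```

For example, with true labels `["nucleus", "nucleus", "cytoplasm", "membrane"]` and predictions `["nucleus", "cytoplasm", "cytoplasm", "membrane"]`, the overall accuracy is 0.75 and the nucleus sensitivity is 0.5. Per-location MCC is reported alongside overall accuracy because accuracy alone can look good on imbalanced datasets where small location classes are mostly misclassified.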

Integrating Second-order Moving Average and Over-sampling Algorithm to Predict Apoptosis Protein Subcellular Localization

Current Bioinformatics, Vol 15 (6), pp. 517-527, 2020
DOI: 10.2174/1574893614666190902155811

Author(s): Yunyun Liang, Shengli Zhang

Keyword(s): Subcellular Localization, Moving Average, Subcellular Location, Second Order, Test Method, Support Vector, Protein Subcellular Localization, Protein Subcellular Location, Apoptosis Protein, Leibler Divergence

Background: Apoptosis proteins play a key role in the development and homeostasis of the organism and are essential to understanding the mechanisms of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Predicting the subcellular localization of apoptosis proteins is a meaningful task. Methods: In this study, we predict apoptosis protein subcellular locations using a PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on the Kullback-Leibler divergence, and over-sampling algorithms. The resulting model, named SOMAPKLNMF-OS, is constructed on the ZD98, ZW225 and CL317 benchmark datasets. A support vector machine is adopted as the classifier, and the bias-free jackknife test is used to evaluate accuracy. Results: Our prediction system achieves favorable overall accuracy on all three datasets and outperforms the other models compared. Conclusion: The results show that our model offers a high-throughput tool for identifying apoptosis protein subcellular localization.
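
The abstract above names nonnegative matrix factorization (NMF) based on the Kullback-Leibler divergence but does not give the update rules. A minimal sketch is shown below, assuming the standard multiplicative updates of Lee and Seung for the generalized KL divergence. The names `kl_nmf`, `kl_divergence`, `rank`, `n_iter` and `seed` are hypothetical. PSSM scores can be negative, so the input is assumed to have already been mapped to nonnegative values; how SOMAPKLNMF-OS combines NMF with the moving-average descriptor and over-sampling is not reproduced here.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-10):
    """Generalized KL divergence D(V || WH) = sum(V * log(V / WH) - V + WH)."""
    V = np.asarray(V, dtype=float)
    WH = np.asarray(WH, dtype=float) + eps
    return float(np.sum(V * np.log((V + eps) / WH) - V + WH))


def kl_nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a nonnegative matrix V (m x n) as W (m x rank) @ H (rank x n)
    with multiplicative updates that do not increase the KL divergence."""
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF requires a nonnegative input matrix")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Random positive initialization; eps keeps every entry strictly positive.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (column sums of W)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # W <- W * ((V / WH) H^T) / (row sums of H)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

Because the updates only multiply by nonnegative ratios, `W` and `H` stay nonnegative throughout. Using one of the factors as a lower-dimensional feature representation before the SVM step is one plausible design, stated here as an assumption rather than as the paper's pipeline.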

The Conservation of Low Complexity Regions in Bacterial Proteins Depends on the Pathogenicity of the Strain and Subcellular Location of the Protein

Genes, Vol 12 (3), pp. 451, 2021
DOI: 10.3390/genes12030451

Author(s): Pablo Mier, Miguel A. Andrade-Navarro

Keyword(s): Membrane Proteins, Outer Membrane, Bacterial Species, Outer Membrane Proteins, Subcellular Location, Low Complexity, Extracellular Proteins, Bacterial Strains, Bacterial Proteins, Protein Subcellular Location

Low complexity regions (LCRs) in proteins are characterized by amino acid frequencies that differ from the average. These regions evolve faster and tend to be less conserved between homologs than globular domains. They are less common in bacteria than in eukaryotes. Studying their conservation could help generate hypotheses about their function. To obtain the appropriate evolutionary focus for this rapidly evolving feature, here we study the conservation of LCRs in bacterial strains and compare their high variability with the closeness of the strains. For this, we selected 20 taxonomically diverse bacterial species and obtained the completely sequenced proteomes of two strains per species. We identified all orthologous pairs for each of the 20 strain pairs. For each orthologous pair, we computed the conservation of two types of LCRs: compositionally biased regions (CBRs) and homorepeats (polyX). Our results show that, in bacteria, Q-rich CBRs are the most conserved, while A-rich CBRs and polyA are the most variable. LCRs are generally more conserved when comparing pathogenic strains. However, this result depends on protein subcellular location: LCRs accumulate in extracellular and outer membrane proteins, with conservation increased in the extracellular proteins of pathogens and decreased for polyX in the outer membrane proteins of pathogens. We conclude that these dependencies support the functional importance of LCRs in host–pathogen interactions.
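
The abstract above distinguishes homorepeats (polyX) from compositionally biased regions. A minimal sketch of polyX detection follows, assuming a homorepeat is a run of at least `min_len` identical residues. The default of 4 and the function name `find_homorepeats` are illustrative assumptions, not the paper's definition, and the sketch does not attempt CBR detection or ortholog-based conservation scoring.

```python
import re


def find_homorepeats(seq, min_len=4):
    """Hypothetical helper: return (residue, start, end) for every run of
    >= min_len identical residues; end is exclusive (Python slice style)."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # (.) captures one residue; \1{k,} requires at least k more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(seq.upper())]
```

For example, `find_homorepeats("MKAAAAQQQLSSSSSD")` returns `[("A", 2, 6), ("S", 10, 15)]`; the three-residue `QQQ` run is reported only when `min_len=3`. Returning coordinates rather than just counts makes it possible to check whether a repeat falls in the aligned region of an ortholog.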

Research on diagnostic strategy for faults in VRF air conditioning system using hybrid data mining methods

Energy and Buildings, pp. 111144, 2021
DOI: 10.1016/j.enbuild.2021.111144

Author(s): Yuzhou Wang, Zhengfei Li, Huanxin Chen, Jianxin Zhang, Qian Liu, ...

Keyword(s): Data Mining, Air Conditioning, Diagnostic Strategy, Air Conditioning System, Hybrid Data, Mining Methods

A Study of Data Mining Methods for Identification Undernutrition and Overnutrition in Obesity

Proceedings of the 2019 3rd International Conference on Software and e-Business, 2019
DOI: 10.1145/3374549.3374565

Author(s): Tamara Michelle Mulyono, Friska Natalia, Sud Sudirman

Keyword(s): Data Mining, Mining Methods

IEEE Access Special Section Editorial: Advanced Data Mining Methods for Social Computing

IEEE Access, Vol 8, pp. 228598-228604, 2020
DOI: 10.1109/access.2020.3043060

Author(s): Yongqiang Zhao, Shirui Pan, Jia Wu, Huaiyu Wan, Huizhi Liang, ...

Keyword(s): Data Mining, Social Computing, Special Section, Mining Methods

Using Data Mining Methods to Detect Medical Fraud

Proceedings of the 2020 International Conference on Management of e-Commerce and e-Government, 2020
DOI: 10.1145/3409891.3409902

Author(s): Long-Sheng Chen, Jia-Chuan Chen

Keyword(s): Data Mining, Mining Methods, Using Data

Comparison of The Classification Data Mining Methods to Identify Civil Servants in Indonesian Social Insurance Company

2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 2020
DOI: 10.1109/isriti51436.2020.9315444

Author(s): Adityan Iguh Sasmito, Yova Ruldeviyani

Keyword(s): Data Mining, Social Insurance, Civil Servants, Insurance Company, Mining Methods

A Holistic Approach for Quality Oriented Maintenance Planning Supported by Data Mining Methods

Procedia CIRP, Vol 57, pp. 259-264, 2016
DOI: 10.1016/j.procir.2016.11.045
Cited By ~ 8

Author(s): Robert Glawar, Zsolt Kemeny, Tanja Nemeth, Kurt Matyas, Laszlo Monostori, ...

Keyword(s): Data Mining, Holistic Approach, Maintenance Planning, Mining Methods
