A Protein Classification Benchmark collection for machine learning

2007 ◽  
Vol 35 (Database) ◽  
pp. D232-D236 ◽  
Author(s):  
P. Sonego ◽  
M. Pacurar ◽  
S. Dhir ◽  
A. Kertesz-Farkas ◽  
A. Kocsor ◽  
...  

Classifying enzymes from uncharacterized protein data is an emerging research area in bioinformatics. A prime goal of the field is to process protein data and develop computational techniques that classify proteins and identify the features appropriate for function prediction. Several machine learning and statistical techniques have been designed for this purpose. Protein classification remains a challenging task, and it has generally been carried out on human protein data. In this article, we consider rat enzyme classes for classification and prediction. We use CRT, CHAID, C5.0, neural network (NEURAL), SVM, and Bayesian classifiers, and measure model performance with accuracy, specificity, sensitivity, precision, recall, F-measure and MCC. The experimental results show that some of the protein classes are imbalanced, specifically the Lyases, Isomerases and Ligases classes, and that this imbalance degrades the performance of the models. The results also show that C5.0 achieves 91.5% accuracy with a computation time of only 4 seconds, and that it can be used for protein classification and prediction.
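The evaluation measures listed in this abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch and not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, and a multi-class enzyme problem would apply it once per class in a one-vs-rest fashion.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives for one class (one-vs-rest).
    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # identical to recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + sensitivity
    f_measure = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient (MCC)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and recall are the same quantity, which is why the sketch reports one value under both names.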


Author(s):  
Chhote Lal Prasad Gupta ◽  
Anand Bihari ◽  
Sudhakar Tripathi

Background: Predicting the enzyme class of an unknown protein is one of the challenging tasks in bioinformatics. The number of known proteins grows daily, which makes clinical verification and classification difficult; as a result, enzyme class prediction offers a new opportunity to bioinformatics researchers. Machine learning classification techniques help in protein classification and prediction, but it is imperative to know which technique is best suited to the task. This study uses human protein data extracted from the UniProtKB databank. A total of 4368 proteins with 45 identified features were used for the experimental analysis.
Objective: The prime objective of this article is to find an appropriate classification technique for classifying reviewed as well as unreviewed human enzyme protein data, and to determine the significance of different features in protein classification and prediction.
Method: Ten classification techniques (CRT, QUEST, CHAID, C5.0, ANN, SVM, Bayesian, Random Forest, XGBoost and CatBoost) were used to classify the data and to assess the importance of the features. To validate the results of the different techniques, accuracy, precision, recall, F-measure, sensitivity, specificity, MCC, ROC and AUROC were used. All experiments were carried out with SPSS Clementine and Python.
Result: The classification techniques give different results, and the data were found to be imbalanced for classes C4, C5, and C6. As a result, all of the techniques give an acceptable accuracy above 60% for these classes, but their precision is very low or negligible. The experimental results show that Random Forest gives the highest accuracy and AUROC of all the techniques, 96.84% and 0.945 respectively, and also has high precision and recall.
Conclusion: The experiments conducted and analyzed in this article show that the Random Forest classification technique can be used for the classification and prediction of human enzyme proteins.
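The Result paragraph reports acceptable accuracy but negligible precision for the imbalanced classes. The sketch below illustrates how that combination can arise. It uses made-up labels rather than the study's data; the class names `C1` and `C6` are borrowed from the abstract only as labels, and the helper `one_vs_rest_report` is hypothetical.

```python
def one_vs_rest_report(y_true, y_pred, label):
    """Accuracy, precision and recall for one class treated as 'positive'."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Synthetic, heavily imbalanced data: 980 majority samples, 20 minority samples.
y_true = ["C1"] * 980 + ["C6"] * 20
# A classifier that almost always predicts the majority class:
# 977 correct C1, 3 C1 samples mislabeled as C6, 18 C6 samples missed, 2 C6 found.
y_pred = ["C1"] * 977 + ["C6"] * 3 + ["C1"] * 18 + ["C6"] * 2

report = one_vs_rest_report(y_true, y_pred, "C6")
print(report)
```

With these synthetic counts the one-vs-rest accuracy for the rare class is 0.979, while its precision is 0.4 and its recall is 0.1. This is why accuracy alone can be misleading on imbalanced classes, and why measures such as precision, recall and MCC are reported alongside it.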


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Apiwat Sangphukieo ◽  
Teeraphan Laomettachit ◽  
Marasri Ruengjitchatchawalya

2021 ◽  
Author(s):  
Edoardo Farnesi ◽  
Andrea Barucci ◽  
Cristiano D'Andrea ◽  
Martina Banchelli ◽  
Chiara Amicucci ◽  
...  

2020 ◽  
Author(s):  
Apiwat Sangphukieo ◽  
Teeraphan Laomettachit ◽  
Marasri Ruengjitchatchawalya

Abstract Identification of novel photosynthetic proteins is important for understanding and improving photosynthetic efficiency. Genomic context, such as genome neighborhood, can provide additional useful information for identifying photosynthetic proteins. We therefore expected that a computational approach, particularly machine learning (ML) with genome neighborhood-based features, would facilitate photosynthetic function assignment. Our results revealed a functional relationship between photosynthetic genes and their genomic neighbors, indicating that functions can be assigned from the genome neighborhood profile. We therefore created a new method for extracting patterns based on the genome neighborhood network (GNN) and applied it to photosynthetic protein classification using ML algorithms. A random forest (RF) classifier using genome neighborhood-based features achieved the highest accuracy, up to 94%, in the classification of photosynthetic proteins, and also showed better performance (Matthews correlation coefficient = 0.852) than other available tools, including sequence similarity search (0.497) and an ML-based method (0.512). Furthermore, we demonstrated the ability of our model to identify novel photosynthetic proteins compared with the other methods. Our classifier is available at http://bicep.kmutt.ac.th/photomod_standalone, https://bit.ly/2S0I2Ox and DockerHub: https://hub.docker.com/r/asangphukieo/photomod


2020 ◽  
Vol 43 ◽  
Author(s):  
Myrthe Faber

Abstract Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.


2020 ◽  
Author(s):  
Mohammed J. Zaki ◽  
Wagner Meira, Jr
