Prediction of Enzyme Mutant Activity Using Computational Mutagenesis and Incremental Transduction

2011 ◽  
Vol 2011 ◽  
pp. 1-9 ◽  
Author(s):  
Nada Basit ◽  
Harry Wechsler

Wet laboratory mutagenesis to determine enzyme activity changes is expensive and time-consuming. This paper expands on standard one-shot learning by proposing an incremental transductive method (T2bRF) for the prediction of enzyme mutant activity during mutagenesis, using Delaunay tessellation and 4-body statistical potentials for representation. Incremental learning is in tune with both eScience and actual experimentation, as it accounts for cumulative annotation effects of enzyme mutant activity over time. The experimental results, obtained using cross-validation, show that overall the proposed incremental transductive method, with random forest as the base classifier, yields better results than one-shot learning methods. T2bRF is shown to yield 90% on T4 and LAC (and 86% on HIV-1). This is significantly better than state-of-the-art competing methods, whose performance is 80% or less on the same datasets.

Author(s):  
Sri Hartini ◽  
Zuherman Rustam ◽  
Glori Stephani Saragih ◽  
María Jesús Segovia Vargas

Banks have a crucial role in the financial system. When many banks suffer from a crisis, it can lead to financial instability. According to their impact, banking crises can be divided into two categories: systemic and non-systemic. When a systemic crisis happens, it may cause even stable banks to go bankrupt. Hence, this paper proposes a random forest for estimating the probability of banking crises as a preventive measure. Random forest is well known as a robust technique for both classification and regression, and is resistant to outliers and overfitting. The experiments were conducted using the financial crisis database, containing a sample of 79 countries in the period 1981-1999 (annual data). This dataset has 521 samples consisting of 164 crisis samples and 357 non-crisis cases. From the experiments, it was concluded that using 90 percent training data delivers 0.98 accuracy, 0.92 sensitivity, 1.00 precision, and 0.96 F1-score, the highest scores among the training percentages tested. These results are also better than those of state-of-the-art methods applied to the same dataset. Therefore, the proposed method shows promising results for predicting the probability of banking crises.
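As a quick consistency check on the reported metrics, the F1-score can be recomputed from the stated precision and sensitivity. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and sensitivity (recall); the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0  # convention when both inputs are zero
    return 2 * precision * sensitivity / (precision + sensitivity)


# Precision and sensitivity as reported in the abstract above.
reported_f1 = f1_score(precision=1.00, sensitivity=0.92)
print(round(reported_f1, 2))  # 0.96
```

Under that definition, the rounded value agrees with the 0.96 F1-score quoted in the abstract.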


Author(s):  
Peng Liu ◽  
Yijie Ding ◽  
Ying Rong ◽  
Dong Chen

Cell penetrating peptides (CPPs) are short peptides that can carry biomolecules of varying sizes across the cell membrane into the cytoplasm. Correctly identifying CPPs is the basis for studying their functions and mechanisms. Here, we propose a novel CPP predictor that is able to predict CPPs and their uptake efficiency. In our method, five feature descriptors are applied to encode the sequence and compose a hybrid feature vector. Afterward, the wrapper + random forest algorithm is employed, which combines feature selection with the prediction process to find features that are crucial for identifying CPPs. The jackknife cross validation result shows that our predictor is comparable to state-of-the-art CPP predictors, and our method reduces the feature dimension, which improves computational efficiency and avoids overfitting, allowing our predictor to be adopted to identify large-scale CPP data.


Author(s):  
Sandra E. Sinisi ◽  
Eric C Polley ◽  
Maya L Petersen ◽  
Soo-Yon Rhee ◽  
Mark J. van der Laan

Many alternative data-adaptive algorithms can be used to learn a predictor based on observed data. Examples of such learners include decision trees, neural networks, support vector regression, least angle regression, logic regression, and the Deletion/Substitution/Addition algorithm. The optimal learner for prediction will vary depending on the underlying data-generating distribution. In this article we introduce the "super learner", a prediction algorithm that applies any set of candidate learners and uses cross-validation to select between them. Theory shows that asymptotically the super learner performs essentially as well as or better than any of the candidate learners. In this article we present the theory behind the super learner, and illustrate its performance using simulations. We further apply the super learner to a data example, in which we predict the phenotypic antiretroviral susceptibility of HIV based on viral genotype. Specifically, we apply the super learner to predict susceptibility to a specific protease inhibitor, nelfinavir, using a set of database-derived non-polymorphic treatment-selected mutations.
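The selection step described in this abstract, using cross-validation to choose among candidate learners, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the two candidates (`fit_mean`, `fit_line`), the interleaved fold scheme, the squared-error risk, and the toy data are all hypothetical choices made for the example.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares line in one variable."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def cv_risk(fit, xs, ys, k=5):
    """Mean squared error of `fit`, estimated by k-fold cross-validation."""
    n = len(xs)
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved folds
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        predict = fit(train_x, train_y)
        errors.extend((predict(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return mean(errors)


def select_learner(candidates, xs, ys, k=5):
    """Return the candidate with the lowest cross-validated risk, plus all risks."""
    risks = {name: cv_risk(fit, xs, ys, k) for name, fit in candidates.items()}
    return min(risks, key=risks.get), risks


# Toy, noise-free data on the line y = 2x + 1.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
best, risks = select_learner({"mean": fit_mean, "line": fit_line}, xs, ys)
print(best)  # line
```

On this toy data the linear candidate is selected because its held-out error is essentially zero, while the constant predictor cannot follow the trend.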


2020 ◽  
Vol 10 (23) ◽  
pp. 8346
Author(s):  
Ni Jiang ◽  
Feihong Yu

Cell counting is a fundamental part of biomedical and pathological research. Predicting a density map is the mainstream method to count cells. As an easily trained model that generalizes well, the random forest is often used to learn the cell images and predict the density maps. However, it cannot predict data beyond the training data, which may result in underestimation. To overcome this problem, we propose a cell counting framework that predicts the density map by detecting cells. The cell counting framework contains two parts: the training data preparation and the detection framework. The former makes sure that the cells can be detected even when overlapping, and the latter makes sure the count result is accurate and robust. The proposed method uses multiple random forests to predict various probability maps in which the cells can be detected by the Hessian matrix. All detection results are taken into consideration to obtain the density map and achieve better performance. We conducted experiments on three public cell datasets. Experimental results showed that the proposed model performs better than the traditional random forest (RF) in terms of accuracy and robustness, and is even superior to some state-of-the-art deep learning models. Especially when the training data are small, which is the usual case in cell counting, the count errors on VGG cells and MBM cells decreased from 3.4 to 2.9 and from 11.3 to 9.3, respectively. The proposed model obtains the lowest count error and achieves state-of-the-art performance.


2021 ◽  
Vol 17 (12) ◽  
pp. e1009655
Author(s):  
Lei Li ◽  
Yu-Tian Wang ◽  
Cun-Mei Ji ◽  
Chun-Hou Zheng ◽  
Jian-Cheng Ni ◽  
...  

microRNAs (miRNAs) are small non-coding RNAs related to a number of complicated biological processes. A growing body of studies has suggested that miRNAs are closely associated with many human diseases. It is meaningful to consider disease-related miRNAs as potential biomarkers, which could greatly contribute to understanding the mechanisms of complex diseases and benefit the prevention, detection, diagnosis and treatment of extraordinary diseases. In this study, we present a novel model named Graph Convolutional Autoencoder for miRNA-Disease Association Prediction (GCAEMDA). In the proposed model, we use miRNA-miRNA similarities, disease-disease similarities and verified miRNA-disease associations to construct a heterogeneous network, which is applied to learn the embeddings of miRNAs and diseases. In addition, we separately construct miRNA-based and disease-based sub-networks. Combining the embeddings of miRNAs and diseases, a graph convolutional autoencoder (GCAE) is used to calculate miRNA-disease association scores on the two sub-networks, respectively. Furthermore, we obtain final prediction scores between miRNAs and diseases by averaging the prediction scores from the two types of sub-networks. To assess the accuracy of GCAEMDA, we applied different cross-validation methods to evaluate our model, whose performance was better than that of the state-of-the-art models. Case studies on common human diseases were also carried out to demonstrate the effectiveness of GCAEMDA. The results demonstrated that GCAEMDA is beneficial for inferring potential miRNA-disease associations.


2021 ◽  
Vol 21 (1) ◽  
pp. 23-33
Author(s):  
Oscar Oscar ◽  
Nurlaelatul Maulidah ◽  
Annida Purnamawati ◽  
Destiana Putri ◽  
Hilman F Pardede

Telemarketing is one effective way of promoting products. However, it is often difficult to measure the success of telemarketing. Therefore, a way to predict the success rate of telemarketing is needed, so that strategies can be planned to increase it. In this study, we evaluate several implementations of machine learning for predicting the success of telemarketing. The evaluated methods are Deep Neural Network (DNN), Random Forest, and K-nearest neighbor (K-NN). We validate our experiments using 10-fold cross-validation, and our experiments show that DNN with 3 hidden layers outperforms the other methods. An accuracy of 90% is achieved with the DNN. This is better than Random Forest and K-NN, which achieve accuracies of 88% and 89%.
Keywords: Bank Marketing, DNN, KNN, Random Forest.


2001 ◽  
Vol 13 (12) ◽  
pp. 2865-2877 ◽  
Author(s):  
Rudy Setiono

This article presents an algorithm that constructs feedforward neural networks with a single hidden layer for pattern classification. The algorithm starts with a small number of hidden units in the network and adds more hidden units as needed to improve the network's predictive accuracy. To determine when to stop adding new hidden units, the algorithm makes use of a subset of the available training samples for cross validation. New hidden units are added to the network only if they improve the classification accuracy of the network on the training samples and on the cross-validation samples. Extensive experimental results show that the algorithm is effective in obtaining networks with predictive accuracy rates that are better than those obtained by state-of-the-art decision tree methods.


Author(s):  
K. Rahmani ◽  
H. Huang ◽  
H. Mayer

In this paper we present a bottom-up approach for the semantic segmentation of building facades. Facades have a predefined topology, contain specific objects such as doors and windows and follow architectural rules. Our goal is to create homogeneous segments for facade objects. To this end, we have created a pixelwise labeling method using a Structured Random Forest. According to the evaluation of results for two datasets with the classifier we have achieved the above goal producing a nearly noise-free labeling image and perform on par or even slightly better than the classifier-only stages of state-of-the-art approaches. This is due to the encoding of the local topological structure of the facade objects in the Structured Random Forest. Additionally, we have employed an iterative optimization approach to select the best possible labeling.


2020 ◽  
Vol 27 (3) ◽  
pp. 178-186 ◽  
Author(s):  
Ganesan Pugalenthi ◽  
Varadharaju Nithya ◽  
Kuo-Chen Chou ◽  
Govindaraju Archunan

Background: N-glycosylation is one of the most important post-translational mechanisms in eukaryotes. N-glycosylation predominantly occurs in the N-X-[S/T] sequon, where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Therefore, accurate prediction of N-glycosylation sites is essential to understand the N-glycosylation mechanism. Objective: In this article, our motivation is to develop a computational method to predict N-glycosylation sites in eukaryotic protein sequences. Methods: In this article, we report a random forest method, Nglyc, to predict N-glycosylation sites from protein sequence, using 315 sequence features. The method was trained using a dataset of 600 N-glycosylation sites and 600 non-glycosylation sites and tested on a dataset containing 295 N-glycosylation sites and 253 non-glycosylation sites. Nglyc prediction was compared with the NetNGlyc, EnsembleGly and GPP methods. Further, the performance of Nglyc was evaluated using human and mouse N-glycosylation sites. Results: The Nglyc method achieved an overall training accuracy of 0.8033 with all 315 features. Performance comparison with the NetNGlyc, EnsembleGly and GPP methods shows that Nglyc performs better than the other methods, with high sensitivity and specificity. Conclusion: Our method achieved an overall accuracy of 0.8248 with 0.8305 sensitivity and 0.8182 specificity. The comparison study shows that our method performs better than the other methods. The applicability and success of our method were further evaluated using human and mouse N-glycosylation sites. The Nglyc method is freely available at https://github.com/bioinformaticsML/Ngly.
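The N-X-[S/T] sequon rule stated in the Background (X is any amino acid other than proline) can be illustrated with a regular expression that lists candidate sites in a sequence. This is only a sketch of the candidate-finding step, not the Nglyc predictor itself; the function name `find_sequons` and the example sequence are made up for illustration.

```python
import re

# N, then any residue except proline (P), then S or T.
# The zero-width lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 0-based positions of the N in every N-X-[S/T] sequon."""
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


# Hypothetical sequence: "NGS" matches, "NPT" is excluded by the proline rule,
# and "NNT" / "NTS" overlap at the end.
print(find_sequons("MNGSANPTQNNTS"))  # [1, 9, 10]
```

As the abstract notes, not every sequon found this way is actually glycosylated, which is why a classifier is applied to the candidates.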


2017 ◽  
Vol 15 (1) ◽  
pp. 31-37 ◽  
Author(s):  
Tingting Li ◽  
Binlian Sun ◽  
Yanyan Jiang ◽  
Haiyan Zeng ◽  
Yanpeng Li ◽  
...  
Keyword(s):  
Hiv 1 ◽  
