Predicting the oligomeric states of fluorescent proteins

10.7287/peerj.preprints.922v1 ◽

2015 ◽

Author(s):

Saw Simeon ◽

Watshara Shoombuatong ◽

Likit Preeyanon ◽

Virapong Prachayasittikul ◽

Chanin Nantasenamat

Keyword(s):

Amino Acid ◽

Fluorescent Proteins ◽

Computational Prediction ◽

Amino Acid Sequences ◽

Machine Learning Algorithms ◽

Support Vector ◽

Dipeptide Composition ◽

Network Support ◽

Protein Tagging ◽

Fold Cross Validation

Currently, monomeric fluorescent proteins (FPs) are ideal markers for protein tagging. The prediction of oligomeric states is helpful for enhancing live biomedical imaging. Computational prediction of FP oligomeric states can accelerate the effort of protein engineering to create monomeric FPs by saving time and money. To the best of our knowledge, this study represents the first computational model for predicting and analyzing FP oligomerization directly from amino acid sequences. An exhaustive dataset consisting of 397 unique FP oligomeric states was compiled from the literature. FPs were described by three classes of protein descriptors: amino acid composition, dipeptide composition and physicochemical properties. The oligomeric states of FPs were predicted using a decision tree (DT) algorithm, and the results demonstrated that DT provided robust performance, with accuracies in the ranges of 79.97-81.72% and 80.76-82.63% for the internal (i.e., 10-fold cross-validation) and external sets, respectively. This approach was also benchmarked against other common machine learning algorithms such as artificial neural network, support vector machine and random forest. A thorough analysis of amino acid sequence features was conducted to provide informative insights into FP oligomerization, which may aid in engineering novel monomeric fluorescent proteins. The following differentiating characteristics of monomeric and oligomeric fluorescent proteins were derived from DT: (i) substitution of any amino acid to Glu led to the reduction of aggregated proteins and (ii) oligomerization of FPs appears to be stabilized by several hydrophobic contacts. Datasets and R source code are available at http://dx.doi.org/10.6084/m9.figshare.1348575.
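The two sequence-derived descriptors named in this abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal Python sketch. This is not the authors' R code; the function names and the toy sequence are hypothetical, and the sketch assumes the common definitions (residue frequencies over the 20 standard amino acids, and frequencies of the 400 ordered adjacent residue pairs).

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 features)."""
    seq = seq.upper()
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the fraction of each ordered adjacent residue pair (400 features)."""
    seq = seq.upper()
    total = len(seq) - 1  # number of adjacent pairs in the sequence
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    return {dp: c / total for dp, c in counts.items()}


toy = "MSKGEELFTG"  # hypothetical toy sequence, not taken from the dataset
aac = amino_acid_composition(toy)
dpc = dipeptide_composition(toy)
```

Concatenating the two dictionaries' values would give a 420-dimensional feature vector per sequence, which is the kind of input a decision tree or other classifier could be trained on.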


Is It Possible to Forecast the Price of Bitcoin?

Forecasting ◽

10.3390/forecast3020024 ◽

2021 ◽

Vol 3 (2) ◽

pp. 377-420

Author(s):

Julien Chevallier ◽

Dominique Guégan ◽

Stéphane Goutte

Keyword(s):

Data Analytics ◽

Daily Variation ◽

A Priori ◽

Machine Learning Algorithms ◽

Support Vector ◽

Market Growth ◽

Network Support ◽

Market Participants ◽

Nearest Neighbours ◽

Stationary Behavior

This paper focuses on forecasting the price of Bitcoin, motivated by its market growth and the recent interest of market participants and academics. We deploy six machine learning algorithms (namely Artificial Neural Network, Support Vector Machine, Random Forest, k-Nearest Neighbours, AdaBoost, and Ridge regression), without deciding a priori which one is the ‘best’ model. The main contribution is to use these data analytics techniques, with great caution in the parameterization, instead of classical parametric models (AR), to disentangle the non-stationary behavior of the data. Since Bitcoin is also used for diversification in portfolios, we need to investigate its interactions with stocks, bonds, foreign exchange, and commodities. We identify that other cryptocurrencies convey enough information to explain the daily variation of Bitcoin’s spot and futures prices. Forecasting results point to the segmentation of Bitcoin concerning alternative assets. Finally, trading strategies are implemented.


Abstract 473: Identification of Apolipoproteins Using Feature Selection Technique

Arteriosclerosis Thrombosis and Vascular Biology ◽

10.1161/atvb.36.suppl_1.473 ◽

2016 ◽

Vol 36 (suppl_1) ◽

Author(s):

Hua Tang ◽

Hao Lin

Keyword(s):

Support Vector Machine ◽

Cross Validation ◽

Support Vector ◽

Feature Subset ◽

Risk Markers ◽

Dipeptide Composition ◽

Accurate Identification ◽

Feature Selection Technique ◽

Physiological Importance ◽

Fold Cross Validation

Objective: Apolipoproteins are of great physiological importance and are associated with different diseases such as dyslipidemia, thrombogenesis and angiocardiopathy. Apolipoproteins have therefore emerged as key risk markers and important research targets, yet the types of apolipoproteins have not been fully elucidated. Accurate identification of apolipoproteins is crucial to the comprehension of cardiovascular diseases and to drug design. The aim of this study is to develop a powerful model to precisely identify apolipoproteins. Approach and Results: We manually collected from UniProt a non-redundant dataset of 53 apolipoproteins and 136 non-apolipoproteins with a sequence identity of less than 40%. After formulating the protein sequence samples with g-gap dipeptide composition (here g = 1 to 10), analysis of variance (ANOVA) was adopted to find the feature subset that achieves the best accuracy. A Support Vector Machine (SVM) was then used to perform classification. The predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 96.2%, a specificity of 99.3%, and an accuracy of 98.4%. The study indicated that the proposed method could be a feasible means of conducting preliminary analyses of apolipoproteins. Conclusion: We demonstrated that apolipoproteins can be predicted from their primary sequences. We also discovered a special dipeptide distribution in apolipoproteins. These findings open new perspectives for improving apolipoprotein prediction by considering the specific dipeptides. We expect that these findings will help to improve drug development for angiocardiopathy. Key words: Apolipoproteins, Angiocardiopathy, Support Vector Machine
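The g-gap dipeptide composition used in this abstract can be sketched in Python as follows. The abstract does not define the gap convention, so this sketch assumes that g is the number of residues skipped between the two paired positions (so g = 0 would be the ordinary adjacent dipeptide); the function name and the toy sequence are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(seq, g):
    """Frequencies of residue pairs (seq[i], seq[i + g + 1]), i.e. pairs
    separated by g intervening residues (assumed convention)."""
    seq = seq.upper()
    n_pairs = len(seq) - g - 1
    if n_pairs <= 0:
        raise ValueError("sequence is too short for this gap")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(n_pairs):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # ignore pairs with non-standard symbols
            counts[pair] += 1
    return {k: v / n_pairs for k, v in counts.items()}


# Toy example: with g = 1 the pairs are (A,D), (C,A), (D,C), (A,D).
features = g_gap_dipeptide_composition("ACDACD", g=1)
```

Each value of g yields its own 400-dimensional vector, which is why a feature selection step such as ANOVA is useful before training a classifier.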


Prediction of Carbohydrate-Binding Proteins from Sequences Using Support Vector Machines

Advances in Bioinformatics ◽

10.1155/2010/289301 ◽

2010 ◽

Vol 2010 ◽

pp. 1-9 ◽

Cited By ~ 8

Author(s):

Seizi Someya ◽

Masanori Kakuta ◽

Mizuki Morita ◽

Kazuya Sumikoshi ◽

Wei Cao ◽

...

Keyword(s):

Amino Acid ◽

Support Vector Machines ◽

Binding Proteins ◽

Prediction Method ◽

Amino Acid Sequences ◽

Support Vector ◽

Carbohydrate Binding ◽

Genome Database ◽

Vector Machines ◽

Carbohydrate Binding Proteins

Carbohydrate-binding proteins are proteins that can interact with sugar chains but do not modify them. They are involved in many physiological functions, and we have developed a method for predicting them from their amino acid sequences. Our method is based on support vector machines (SVMs). We first clarified the definition of carbohydrate-binding proteins and then constructed positive and negative datasets with which the SVMs were trained. When the leave-one-out test was applied to these datasets, our method achieved an area under the receiver operating characteristic (ROC) curve of 0.92. We also examined two amino acid grouping methods that enable effective learning of sequence patterns and evaluated the performance of these methods. When we applied our method in combination with the homology-based prediction method to the annotated human genome database, H-invDB, we found that the true positive rate of prediction was improved.
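The area under the ROC curve reported in this abstract can be computed from classifier scores as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below illustrates that standard pairwise definition in plain Python; the labels and scores are made-up toy values, not data from the paper, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half a correct pair)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = binding protein, 0 = non-binding (hypothetical scores).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

In a leave-one-out setting, each score would come from a model trained on all examples except the one being scored.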


Prediction of Rising Venues in Citation Networks

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2017.p0650 ◽

2017 ◽

Vol 21 (4) ◽

pp. 650-658 ◽

Cited By ~ 1

Author(s):

Muhammad Azam Zia ◽


Zhongbao Zhang ◽

Guangda Li ◽

Haseeb Ahmad ◽

...

Keyword(s):

Machine Learning ◽

Continuous Improvement ◽

Citation Count ◽

Citation Network ◽

Machine Learning Algorithms ◽

Support Vector ◽

Citation Networks ◽

Network Support ◽

Core Issue ◽

Over Time

Prediction of rising stars has become a core issue in data mining and social networks. Prediction of rising venues could unveil rapidly emerging research venues in citation networks. The aim of this research is to predict the rising venues. First, we present five effective prediction features, along with their mathematical formulations, for extracting rising venues. The underlying features are composed by incorporating the citation count, publications, cited-to and cited-by information at the venue level. For prediction, we employ four machine learning algorithms: Bayesian Network, Support Vector Machine, Multilayer Perceptron and Random Forest. Experimental results demonstrate that the proposed feature set is effective for rising venue prediction. Our empirical analysis spotlights the rising venues that demonstrate continuous improvement over time and finally become the leading scientific venues.


Isolation and characterization of the pea cytochrome c oxidase Vb gene

Genome ◽

10.1139/g06-105 ◽

2006 ◽

Vol 49 (11) ◽

pp. 1481-1489 ◽

Cited By ~ 1

Author(s):

Nakao Kubo ◽

Shin-ichi Arimura ◽

Nobuhiro Tsutsumi ◽

Koh-ichi Kadowaki ◽

Masashi Hirai

Keyword(s):

Amino Acid ◽

Cytochrome C ◽

Cytochrome C Oxidase ◽

Fluorescent Proteins ◽

Amino Acid Sequences ◽

Homology Search ◽

Coding Region ◽

Angiosperm Evolution ◽

Isolation And Characterization ◽

Duplication Events

Three copies of the gene that encodes cytochrome c oxidase subunit Vb were isolated from the pea (PscoxVb-1, PscoxVb-2, and PscoxVb-3). Northern Blot and reverse transcriptase-PCR analyses suggest that all 3 genes are transcribed in the pea. Each pea coxVb gene has an N-terminal extended sequence that can encode a mitochondrial targeting signal, called a presequence. The localization of green fluorescent proteins fused with the presequence strongly suggests the targeting of pea COXVb proteins to mitochondria. Each pea coxVb gene has 5 intron sites within the coding region. These are similar to Arabidopsis and rice, although the intron lengths vary greatly. A phylogenetic analysis of coxVb suggests the occurrence of gene duplication events during angiosperm evolution. In particular, 2 duplication events might have occurred in legumes, grasses, and Solanaceae. A comparison of amino acid sequences in COXVb or its counterpart shows the conservation of several amino acids within a zinc finger motif. Interestingly, a homology search analysis showed that bacterial protein COG4391 and a mitochondrial complex I 13 kDa subunit also have similar amino acid compositions around this motif. Such similarity might reflect evolutionary relationships among the 3 proteins.


The Influence of Dipeptide Composition on Protein Folding Rates

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.378-379.157 ◽

2011 ◽

Vol 378-379 ◽

pp. 157-160

Author(s):

Jian Xiu Guo ◽

Ni Ni Rao

Keyword(s):

Protein Folding ◽

Amino Acid ◽

Amino Acid Sequences ◽

Dipeptide Composition ◽

Coupling Effects ◽

Jackknife Test ◽

Folding Rates ◽

Important Challenge ◽

The Relationship

Understanding the relationship between amino acid sequences and folding rates of proteins is an important challenge in computational and molecular biology. Existing algorithms for predicting protein folding rates have not taken into account the sequence coupling effects. In this work, a novel algorithm was developed for predicting protein folding rates from amino acid sequences. The prediction was achieved on the basis of dipeptide composition, in which the sequence coupling effects are explicitly included through a series of conditional probability elements. Based on a non-redundant dataset of 99 proteins, the proposed method was found to provide excellent agreement between the predicted and experimental folding rates of proteins when evaluated with the jackknife test. The correlation coefficient was 87.7% and the standard error was 2.04, which indicated the important contribution of sequence coupling effects to the determination of protein folding rates.
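The evaluation described in this abstract, a jackknife (leave-one-out) test scored by the correlation between predicted and experimental values, can be sketched in Python as below. The predictor here is a one-feature least-squares line chosen purely for illustration; it is an assumption and not the paper's dipeptide-based algorithm, and the data and function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def jackknife_predictions(x, y):
    """Predict each point from a model fitted on all the other points."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


# Hypothetical feature values and log folding rates for six proteins.
feature = [0.1, 0.4, 0.5, 0.7, 0.8, 1.0]
log_rate = [1.2, 2.1, 2.0, 3.2, 3.1, 4.0]
r = pearson_r(jackknife_predictions(feature, log_rate), log_rate)
```

Because every prediction is made by a model that never saw the held-out protein, the resulting correlation is a less optimistic estimate than one computed on the training data.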


Incorporating Amino Acids Composition and Functional Domains for Identifying Bacterial Toxin Proteins

BioMed Research International ◽

10.1155/2014/972692 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 2

Author(s):

Min-Gang Su ◽

Chien-Hsun Huang ◽

Tzong-Yi Lee ◽

Yu-Ju Chen ◽

Hsin-Yi Wu

Keyword(s):

Amino Acids ◽

Cell Biology ◽

Predictive Performance ◽

Computational Prediction ◽

Amino Acid Sequences ◽

Bacterial Toxins ◽

Bacterial Toxin ◽

Support Vector ◽

Functional Domain ◽

Domain Information

Aside from their role in pathogenesis, bacterial toxins have also been used for medical purposes, such as drugs for cancer and immune diseases. Correctly identifying bacterial toxins and their types (endotoxins and exotoxins) has great impact on cell biology studies and therapy development. However, experimental methods for bacterial toxin identification are time-consuming and labor-intensive, implying an urgent need for computational prediction. Thus, we are motivated to develop a method for computational identification of bacterial toxins based on amino acid sequences and functional domain information. In this study, a nonredundant dataset of 167 bacterial toxins, including 77 exotoxins and 90 endotoxins, is adopted to learn the predictive model using support vector machines (SVMs). The cross-validation evaluation shows that the SVM models trained with amino acid and dipeptide composition could yield an accuracy of 96.07% and 92.50%, respectively. For discriminating endotoxins from exotoxins, the SVM models trained with amino acid and dipeptide composition achieved an accuracy of 95.71% and 92.86%, respectively. After incorporating functional domain information, the predictive performance is further improved. The proposed method has been demonstrated to identify and classify bacterial toxins more effectively than the other two features on an independent dataset, which may aid in bacterial biomedical development.


Vehicle Price Prediction using SVM Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.g5915.069820 ◽

2020 ◽

Vol 9 (8) ◽

pp. 398-401

Keyword(s):

Machine Learning ◽

Research Area ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Data Set ◽

Network Support ◽

Java Application ◽

Learning Techniques ◽

The Individual

Vehicle price prediction has become a popular research area, and it requires considerable effort and expert knowledge of the field. A number of different attributes must be considered to make the prediction reliable and accurate. To estimate the price of used vehicles, a model was developed with the help of three machine learning techniques: Artificial Neural Network, Support Vector Machine and Random Forest. These techniques were applied not to individual items but to the whole group of data items. The data were taken from a web portal, collected using a web scraper written in the PHP programming language, and used for the prediction. Machine learning algorithms of varying performance were compared to obtain the best result on the given data set. The final prediction model was integrated into a Java application.


Identification of Targeted Proteins by Jamu Formulas for Different Efficacies Using Machine Learning Approach

Life ◽

10.3390/life11080866 ◽

2021 ◽

Vol 11 (8) ◽

pp. 866

Author(s):

Sony Hartono Wijaya ◽

Farit Mochamad Afendi ◽

Irmanida Batubara ◽

Ming Huang ◽

Naoaki Ono ◽

...

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Random Forest ◽

Prediction Models ◽

Amino Acid Sequences ◽

Support Vector ◽

Supporting Evidence ◽

Molecular Fingerprints ◽

Target Proteins ◽

Drug Candidates

Background: We performed in silico prediction of the interactions between compounds of Jamu herbs and human proteins by utilizing data-intensive science and machine learning methods. Verifying the proteins that are targeted by compounds of natural herbs will be helpful to select natural herb-based drug candidates. Methods: Initially, data related to compounds, target proteins, and interactions between them were collected from open access databases. Compounds are represented by molecular fingerprints, whereas amino acid sequences are represented by numerical protein descriptors. Then, prediction models that predict the interactions between compounds and target proteins were constructed using support vector machine and random forest. Results: A random forest model constructed based on MACCS fingerprint and amino acid composition obtained the highest accuracy. We used the best model to predict target proteins for 94 important Jamu compounds and assessed the results by supporting evidence from published literature and other sources. There are 27 compounds that can be validated by professional doctors, and those compounds belong to seven efficacy groups. Conclusion: By comparing the efficacy of predicted compounds and the relations of the targeted proteins with diseases, we found that some compounds might be considered as drug candidates.
