Predictions of Apoptosis Proteins by Integrating Different Features Based on Improving Pseudo-Position-Specific Scoring Matrix

BioMed Research International ◽

10.1155/2020/4071508 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Xiaoli Ruan ◽

Dongming Zhou ◽

Rencan Nie ◽

Yanbu Guo

Keyword(s):

Prediction Accuracy ◽

Classification Model ◽

Position Specific Scoring Matrix ◽

Support Vector ◽

Data Imbalance ◽

Apoptosis Protein ◽

Scoring Matrix ◽

The Impact ◽

Apoptosis Proteins

Apoptosis proteins are strongly associated with many diseases and play an indispensable role in maintaining the dynamic balance between cell death and cell division in vivo. Information on the subcellular localization of apoptosis proteins is necessary for understanding their function. To date, few researchers have addressed the imbalance of apoptosis protein data before classification, even though such imbalance readily leads to misclassification. In this work, we therefore introduce a method that resolves this problem and improves prediction accuracy. First, features of the protein sequence are extracted from the position-specific scoring matrix by combining the Improving Pseudo-Position-Specific Scoring Matrix (IM-Psepssm) with the Bidirectional Correlation Coefficient (Bid-CC) algorithm. Second, fusion of different features and resampling strategies are used to reduce the impact of class imbalance on the apoptosis protein datasets. Finally, the resulting feature vectors are used to train a Support Vector Machine (SVM) classification model, and prediction accuracy is evaluated by jackknife cross-validation tests. The experimental results indicate that, for the same feature vector, resampling markedly improves several key performance indicators relative to the method without resampling when predicting the localization of apoptosis proteins in the ZD98, ZW225, and CL317 datasets. Additionally, we provide new user-friendly local software; the code and software are freely available at https://github.com/ruanxiaoli/Im-Psepssm.
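
The abstract does not spell out how resampling interacts with the jackknife test. A minimal sketch, assuming plain random oversampling (the paper's actual resampling strategies may differ), applies resampling only to the training portion of each leave-one-out split, so the held-out protein is never duplicated into the training data. To stay dependency-free, a nearest-centroid classifier stands in for the SVM; all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples of the smaller classes until every class
    is as large as the largest one (plain random oversampling, an assumption)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def centroid_fit(X, y):
    """Stand-in classifier (NOT the paper's SVM): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def centroid_predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def jackknife_accuracy(X, y, resample=True):
    """Leave-one-out (jackknife) accuracy. Resampling is applied to the
    training part of each split only, never to the held-out sample."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if resample:
            X_tr, y_tr = random_oversample(X_tr, y_tr, seed=i)
        model = centroid_fit(X_tr, y_tr)
        correct += centroid_predict(model, X[i]) == y[i]
    return correct / len(X)
```

Resampling before the split would let copies of the test protein leak into training and inflate jackknife accuracy, which is why the sketch resamples inside the loop.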

Reservoir Evaporation Prediction Modeling Based on Artificial Intelligence Methods

Water ◽

10.3390/w11061226 ◽

2019 ◽

Vol 11 (6) ◽

pp. 1226 ◽

Cited By ~ 2

Author(s):

Mohammed Falah Allawi ◽

Faridah Binti Othman ◽

Haitham Abdulmohsin Afan ◽

Ali Najah Ahmed ◽

Md. Shabbir Hossain ◽

...

Keyword(s):

Artificial Intelligence ◽

Prediction Accuracy ◽

Evaporation Rate ◽

Climatic Conditions ◽

Support Vector ◽

Time Increment ◽

Accuracy Level ◽

Increment Rate ◽

Input Variables ◽

The Impact

The current study explored the impact of climatic conditions on predicting evaporation from a reservoir. Several models have been developed for evaporation prediction under different scenarios, with artificial intelligence (AI) methods being the most popular. However, existing models rely on several climatic parameters as inputs to achieve an acceptable level of accuracy, and some of these parameters are unavailable in certain case studies. In addition, existing AI-based models for evaporation prediction have paid little attention to the influence of the time increment on prediction accuracy. This study investigated the ability of radial basis function neural network (RBF-NN) and support vector regression (SVR) methods to model the evaporation rate in a tropical area at the Layang Reservoir, Johor River, Malaysia. Two input-architecture scenarios were explored to examine how different input-variable patterns affect model prediction accuracy. In the first scenario, the input architecture considered only the historical evaporation rate time series, whereas in the second scenario the mean temperature and the evaporation rate were used as input variables. For both scenarios, three time increments (daily, weekly, and monthly) were considered.
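
The two input scenarios and three time increments can be illustrated with a small data-preparation sketch. It assumes that weekly and monthly series are formed by averaging non-overlapping blocks of daily values (a 30-day block standing in for a month) and that each model sees a fixed number of lagged values; the lag count and all function names are hypothetical, and the RBF-NN and SVR models themselves are not shown.

```python
from statistics import mean


def aggregate(series, step):
    """Average consecutive, non-overlapping blocks of `step` daily values
    (e.g. step=7 for weekly, step=30 as an assumed monthly block).
    A trailing partial block is dropped."""
    return [mean(series[i:i + step]) for i in range(0, len(series) - step + 1, step)]


def lagged_dataset(evap, temp=None, lags=3):
    """Build (inputs, target) pairs for one-step-ahead prediction.

    Scenario 1 (temp is None): inputs are the previous `lags` evaporation values.
    Scenario 2: the previous `lags` mean-temperature values are appended
    (using lagged temperature is an assumption of this sketch).
    """
    X, y = [], []
    for t in range(lags, len(evap)):
        row = list(evap[t - lags:t])
        if temp is not None:
            row += list(temp[t - lags:t])
        X.append(row)
        y.append(evap[t])
    return X, y
```

The same `lagged_dataset` call can then be applied to the daily, weekly, and monthly series, so that the only difference between runs is the time increment.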

TOWARDS PREDICTING RICE LOSS DUE TO TYPHOONS IN THE PHILIPPINES

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-4-w19-63-2019 ◽

2019 ◽

Vol XLII-4/W19 ◽

pp. 63-70 ◽

Cited By ~ 1

Author(s):

S. Boeke ◽

M. J. C. van den Homberg ◽

A. Teklesadik ◽

J. L. D. Fabila ◽

D. Riquet ◽

...

Keyword(s):

Binary Classification ◽

Open Data ◽

Absolute Error ◽

The Philippines ◽

Classification Model ◽

Support Vector ◽

Average Value ◽

Early Action ◽

Support Vector Regressor ◽

The Impact

Abstract. Reliable predictions of whether the impact of a natural hazard will turn into a disaster are important for better targeting humanitarian response and for triggering early action. Open data and machine learning can be used to predict loss and damage to the houses and livelihoods of affected people. This research focuses on agricultural loss, specifically rice loss in the Philippines due to typhoons. Regression and binary classification algorithms are trained using feature selection methods to find the most important explanatory features. Both geographical data for every province and typhoon-specific features of 11 historical typhoons are used as input. The percentage of lost rice area is used as the output, with an average value of 7.1%. For the regression task, the support vector regressor performed best, with a mean absolute error of 6.83 percentage points. For the classification task, thresholds of 20%, 30%, and 40% are tested to find the best-performing model. These thresholds represent different levels of lost rice area for triggering anticipatory action for farmers. The binary classifiers are trained to improve their ability to correctly predict the positive samples. In all three cases, the support vector classifier performed best, with recall scores of 88%, 75%, and 81.82%, respectively. However, the precision of each of these models was low: 17.05%, 14.46%, and 10.84%, respectively. For both the support vector regressor and the classifier, only wind speed was selected as an explanatory feature out of the 14 available input features. For the other algorithms trained in this study, however, different feature sets were selected, depending in part on the hyperparameter settings. This variation in the selected feature sets, as well as the imprecise predictions, were consequences of the small dataset used in this study.
It is therefore important to gather data for more typhoons, as well as data on other explanatory variables, in order to make more robust and accurate predictions. Also, if loss data become available at the municipality level rather than the province level, the models will become more accurate and more valuable for operationalization.
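
The gap between high recall and low precision follows from how the two metrics are defined once the lost-area percentage is turned into a binary label at a trigger threshold. A minimal sketch, assuming a province counts as positive when its loss is at or above the threshold (whether the study's thresholds are inclusive is an assumption); the data in the usage test are invented for illustration, not taken from the study, and the function names are hypothetical.

```python
def binarize(loss_pct, threshold):
    """Label 1 if the share of lost rice area (in %) reaches the trigger threshold."""
    return [1 if v >= threshold else 0 for v in loss_pct]


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_absolute_error(actual, predicted):
    """MAE; in percentage points when both inputs are percentages."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

With few true positives, a classifier that flags many provinces keeps recall high while the false positives drive precision down.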

Prediction Modeling of Household’s Preparedness of Natural Hazards Mitigation

10.20944/preprints202110.0360.v2 ◽

2021 ◽

Author(s):

Chen Xia ◽

Yuqing Hu

Keyword(s):

Disaster Preparedness ◽

Critical Factor ◽

Personal Characteristics ◽

Household Survey ◽

Classification Model ◽

Support Vector ◽

Multi Layer Perceptron ◽

Household Preparedness ◽

Distribution Studies ◽

The Impact

Natural disasters are increasing in magnitude, frequency, and geographic distribution. Studies have shown that individuals’ self-sufficiency, which largely depends on household preparedness, is very important for hazard mitigation in at least the first 72 hours following a disaster. However, although many studies have tried to identify the factors that influence a household’s disaster preparedness from different perspectives, we still lack an integrative analysis of how these factors contribute to a household’s preparation. This paper aims to build a classification model that predicts whether a household has prepared for a potential disaster based on its members’ personal characteristics and the environment in which it is located. We collect data from the Federal Emergency Management Agency’s 2018 National Household Survey and train four classification models (logistic regression, decision tree, support vector machine, and multi-layer perceptron classifiers) to predict the impact of personal characteristics and environment on a household’s preparedness for a potential natural disaster. Results show that the multi-layer perceptron classifier outperforms the others, with the highest recall (0.8531) and F1 score (0.7386). In addition, feature selection results show that, among the factors considered, a household’s access to disaster-related information is the most critical factor affecting its disaster preparedness. Although there is still room for further parameter optimization, the model suggests that disaster management could be supported by gathering publicly accessible data.
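
Recall and F1 together determine precision, assuming the reported F1 is the standard binary F1 of the positive ("prepared") class rather than a macro or weighted average. From F1 = 2PR/(P + R) it follows that P(2R − F1) = F1·R, so P = F1·R/(2R − F1). The sketch below evaluates this; under that assumption the reported figures imply a precision of about 0.651. The function names are hypothetical.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1_score, recall):
    """Solve F1 = 2PR / (P + R) for P (valid only for binary, positive-class F1)."""
    return f1_score * recall / (2 * recall - f1_score)


p = implied_precision(0.7386, 0.8531)  # about 0.651 under the stated assumption
```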

Predicting Apoptosis Protein Subcellular Locations based on the Protein Overlapping Property Matrix and Tri-Gram Encoding

International Journal of Molecular Sciences ◽

10.3390/ijms20092344 ◽

2019 ◽

Vol 20 (9) ◽

pp. 2344

Author(s):

Yang Yang ◽

Huiwen Zheng ◽

Chunhua Wang ◽

Wanyue Xiao ◽

Taigang Liu

Keyword(s):

Support Vector Machine ◽

Subcellular Location ◽

Recursive Feature Elimination ◽

Support Vector ◽

Svm Classifier ◽

Protein Subcellular Location ◽

Promising Tool ◽

Apoptosis Protein ◽

Benchmark Datasets ◽

Apoptosis Proteins

To reveal the working pattern of programmed cell death, knowledge of the subcellular location of apoptosis proteins is essential. Besides costly and time-consuming experimental determination, research into computational localization schemes, focusing mainly on innovations in protein sequence representation and the choice of classification algorithm, has become popular in recent decades. In this study, a novel tri-gram encoding model based on the protein overlapping property matrix (POPM) is proposed for predicting the subcellular location of apoptosis proteins. A 1000-dimensional feature vector is then built to represent each protein. Finally, support vector machine-recursive feature elimination (SVM-RFE) is used to select the optimal features, which are fed into a support vector machine (SVM) classifier for prediction. The results of jackknife tests on two benchmark datasets demonstrate that the proposed method achieves a satisfactory level of prediction performance with lower computational cost and could serve as a promising tool for predicting the subcellular locations of apoptosis proteins.
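
The 1000-dimensional vector is consistent with tri-grams over 10 property columns (10 × 10 × 10 = 1000). The sketch below shows one plausible reading, assuming each residue maps to the set of properties it has and that every combination of properties across three consecutive residues is counted. The property table is a hypothetical placeholder, not the actual POPM, and normalizing by the number of residue triples is also an assumption.

```python
from itertools import product

N_PROPS = 10  # assumed number of property columns; 10**3 = 1000 tri-gram features

# HYPOTHETICAL placeholder: residue -> indices of the properties it has.
# A real POPM would assign every amino acid its (possibly several) properties.
TOY_PROPERTIES = {
    "A": {0, 3},
    "C": {1},
    "D": {2, 4, 5},
    "G": {0},
}


def trigram_features(seq, props=TOY_PROPERTIES, n_props=N_PROPS):
    """Count property tri-grams over consecutive residue triples and normalize.

    Because a residue can carry several (overlapping) properties, one residue
    triple can contribute to several (i, j, k) cells at once. Residues missing
    from `props` contribute nothing.
    """
    vec = [0.0] * n_props ** 3
    windows = len(seq) - 2
    for t in range(windows):
        p1, p2, p3 = (props.get(r, set()) for r in seq[t:t + 3])
        for i, j, k in product(p1, p2, p3):
            vec[i * n_props * n_props + j * n_props + k] += 1
    if windows > 0:
        vec = [v / windows for v in vec]
    return vec
```

The resulting vectors would then go to feature selection (SVM-RFE in the study) before classification.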

Short-Term Wind Speed Forecasting Based on Optimized Support Vector Machine

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.300-301.189 ◽

2013 ◽

Vol 300-301 ◽

pp. 189-194 ◽

Cited By ~ 1

Author(s):

Yu Sun ◽

Ling Ling Li ◽

Xiao Song Huang ◽

Chao Ying Duan

Keyword(s):

Support Vector Machine ◽

Wind Speed ◽

Prediction Model ◽

Prediction Accuracy ◽

Support Vector ◽

Particle Swarm Algorithm ◽

Short Term ◽

Neural Network Prediction ◽

Random Fluctuations ◽

The Impact

To avoid the impact that random fluctuations in wind speed have on grid-connected wind power generation systems, accurate short-term wind speed prediction is needed. This paper designs a combined prediction model based on wavelet transformation and the support vector machine (SVM). The model improves prediction accuracy by using the wavelet transform to capture how wind speed sequences change at different scales and by optimizing the SVM parameters with an improved particle swarm algorithm. In experiments, the model showed strong generalization ability and high prediction accuracy. The lowest root-mean-square error over 200 samples was 0.0932, and the model performed substantially better than a BP neural network prediction model. It provides an effective method for predicting wind speed.
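
The parameter-optimization step can be sketched with standard global-best particle swarm optimization. This is the textbook algorithm, not the improved variant used in the paper, and the inertia and acceleration constants are common defaults chosen here as assumptions. In the paper's setting the objective would be a validation error (for example RMSE) of the SVM as a function of its parameters; here any callable works, and the function name is hypothetical.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best particle swarm optimization (not the paper's variant).

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For SVM tuning, one would typically search over log2(C) and log2(gamma) and exponentiate inside the objective; that mapping is a hypothetical choice, not taken from the paper.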

IDENTIFICATION OF BASIC NEEDS AT TEMPORARY EVACUATION SITES AFTER THE MERAPI ERUPTION USING SENTIMENT ANALYSIS AND SUPPORT VECTOR MACHINE

Telematika ◽

10.31315/telematika.v15i1.3068 ◽

2018 ◽

Vol 15 (1) ◽

pp. 77

Author(s):

Resky Rayvano Moningka ◽

Djoko Budiyanto Setyohadi ◽

Khaerunnisa Khaerunnisa ◽

Pranowo Pranowo

Keyword(s):

Support Vector Machine ◽

Public Opinion ◽

Maximum Entropy ◽

Disaster Management ◽

Cross Validation ◽

Basic Needs ◽

Classification Model ◽

Support Vector ◽

Twitter Data ◽

The Impact

The 2010 eruption of Mount Merapi was the largest since 1872. Its impact was felt by people living in the affected areas, so disaster management measures were carried out, one of which was the fulfillment of basic needs. This research aims to collect public opinion on the fulfillment of basic needs in shelters after the Merapi eruption, based on Twitter data. A Support Vector Machine is used to build a classification model over the collected data. The expected result of this study is to identify the basic needs in a shelter. With 10-fold cross-validation, the Support Vector Machine achieved an accuracy of 87.96% and Maximum Entropy achieved 87.45%. Keywords: twitter, sentiment analysis, merapi eruption, support vector machine
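
The 10-fold cross-validation behind the reported accuracies partitions the tweets into ten disjoint folds, trains on nine, tests on the tenth, and averages the ten scores. The sketch below is dependency-free and assumes a plain shuffled (not stratified) split. `fit_and_score` is a hypothetical callback that would train the SVM or maximum-entropy model on the training indices and return its accuracy on the test indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices once and deal them into k disjoint folds
    whose sizes differ by at most one."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Average the score returned by fit_and_score(train_idx, test_idx) over k folds."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for f, test_idx in enumerate(folds):
        # Training set = every fold except the current test fold.
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k
```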

LipoSVM: Prediction of Lysine lipoylation in Proteins based on the Support Vector Machine

Current Genomics ◽

10.2174/1389202919666191014092843 ◽

2019 ◽

Vol 20 (5) ◽

pp. 362-370 ◽

Cited By ~ 1

Author(s):

Meiqi Wu ◽

Pengchao Lu ◽

Yingxi Yang ◽

Liwen Liu ◽

Hui Wang ◽

...

Keyword(s):

Support Vector Machine ◽

Sampling Technique ◽

Experimental Methods ◽

Position Specific Scoring Matrix ◽

Support Vector ◽

Post Translational Modification ◽

Independent Test ◽

Scoring Matrix ◽

Sample Ratio ◽

Fold Cross Validation

Background: Lysine lipoylation, a rare and highly conserved post-translational modification of proteins, is considered one of the most important processes in the biological field. To obtain a comprehensive understanding of the regulatory mechanism of lysine lipoylation, the key is to identify lysine lipoylation sites. Because experimental methods are expensive, laborious, and complex, there is an urgent need for computational methods to predict lipoylation sites. Methodology: In this work, a predictor named LipoSVM is developed to accurately predict lipoylation sites. To overcome the problem of imbalanced samples, the synthetic minority over-sampling technique (SMOTE) is used to balance negative and positive samples. Furthermore, training sets with different ratios of positive to negative samples are constructed. Results: After comparing five encoding schemes and five classification algorithms, LipoSVM is finally constructed using a training set with a positive-to-negative sample ratio of 1:1, combining the position-specific scoring matrix with a support vector machine. The best performance reaches an accuracy of 99.98% and an AUC of 0.9996 in 10-fold cross-validation. The AUC on the independent test set reaches 0.9997, which demonstrates the robustness of LipoSVM. The analysis of lysine lipoylation and non-lipoylation fragments shows significant statistical differences. Conclusion: A good predictor for lysine lipoylation is built based on the position-specific scoring matrix and support vector machine. LipoSVM can be freely downloaded from https://github.com/stars20180811/LipoSVM.
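
SMOTE creates a synthetic minority sample by picking a minority sample, choosing one of its k nearest minority-class neighbours, and interpolating at a random point on the segment between them. A minimal standard-library sketch follows. The neighbour count k = 5 is the common default, not necessarily LipoSVM's setting, and to reach the 1:1 ratio described above `n_new` would be the number of negatives minus the number of positives. Libraries such as imbalanced-learn provide a production implementation; the function name here is hypothetical.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours (SMOTE).
    Requires at least two minority samples."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: sq_dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```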

Statistical Analysis for Twitter Spam Detection

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset1962170 ◽

2019 ◽

pp. 624-629

Author(s):

Ganesh Udge ◽

Mahesh Mohite ◽

Shubhankar Bendre ◽

Yogeshwar Birnagal ◽

Disha Wankhede

Keyword(s):

Online Social Networks ◽

Binary Classification ◽

Classification Model ◽

Training Dataset ◽

Support Vector ◽

Spam Detection ◽

Data Sampling ◽

Quality Of Data ◽

Related Factors ◽

The Impact

Current online social networks make it possible to spread and learn about new discoveries and information. Recently, however, these networks have carried content that is irrelevant to what users are actually looking for; in layman’s terms such content is called an attack, and those who carry out such attacks on Twitter are called Twitter spammers. Data quality is compromised when malicious and harmful content, such as URLs, bios, emoticons, audio, images/videos, and hashtags, is added through different accounts via tweets, direct messages, and retweets. Malicious links may lead to misleading sites, which can harm users and interfere with their decision-making. To protect users from spammer attacks, a Twitter training dataset is used, and 12 lightweight features, such as the user’s account age, number of followers, and counts of tweets and retweets, are extracted to distinguish spam from non-spam. To improve performance, discretization of the features is important for transferring spam detection across tweets. Our system builds a classification model for spam detection using binary classification with machine learning algorithms, namely a Naïve Bayes classifier or a Support Vector Machine classifier, which learns the behaviour of the data. The system categorizes the tweets in the datasets into spam and non-spam classes and provides the user’s feed with only the relevant information. The system also reports the impact of data-related factors, such as the relationship between spam and non-spam tweets, the size of the training dataset, and data sampling, on detection performance. The proposed system detects and analyzes simple and changing Twitter spam over time.
Spam detection is a major challenge for the system. The system aims to narrow the gap in performance evaluation, focusing primarily on data, features, and patterns to identify real users and inform them about spam tweets, along with performance statistics. The goal is to detect spam tweets in real time: because new tweets may reveal new patterns, they can help train the model and update the dataset and knowledge base.
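
Discretizing the lightweight numeric features and then applying naive Bayes can be sketched as follows. Everything concrete here is an assumption: equal-width binning with three bins, Laplace smoothing, and the two example features (account age in days and follower count) with invented values. The abstract does not specify the 12 features' exact definitions or the discretization scheme, and the class and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def equal_width_bins(values, n_bins=3):
    """Return a function mapping a raw value to one of n_bins equal-width bins
    fitted on the training values (out-of-range values go to the edge bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant features
    return lambda v: min(max(int((v - lo) / width), 0), n_bins - 1)


class DiscreteNaiveBayes:
    """Categorical naive Bayes with Laplace smoothing on discretized features."""

    def fit(self, rows, labels, n_bins=3):
        n_feat = len(rows[0])
        self.n_bins = n_bins
        self.binners = [equal_width_bins([r[j] for r in rows], n_bins) for j in range(n_feat)]
        self.class_counts = Counter(labels)
        self.counts = defaultdict(Counter)  # (class, feature index) -> Counter of bins
        for r, c in zip(rows, labels):
            for j, v in enumerate(r):
                self.counts[(c, j)][self.binners[j](v)] += 1
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())

        def log_posterior(c):
            n_c = self.class_counts[c]
            score = math.log(n_c / total)  # log prior
            for j, v in enumerate(row):
                hits = self.counts[(c, j)][self.binners[j](v)]
                score += math.log((hits + 1) / (n_c + self.n_bins))  # smoothed likelihood
            return score

        return max(self.class_counts, key=log_posterior)
```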

Emergence of a node-like population within an in vitro derived Neural Mesodermal Progenitors (NMPs) population

10.1101/326371 ◽

2018 ◽

Cited By ~ 1

Author(s):

Shlomit Edri ◽

Penelope Hayward ◽

Wajid Jawaid ◽

Alfonso Martinez Arias

Keyword(s):

Stem Cells ◽

Single Cell Analysis ◽

Embryonic Stem ◽

Paraxial Mesoderm ◽

Classification Model ◽

Support Vector ◽

Epiblast Stem Cells ◽

Sequence Of Events

The Caudal Lateral Epiblast (CLE) of the mammalian embryo harbours bipotent progenitors, called Neural Mesodermal Progenitors (NMPs), that contribute to the spinal cord and the paraxial mesoderm throughout axial elongation. Here we performed a single-cell analysis of different in vitro NMP populations produced from either embryonic stem cells (ESCs) or epiblast stem cells (EpiSCs) and compared them to the CLE of E8.25 mouse embryos. Our analysis of this region challenges the notion that NMPs should coexpress Sox2 and T. We built a Support Vector Machine (SVM) based on the embryo CLE and used it as a classification model to analyse the in vitro NMP-like populations. We showed that ESC-derived NMPs are heterogeneous and contain few NMP-like cells, whereas EpiSC-derived NMPs produce a high proportion of cells with the embryo NMP signature. Importantly, we found that the population from which the Epi-NMPs are derived in culture contains a node-like population, which is responsible for maintaining the expression of T in vitro. These results mimic the events in vivo and suggest a sequence of events for the emergence of NMPs.
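
The idea of training an SVM on reference (embryo) cells and then using it to label query (in vitro) cells can be sketched with a linear SVM trained by Pegasos, a stochastic sub-gradient method for the primal SVM objective. This is an illustrative stand-in: the abstract does not state the kernel or solver, the toy two-gene profiles in the usage test are invented, and the function names are hypothetical. In this simple variant the bias, stored as the last weight, is regularized along with the other weights.

```python
import random


def train_linear_svm(X, y, lam=0.01, n_iters=2000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal linear-SVM objective
    lam/2 * ||w||^2 + mean(hinge loss). Labels must be +1 / -1.
    A constant 1.0 is appended to every sample so the last weight acts as a bias."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    for t in range(1, n_iters + 1):
        i = rng.randrange(len(Xb))
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
        w = [(1 - eta * lam) * wj for wj in w]  # regularizer step (shrink)
        if margin < 1:  # hinge loss active: step towards the sample
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    """+1 or -1 from the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1


def fraction_positive(w, query):
    """Share of query cells that the reference-trained model labels +1."""
    return sum(predict(w, x) == 1 for x in query) / len(query)
```

In the study's terms, +1 would mark cells with the embryo NMP signature, and `fraction_positive` on an in vitro population would estimate its share of NMP-like cells.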

Reliable Identification of Oolong Tea Species: Nondestructive Testing Classification Based on Fluorescence Hyperspectral Technology and Machine Learning

Agriculture ◽

10.3390/agriculture11111106 ◽

2021 ◽

Vol 11 (11) ◽

pp. 1106

Author(s):

Yan Hu ◽

Lijia Xu ◽

Peng Huang ◽

Xiong Luo ◽

Peng Wang ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Principal Component ◽

Classification Model ◽

Recursive Feature Elimination ◽

Support Vector ◽

K Nearest Neighbor ◽

Oolong Tea ◽

The Impact ◽

T Distribution

A rapid and nondestructive tea classification method is of great significance in current research. This study uses fluorescence hyperspectral technology and machine learning to distinguish Oolong tea by analyzing the spectral features of tea in the wavelength range from 475 to 1100 nm. The spectral data are preprocessed by multiplicative scatter correction (MSC) and the standard normal variate (SNV) transformation, which effectively reduce the impact of baseline drift and tilt. Principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) are then adopted for feature dimensionality reduction and visualization. Random Forest-Recursive Feature Elimination (RF-RFE) is used for feature selection. Decision Tree (DT), Random Forest Classification (RFC), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) classifiers are used to establish the classification model. The results show that MSC-RF-RFE-SVM is the best model for classifying Oolong tea, with an accuracy of 100% on the training set and 98.73% on the test set. It can be concluded that fluorescence hyperspectral technology combined with machine learning is a feasible way to classify Oolong tea.
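
The two preprocessing steps have standard definitions in chemometrics. SNV standardizes each spectrum by its own mean and standard deviation, and MSC regresses each spectrum on a reference spectrum (usually the mean spectrum) and removes the fitted offset and slope. A minimal sketch, assuming the population standard deviation for SNV (implementations differ) and representing each spectrum as a list of intensities on a shared wavelength grid; the function names are hypothetical.

```python
from statistics import mean, pstdev


def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it to unit
    (population) standard deviation."""
    mu, sd = mean(spectrum), pstdev(spectrum)
    return [(v - mu) / sd for v in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction: fit x ~ a + b * ref by least squares
    for each spectrum x (ref defaults to the mean spectrum), then return (x - a) / b."""
    if reference is None:
        reference = [mean(col) for col in zip(*spectra)]
    r_mu = mean(reference)
    r_var = sum((r - r_mu) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mu = mean(x)
        b = sum((r - r_mu) * (v - x_mu) for r, v in zip(reference, x)) / r_var  # slope
        a = x_mu - b * r_mu  # offset
        corrected.append([(v - a) / b for v in x])
    return corrected
```

If every spectrum is an offset-and-scale version of the same underlying curve, MSC maps them all onto the mean spectrum, which is how it removes additive baseline and multiplicative scatter effects.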
