Support Vector Machine Based Classification Model for Screening Plasmodium falciparum Proliferation Inhibitors and Non-Inhibitors

2011 ◽  
Vol 3 ◽  
pp. BECB.S7503 ◽  
Author(s):  
Sangeetha Subramaniam ◽  
Monica Mehrotra ◽  
Dinesh Gupta

There is an urgent need to develop novel antimalarials in view of the increasing disease burden and the growing resistance of malarial parasites to currently used drugs. Proliferation inhibitors targeting the P. falciparum intraerythrocytic cycle are one of the important classes of compounds being explored for their potential as novel antimalarials. The Support Vector Machine (SVM) based model we developed can facilitate rapid screening of large and diverse chemical libraries by reducing false hits and prioritising compounds before expensive High Throughput Screening experiments are set up. The SVM model, trained with molecular descriptors of proliferation inhibitors and non-inhibitors, displayed satisfactory performance on cross-validation and on an independent data set, with an average accuracy of 83% and an AUC of 0.88. Intriguingly, the method displayed remarkable accuracy for the recently submitted P. falciparum whole cell screening datasets. The method also predicted several inhibitors in the National Cancer Institute diversity set, mostly similar to the known inhibitors.

Molecules ◽  
2020 ◽  
Vol 25 (6) ◽  
pp. 1442 ◽  
Author(s):  
Tao Shen ◽  
Hong Yu ◽  
Yuan-Zhong Wang

Gentiana is one of the largest genera of Gentianoideae; most of its species have potential pharmaceutical value and are used in local traditional medicine. Because phytochemical diversity and bioactive compounds differ among species, it is crucial to identify authentic Gentiana species accurately. In this paper, the feasibility of using infrared spectroscopy combined with chemometric analysis to identify Gentiana and its related species was studied. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000–4000 cm−1) and Fourier transform mid-infrared (MIR: 4000–600 cm−1) spectroscopy. Firstly, principal component analysis (PCA) was utilized to explore the natural grouping of the 180 samples. Secondly, random forests (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using full spectra (including 1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the results for the calibration and prediction sets, the MIR-SVM model had a higher classification accuracy than the other models. Five feature selection strategies, VIP (variable importance in the projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation, were then used to reduce the number of variables for modeling. In the end, 101 NIR and 73 FT-MIR bands were selected as the feature variables, respectively. Thirdly, stacking models were built based on the optimal spectral dataset. Most of the stacking models performed better than the full spectra-based models. RF and SVM as base learners, combined with an SVM meta-classifier, formed the optimal stacked generalization strategy.
For the SG-Ven-MIR-SVM model, the accuracy (ACC) on the calibration set and the validation set was 100% in both cases. Sensitivity (SE), specificity (SP), efficiency (EFF), Matthews correlation coefficient (MCC), and Cohen’s kappa coefficient (K) were all 1, which showed that the model had the optimal authenticity identification performance. These results indicated that stacked generalization combined with feature selection is probably an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The study result can provide a valuable reference for the safety and effectiveness of the clinical application of medicinal Gentiana.
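As a rough illustration of why ACC, SE, SP, MCC, and K all equal 1 for a perfect classifier, the sketch below computes them from confusion-matrix counts. It assumes the binary case with both classes present (the study itself is multi-class), the function name `binary_metrics` and the example counts are hypothetical, and EFF is left out because its definition is not given in the abstract.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    se = tp / (tp + fn)   # sensitivity (true positive rate)
    sp = tn / (tn + fp)   # specificity (true negative rate)
    # Matthews correlation coefficient; defined as 0 when a margin is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement p_e
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (acc - p_e) / (1 - p_e) if p_e != 1 else 1.0
    return {"ACC": acc, "SE": se, "SP": sp, "MCC": mcc, "K": kappa}

# Hypothetical counts: a perfect split of 180 samples (no errors of either kind)
perfect = binary_metrics(tp=10, fp=0, tn=170, fn=0)
# Hypothetical counts with some errors, for contrast
imperfect = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

With no false positives or false negatives every metric reaches its maximum of 1; any misclassification lowers MCC and K more sharply than ACC when the classes are imbalanced.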


Author(s):  
Jie Xu ◽  
Xianglong Liu ◽  
Zhouyuan Huo ◽  
Cheng Deng ◽  
Feiping Nie ◽  
...  

Support Vector Machine (SVM) was originally proposed as a binary classification model, and it has achieved great success in many applications. In practice, problems with more than two classes are more common, so it is natural to extend SVM to a multi-class classifier. Many approaches have been proposed to construct a multi-class classifier based on binary SVM, such as the one versus all strategy, the one versus one strategy, and Weston's multi-class SVM. The one versus all and one versus one strategies split the multi-class problem into multiple binary classification subproblems, so multiple binary classifiers must be trained. Weston's multi-class SVM is formed by ensuring risk constraints and imposing a specific regularization, such as the Frobenius norm. It is not derived by maximizing the margin between the hyperplane and the training data, which is the motivation behind SVM. In this paper, we propose a multi-class SVM model from the perspective of maximizing the margin between training points and the hyperplane, and analyze the relation between our model and other related methods. The experiments show that our model obtains better or comparable results compared with other related methods.
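The two decomposition strategies mentioned above can be sketched by listing the binary subproblems each one creates. This only illustrates the splitting and a simple voting rule, not the authors' proposed model; all function names are hypothetical.

```python
from itertools import combinations

def one_vs_rest_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]

def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))

def vote(pairwise_winners):
    """Combine one-vs-one outputs by majority vote (ties go to the first seen)."""
    counts = {}
    for winner in pairwise_winners:
        counts[winner] = counts.get(winner, 0) + 1
    return max(counts, key=counts.get)

classes = ["a", "b", "c", "d"]
ovr = one_vs_rest_tasks(classes)   # K binary classifiers
ovo = one_vs_one_tasks(classes)    # K * (K - 1) / 2 binary classifiers
predicted = vote(["a", "b", "a"])  # winners reported by three pairwise models
```

For K classes, one versus all trains K classifiers on the full data set, while one versus one trains K(K-1)/2 classifiers, each on the data of only two classes.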


2021 ◽  
Vol 17 (2) ◽  
pp. 183-189
Author(s):  
Heba Salim ◽  
Musaab Alaziz ◽  
Turki Abdalla

In this paper, a new method is proposed for people tracking using the human skeleton provided by the Kinect sensor. Our method is based on skeleton data, which include the coordinate values of each joint in the human body. For data classification, the Support Vector Machine (SVM) and Random Forest techniques are used. To achieve this goal, 14 classes of movements are defined, and the Kinect sensor is used to extract data containing 46 features, which are then used to train the classification models. The system was tested on 12 subjects, each of whom performed 14 movements in each experiment. Experimental results show that the best average accuracy is 90.2% for the SVM model and 99% for the Random Forest model. From the experiments, we concluded that the best distance between the Kinect sensor and the human body is one meter.


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Yun-xiao Lou ◽  
Xian-shu Fu ◽  
Xiao-ping Yu ◽  
Zi-hong Ye ◽  
Hai-feng Cui ◽  
...  

This paper focused on an effective method to discriminate the geographical origin of Wuyi-Rock tea by stable isotope ratio (SIR) and metallic element profiling (MEP) combined with support vector machine (SVM) analysis. Wuyi-Rock tea (n=99) collected from nine producing areas and non-Wuyi-Rock tea (n=33) from eleven nonproducing areas were analysed for SIR and MEP by established methods. The SVM model based on coupled data produced the best prediction accuracy (0.9773). This prediction shows that instrumental methods combined with a classification model can provide an effective and stable tool for provenance discrimination. Moreover, every feature variable in the stable isotope and metallic element data was ranked by its contribution to the model. The results show that δ2H, δ18O, Cs, Cu, Ca, and Rb contents are significant indicators for provenance discrimination, and that not all of the metallic elements improve the prediction accuracy of the SVM model.


SLEEP ◽  
2020 ◽  
Vol 43 (7) ◽  
Author(s):  
Wen-Chi Huang ◽  
Pei-Lin Lee ◽  
Yu-Ting Liu ◽  
Ambrose A Chiang ◽  
Feipei Lai

Study Objectives: Polysomnography is the gold standard for diagnosis of obstructive sleep apnea (OSA), but it is costly and access is often limited. The aim of this study is to develop a clinically useful support vector machine (SVM)-based prediction model to identify patients with a high probability of OSA, for use by non-sleep-specialist physicians in clinical practice. Methods: The SVM model was developed using the features routinely collected at the clinical evaluation from 6,875 Chinese patients referred to sleep clinics for suspected OSA. Three apnea-hypopnea index (AHI) cutoffs, ≥5/h, ≥15/h, and ≥30/h, were used to define the severity of OSA. The continuous and categorized features were selected separately and were further selected through stepwise forward feature selection. The modeling was achieved through fivefold cross-validation. The model discriminative ability was evaluated for the whole data set and for four subgroups categorized by gender and age (<65 versus ≥65 years old [y/o]). Results: Two features were selected to predict the AHI cutoff ≥5/h, six features for ≥15/h, and six features for ≥30/h, reaching an area under the receiver operating characteristic curve (AUROC) of 0.82, 0.80, and 0.78, respectively. The sensitivity was 74.14%, 75.18%, and 70.26%, while the specificity was 74.71%, 68.73%, and 70.30%, respectively. Compared to logistic regression, the Berlin questionnaire, the NoSAS Score, and the Supersparse Linear Integer Model (SLIM) scoring system, the SVM model performs better, with a more balanced sensitivity and specificity. The discriminative ability was best for males <65 y/o and modest for females ≥65 y/o. Conclusion: Our model provides a simple and accurate modality for early identification of patients with OSA and may potentially help prioritize them for sleep study.
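The fivefold cross-validation mentioned in the Methods can be sketched as an index split. The abstract does not say how the folds were drawn, so this assumes plain shuffled, unstratified folds; `k_fold_indices` is a hypothetical helper, and the sample count is reused only to size the example.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)       # fixed seed for reproducible folds
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        # training set = every fold except the held-out one
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Each sample is held out exactly once across the five splits
splits = list(k_fold_indices(6875, k=5))
fold_sizes = [len(test) for _, test in splits]
```

A model would be fitted on each `train` index set and scored on the matching `test` set, with the reported metric averaged over the five folds.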


2014 ◽  
Vol 11 (90) ◽  
pp. 20130860 ◽  
Author(s):  
Véronique Hamon ◽  
Raphael Bourgeas ◽  
Pierre Ducrot ◽  
Isabelle Theret ◽  
Laura Xuereb ◽  
...  

Over the last 10 years, protein–protein interactions (PPIs) have shown increasing potential as new therapeutic targets. As a consequence, PPIs are today the most screened target class in high-throughput screening (HTS). The development of broad chemical libraries dedicated to these particular targets is essential; however, the chemical space associated with this ‘high-hanging fruit’ is still under debate. Here, we analyse the properties of 40 non-redundant small molecules present in the 2P2I database (http://2p2idb.cnrs-mrs.fr/) to define a general profile of orthosteric inhibitors and propose an original protocol to filter general screening libraries using a support vector machine (SVM) with 11 standard Dragon molecular descriptors. The filtering protocol has been validated using external datasets from PubChem BioAssay and results from in-house screening campaigns. This external blind validation demonstrated the ability of the SVM model to reduce the size of the filtered chemical library by eliminating up to 96% of the compounds as well as enhancing the proportion of active compounds by up to a factor of 8. We believe that the resulting chemical space identified in this paper will provide the scientific community with a concrete support to search for PPI inhibitors during HTS campaigns.
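The two figures quoted for the filter (fraction of compounds eliminated and enrichment of actives) can be computed as in the sketch below. The counts are invented purely to show the arithmetic and are not taken from the paper; `filter_summary` is a hypothetical helper.

```python
def filter_summary(n_total, n_active, n_kept, n_active_kept):
    """Summarise a library filter: fraction removed and enrichment of actives."""
    removed_fraction = 1 - n_kept / n_total
    rate_before = n_active / n_total      # proportion of actives before filtering
    rate_after = n_active_kept / n_kept   # proportion of actives after filtering
    enrichment = rate_after / rate_before
    return removed_fraction, enrichment

# Made-up library: 10,000 compounds with 50 actives; the filter keeps 400, 16 active
removed, enrichment = filter_summary(10_000, 50, 400, 16)
```

With these made-up counts the filter removes 96% of the library and raises the proportion of actives eightfold, while 34 of the 50 actives are lost, which is the trade-off such a filter makes.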


Author(s):  
Prabira Kumar Sethy ◽  
Santi Kumari Behera ◽  
Pradyumna Kumar Ratha ◽  
Preesat Biswas

The detection of coronavirus (COVID-19) is now a critical task for medical practitioners. The coronavirus spreads quickly between people, and the number of cases is approaching 100,000 people worldwide. Consequently, it is essential to identify infected people so that measures can be taken to prevent further spread. In this paper, a methodology based on deep features plus a support vector machine (SVM) is suggested for the detection of coronavirus-infected patients using X-ray images. For classification, SVM is used instead of a deep learning based classifier, as the latter needs a large dataset for training and validation. The deep features from the fully connected layer of a CNN model are extracted and fed to the SVM for classification. The SVM separates the corona-affected X-ray images from the others. The methodology considers three categories of X-ray images, i.e., COVID-19, pneumonia, and normal. The method helps medical practitioners distinguish among COVID-19 patients, pneumonia patients, and healthy people. SVM is evaluated for detection of COVID-19 using the deep features of 13 different CNN models. The SVM produced the best results using the deep features of ResNet50. The classification model, i.e., ResNet50 plus SVM, achieved accuracy, sensitivity, FPR, and F1 score of 95.33%, 95.33%, 2.33%, and 95.34%, respectively, for detection of COVID-19 (ignoring SARS, MERS, and ARDS). Again, the highest accuracy achieved by ResNet50 plus SVM is 98.66%. The result is based on the X-ray images available in the repositories of GitHub and Kaggle. As the data set contains only hundreds of images, the classification based on SVM is more robust compared to the transfer learning approach. Also, a comparative analysis of other traditional classification methods is carried out. The traditional methods are local binary patterns (LBP) plus SVM, histogram of oriented gradients (HOG) plus SVM, and Gray Level Co-occurrence Matrix (GLCM) plus SVM.
Among the traditional image classification methods, LBP plus SVM achieved an accuracy of 93.4%.


2019 ◽  
Vol 8 (2) ◽  
pp. 3861-3870

Autistic Spectrum Disorder (ASD) is a brain developmental disorder that weakens the ability to communicate and interact with others. A child with autism spectrum disorder may show different, repetitive patterns of behaviour, interests, or activities, including some specific signs. Diagnosing ASD behaviour and identifying the level of the disease is still a challenging task for doctors. Only a trained and experienced physician can identify ASD immediately. The data set for the autism problem consists of a number of causes and the results based on the symptoms of ASD. A data mining algorithm is therefore needed to organize the ASD details and find patterns in them. Machine learning algorithms are available to classify the data in data mining work. In this proposed work, a machine learning algorithm called Support Vector Machine (SVM) is used to classify children with ASD accurately. SVM is a classification algorithm that finds the hyperplane that maximizes the margin between the two classes. Although SVM gives better identification of the disease, some children have unique characteristics that easily hide their ASD. So, to diagnose the problem accurately, the user-defined SVM parameters are tuned by an optimization algorithm called the Differential Evolutionary Algorithm (DE). DE is an optimization algorithm used to find the optimal values of the SVM parameters. Further, to improve the performance of the proposed method, a dimension reduction technique is applied to reduce the dimension of the SVM and ANN networks. The Sequential Feature Selection (SFS) method is applied in this paper; it selects the variables that most influence the output. The reduced feature set is then classified by the ANN and SVM models. The data set for the ANN and SVM networks has been taken from the real records of multi-specialty hospitals. The SVM and DE-optimized SVM results are compared with another classification model, the Artificial Neural Network (ANN).
The test results show that the DE-optimized SVM classifies children with ASD more accurately than the ANN and the DE-optimized ANN.
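The parameter tuning step can be sketched with a generic differential evolution loop. The abstract does not state which DE variant, parameters, or objective were used, so this assumes the common DE/rand/1/bin scheme and replaces the cross-validated SVM error with a toy quadratic over two hypothetical parameters (log10 C and log10 gamma); every name and setting here is an assumption.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` inside the box `bounds` using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # choose three distinct individuals, none of them the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            s = objective(trial)
            if s <= scores[i]:  # greedy selection: keep the trial if no worse
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def toy_error(params):
    """Toy stand-in for a validation error, minimal at log10 C = 1, log10 gamma = -2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best_params, best_score = differential_evolution(toy_error, [(-3, 3), (-5, 1)])
```

In a real setting `toy_error` would be replaced by a function that trains an SVM with the candidate parameters and returns its validation error, which makes each objective evaluation far more expensive than in this toy.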


2013 ◽  
Vol 411-414 ◽  
pp. 2373-2376
Author(s):  
Lu Pan ◽  
Sheng Ji Rong ◽  
Chang Hui Yu ◽  
Chun Xia Jin ◽  
Quan Yin Zhu

To obtain a suitable commodity price forecasting model and give consumers a better reference when they buy mobile phones, this paper discusses cell phone price forecasting based on the training step. One year of price data for ten types of mobile phone, extracted from http://www.jd.com/, is used as the original data to improve a Support Vector Machine (SVM) model based on the training step. Following this forecasting method, experiments are carried out under different training steps for different types of cell phone, depending on the accuracy rate. Comparing the experimental results with the original data, the average forecasting accuracy reaches 94.48 percent. However, as the training step grows, the efficiency of the model decreases steadily. The experimental results show that the research is meaningful and useful, not only for consumers but also for businesses in the cell phone market.


Author(s):  
Zhenhua Li ◽  
Junjie Cheng ◽  
A. Abu-Siada

Background: Winding deformation is one of the most common faults that an operating power transformer experiences over its operational life. Thus it is essential to detect and rectify such faults at early stages to avoid potential catastrophic consequences to the transformer. At present, methods published in the literature for transformer winding fault diagnosis are mainly focused on identifying fault type and quantifying its extent without giving much attention to the identification of fault location. Methods: This paper presents a method based on a genetic algorithm and support vector machine (GA-SVM) to improve the fault classification of power transformers in terms of type and location. In this regard, a sinusoidal sweep signal in the frequency range of 600 kHz to 1 MHz is applied to one terminal of the transformer winding. A mathematical index of the induced current at the head and end of the transformer winding under various fault conditions is used to extract unique features that are fed to a support vector machine (SVM) model for training. Parameters of the SVM model are optimized using a genetic algorithm (GA). Results: The effectiveness of mathematical indicators to extract fault type characteristics and the proposed fault classification model for fault diagnosis is demonstrated through extensive simulation analysis for various transformer winding faults at different locations. Conclusion: The proposed model can effectively identify different fault types and determine their location within the transformer winding, and the diagnostic rates of the fault type and fault location are 100% and 90%, respectively.

