Generalization of Parameter Selection of SVM and LS-SVM for Regression

2019 ◽  
Vol 1 (2) ◽  
pp. 745-755 ◽  
Author(s):  
Zeng ◽  
Tan ◽  
Matsunaga ◽  
Shirai

A Support Vector Machine (SVM) for regression is a popular machine learning model for nonlinear function approximation problems in which explicit model equations are difficult to formulate. The performance of an SVM depends largely on the selection of its parameters. The choice between an SVM that solves an optimization problem with inequality constraints and one that minimizes the sum of squared errors (LS-SVM) adds to the complexity. Various methods have been proposed for tuning parameters, but no article has compared the SVM and LS-SVM side by side on a large real-world dataset, the kind of data that could be problematic for existing parameter tuning methods. We investigated both the SVM and LS-SVM with an artificial dataset and with a dataset of more than 200,000 points used to reconstruct the global surface ocean CO2 concentration. The results reveal that: (1) both models are most sensitive to the kernel function parameter, whose optimal value lies in a narrow range for scaled input data; (2) the optimal values of the other parameters do not change much across datasets; and (3) the LS-SVM generally performs better than the SVM. The LS-SVM is recommended because it has fewer parameters to tune and yields a smaller bias. Nevertheless, the SVM has the advantages of consuming fewer computational resources and training faster. The results suggest initial parameter guesses for using the models.
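To make the difference between the two formulations concrete, here is a minimal pure-Python sketch of LS-SVM regression with a Gaussian (RBF) kernel. Training reduces to one linear system in the bias b and the dual coefficients alpha. The model has only two hyperparameters: the regularization constant gamma and the kernel width sigma. An epsilon-SVR, by contrast, also needs the epsilon tube width and solves a quadratic program, which is not shown. The RBF kernel, the function names and the naive Gaussian-elimination solver are illustrative assumptions, not the setup used in the paper.

```python
import math

def rbf_kernel(x, z, sigma):
    # Gaussian (RBF) kernel: exp(-||x - z||^2 / (2 * sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

def solve_linear(a, b):
    # Gaussian elimination with partial pivoting (a: n x n list of lists, b: length n)
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def lssvm_fit(xs, ys, gamma, sigma):
    # LS-SVM dual system: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(xs[i], xs[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        a.append(row)
    sol = solve_linear(a, [0.0] + list(ys))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def lssvm_predict(x, xs, b, alpha, sigma):
    # f(x) = sum_i alpha_i * K(x, x_i) + b
    return b + sum(a_i * rbf_kernel(x, x_i, sigma) for a_i, x_i in zip(alpha, xs))

if __name__ == "__main__":
    xs = [[3.0 * i / 19] for i in range(20)]
    ys = [math.sin(x[0]) for x in xs]
    b, alpha = lssvm_fit(xs, ys, gamma=100.0, sigma=0.5)
    print(lssvm_predict([1.5], xs, b, alpha, sigma=0.5), math.sin(1.5))
```

The values `gamma=100.0` and `sigma=0.5` are arbitrary illustrations, not the initial guesses the paper recommends. The dense solve costs O(n³) time and O(n²) memory. This is one plausible reason an LS-SVM can be heavier on resources than a sparse SVM on a dataset of 200,000 points.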

2021 ◽  
Vol 11 (20) ◽  
pp. 9487
Author(s):  
Mohammed Al-Sarem ◽  
Faisal Saeed ◽  
Zeyad Ghaleb Al-Mekhlafi ◽  
Badiea Abdulkarem Mohammed ◽  
Mohammed Hadwan ◽  
...  

The widespread use of social media has led to the increasing popularity of online advertisements, accompanied by a disturbing spread of clickbait headlines. Clickbait dissatisfies users because the article content does not match their expectations. Detecting clickbait posts in online social networks is therefore an important task. Clickbait posts use phrases designed mainly to attract users' attention and make them click on a specific fake link or website. In other words, clickbait headlines use misleading titles that may hide important information about the target website. Because these headlines are very difficult to recognize manually, an intelligent method is needed to detect clickbait and fake advertisements on social networks. Several machine learning methods have been applied for this purpose, but their accuracy has reached only 87% and still needs to be improved. In addition, most existing studies were conducted on English headlines and content, and few focused specifically on detecting clickbait headlines in Arabic. This study therefore constructed the first Arabic clickbait headline news dataset and presents an improved multiple-feature-based approach for detecting clickbait news on social networks in Arabic. The proposed approach has three main phases: data collection, data preparation, and machine learning model training and testing. After pre-processing, the collected dataset included 54,893 Arabic news items from Twitter, of which 23,981 (43.69%) were clickbait and 30,912 (56.31%) were legitimate. The most important features were selected using the ANOVA F-test. Several machine learning (ML) methods were then applied with hyper-parameter tuning to find the optimal settings. Finally, the ML models were evaluated, and their overall performance is reported in this paper. The experimental results show that the Support Vector Machine (SVM) using the top 10% of ANOVA F-test features (user-based features (UFs) and content-based features (CFs)) performed best, with a detection accuracy of 92.16%.
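The feature-selection step can be illustrated with a minimal pure-Python sketch of the one-way ANOVA F-test. Each feature is scored by the ratio of between-class to within-class variance, and the highest-scoring 10% are kept. The function names, the rounding rule (round up, keep at least one feature) and the toy data are assumptions for illustration. The abstract does not say how the user-based and content-based features were computed. scikit-learn offers `f_classif` and `SelectPercentile` for the same job, and its tie-breaking and rounding may differ from this sketch.

```python
import math

def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the classes in labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-class and within-class sums of squares
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        # Degenerate feature: constant within every class
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_top_percent(rows, labels, percent=10):
    """Indices of the top `percent`% of features by F score (rounded up, at least one)."""
    n_features = len(rows[0])
    scores = [anova_f([r[j] for r in rows], labels) for j in range(n_features)]
    keep = max(1, math.ceil(n_features * percent / 100))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep]), scores

if __name__ == "__main__":
    # Toy data: 20 samples, 10 features; only feature 3 is shifted by the class label
    labels = [i % 2 for i in range(20)]
    rows = [[(i * (j + 1)) % 7 + (5 * labels[i] if j == 3 else 0) for j in range(10)]
            for i in range(20)]
    selected, scores = select_top_percent(rows, labels, percent=10)
    print(selected)
```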


2014 ◽  
Vol 687-691 ◽  
pp. 1408-1411
Author(s):  
Ping An Wang ◽  
Xu Sheng Gan ◽  
Wen Ming Gao

The modeling capability of a Support Vector Machine (SVM) depends on the choice of kernel function. To improve SVM application modeling, a wavelet kernel function that satisfies the Mercer condition has been introduced as the SVM kernel, with good results. Building on the wavelet kernel function, this paper proposes a wavelet derivation kernel function for SVM to achieve higher accuracy. A practical example of nonlinear function approximation shows that the resulting SVM regression model approximates the target function satisfactorily and provides an effective nonlinear modeling method.
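In the wavelet SVM literature, the wavelet kernel is commonly written as a product of a mother wavelet over the input dimensions: K(x, x') = ∏ h((x_i − x'_i) / a). It usually uses the Morlet-type wavelet h(u) = cos(1.75u)·exp(−u²/2), and this translation-invariant form is known to satisfy the Mercer condition. The sketch below implements only that standard kernel. The paper's own wavelet derivation kernel is not specified in the abstract and is not reproduced here, and the dilation default a = 1.0 is an arbitrary assumption.

```python
import math

def morlet(u):
    """Morlet-type mother wavelet h(u) = cos(1.75 u) * exp(-u^2 / 2)."""
    return math.cos(1.75 * u) * math.exp(-u * u / 2.0)

def wavelet_kernel(x, z, a=1.0):
    """Translation-invariant wavelet kernel: product of h((x_i - z_i) / a) over dimensions."""
    k = 1.0
    for xi, zi in zip(x, z):
        k *= morlet((xi - zi) / a)
    return k

def gram_matrix(points, a=1.0):
    """Kernel (Gram) matrix that a kernel SVM solver would consume."""
    return [[wavelet_kernel(p, q, a) for q in points] for p in points]

if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.5, -0.2], [1.0, 1.0]]
    for row in gram_matrix(pts, a=1.0):
        print([round(v, 4) for v in row])
```

Because h is even and |h(u)| ≤ 1, the kernel is symmetric and bounded by 1, with K(x, x) = 1 on the diagonal.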


2020 ◽  
Vol 8 (6) ◽  
pp. 2277-2283

Big data has revolutionized nearly every field of life, including human learning. The field of education has progressed over the past couple of decades. In addition, rapid growth in the number of educational institutions has created tough competition. The massive accumulation of data in the educational sector has created great scope for EDM (Educational Data Mining) supported by robust predictive models. Regularly examining student performance is necessary to help students perform better, which in turn helps maintain the institution's reputation. This study proposes a predictive model through which student performance can be forecast from various characteristics. The KDD (Knowledge Discovery in Databases) methodology was followed step by step to develop the predictive models. Data balancing techniques such as SMOTE (Synthetic Minority Over-sampling Technique) and ADASYN (Adaptive Synthetic Sampling) were employed to handle class imbalance in the data, which biases predictions. The feature selection techniques FCBF (Fast Correlation-Based Filter) and RFE (Recursive Feature Elimination) were used to select significant features. The EDM algorithms Random Forest (RF), Support Vector Machine (SVM) and Artificial Neural Network (ANN) were used to predict student performance, with hyper-parameters tuned by random search to improve model performance. The results were cross-validated using an ensemble method and benchmarked against previous studies. The random forest model achieved the highest accuracy, 86%, after data balancing and careful selection of significant features.
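The core idea of SMOTE is interpolation. A synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbors. Below is a minimal pure-Python sketch of that idea. The values k = 5, the fixed seed and the helper name `smote` are illustrative assumptions. The study's actual settings are not given. ADASYN, which adaptively generates more synthetic samples for minority examples that are harder to learn, is not reproduced.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class feature vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbors of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        lam = rng.random()  # uniform in [0, 1)
        # New point on the segment from x towards the chosen neighbor
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic

if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for point in smote(minority, 3, k=2, seed=1):
        print(point)
```

In practice the synthetic points are appended to the training set only, after the train/test split, so that the evaluation data stay untouched.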


2019 ◽  
Vol 15 (3) ◽  
pp. 206-211 ◽  
Author(s):  
Jihui Tang ◽  
Jie Ning ◽  
Xiaoyan Liu ◽  
Baoming Wu ◽  
Rongfeng Hu

Introduction: Machine learning is a useful tool for predicting cell-penetrating compounds as drug candidates.
Materials and Methods: In this study, we developed a novel method for predicting the membrane-penetrating capability of Cell-Penetrating Peptides (CPPs). We used orthogonal encoding to encode the amino acids, treating each amino acid position as one variable. IBM SPSS Modeler software and a dataset of 533 CPPs were then used for model screening.
Results: The Support Vector Machine (SVM) model was found to be suitable for predicting membrane-penetrating capability. To improve the model, the three CPPs with the longest lengths were used to predict CPPs. The penetration capability can be predicted with an accuracy close to 95%.
Conclusion: The results indicate that using amino acid position as a variable is a promising method for predicting the membrane-penetrating capability of CPPs.
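Orthogonal (one-hot) encoding represents each residue as a 20-dimensional binary vector with a single 1. A peptide therefore becomes one block of 20 indicator variables per position. The sketch below makes three assumptions: the 20 standard amino acids in one-letter code, a fixed maximum length with all-zero padding for shorter peptides, and truncation of longer ones. The abstract does not say how the study handled variable lengths or how the features were passed to IBM SPSS Modeler, and the names `AMINO_ACIDS` and `orthogonal_encode` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes

def orthogonal_encode(peptide, max_len):
    """One 20-dimensional one-hot block per position, max_len blocks in total.

    Positions beyond the peptide length stay all zero (padding); residues beyond
    max_len are dropped (truncation). Unknown letters raise ValueError.
    """
    features = []
    for pos in range(max_len):
        block = [0] * len(AMINO_ACIDS)
        if pos < len(peptide):
            block[AMINO_ACIDS.index(peptide[pos])] = 1
        features.extend(block)
    return features

if __name__ == "__main__":
    vec = orthogonal_encode("GRKK", 6)
    print(len(vec), sum(vec))
```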


2020 ◽  
Vol 16 ◽  
Author(s):  
Linqi Liu ◽  
Jinhua Luo ◽  
Chenxi Zhao ◽  
Bingxue Zhang ◽  
Wei Fan ◽  
...  

BACKGROUND: Measuring medicinal compounds to evaluate their quality and efficacy has been recognized as a useful approach in treatment. Anthraquinone compounds (mainly aloe-emodin, rhein, emodin, chrysophanol and physcion) are the main active components of rhubarb as a purgative drug. In the current Chinese Pharmacopoeia, the total anthraquinone content is designated as the quantitative quality-control index of rhubarb, whereas the content of each individual compound is not specified. METHODS: Using forty rhubarb samples, correlation models between near-infrared (NIR) spectra and UPLC analysis data were constructed with support vector machine (SVM) and partial least squares (PLS) methods. The Kennard-Stone algorithm was used to divide the samples into calibration and prediction sets. A good model has a high correlation coefficient (R²) and a low root mean squared error of prediction (RMSEP). RESULTS: The SVM models performed much better than the PLS models. The SVM models achieved R² values of 0.8951, 0.9738, 0.9849, 0.9779, 0.9411 and 0.9862 for aloe-emodin, rhein, emodin, chrysophanol, physcion and total anthraquinone content, respectively. The corresponding RMSEPs are 0.3592, 0.4182, 0.4508, 0.7121, 0.8365 and 1.7910. Overall, 75% of the predicted results had relative differences below 10%, and for rhein and total anthraquinones all predicted results had relative differences below 10%. CONCLUSION: The nonlinear SVM models performed well, with predicted values close to the experimental values. This approach enables rapid determination of the main medicinal ingredients in rhubarb.
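The Kennard-Stone algorithm named in the Methods picks calibration samples that cover the measurement space evenly. It starts with the two samples that are farthest apart. It then repeatedly adds the sample whose distance to its nearest already-selected sample is largest, and the unselected samples form the prediction set. The sketch below assumes Euclidean distance on the raw feature vectors (for example, NIR spectra) and a caller-chosen calibration size. The abstract does not state the preprocessing or split sizes used for the forty samples.

```python
import math

def kennard_stone(samples, n_select):
    """Return (calibration_indices, prediction_indices) chosen by Kennard-Stone."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(samples)")
    # Seed the calibration set with the two samples that are farthest apart
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: math.dist(samples[ij[0]], samples[ij[1]]),
    )
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in (i0, j0)]
    while len(selected) < n_select:
        # Add the remaining sample whose nearest selected sample is farthest away
        best = max(
            remaining,
            key=lambda r: min(math.dist(samples[r], samples[s]) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected, remaining

if __name__ == "__main__":
    calibration, prediction = kennard_stone([[0.0], [1.0], [2.0], [3.0], [10.0]], 3)
    print(calibration, prediction)
```

Because every pairwise distance is computed, the cost grows quadratically with the number of samples. That is negligible for forty spectra.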
