Generalization of Parameter Selection of SVM and LS-SVM for Regression

Zeng;  Tan;  Matsunaga;  Shirai

doi:10.3390/make1020043

Machine Learning and Knowledge Extraction ◽ 10.3390/make1020043 ◽ 2019 ◽ Vol 1 (2) ◽ pp. 745-755 ◽ Cited By ~ 1

Author(s): Zeng ◽ Tan ◽ Matsunaga ◽ Shirai

Keyword(s): Nonlinear Function ◽ CO2 Concentration ◽ Parameter Tuning ◽ Least Square ◽ Support Vector ◽ Machine Learning Model ◽ Model Equations ◽ Tuning Methods ◽ Nonlinear Function Approximation ◽ Selection Of

A Support Vector Machine (SVM) for regression is a popular machine learning model that aims to solve nonlinear function approximation problems wherein explicit model equations are difficult to formulate. The performance of an SVM depends largely on the selection of its parameters. Choosing between an SVM that solves an optimization problem with inequality constraints and one that solves the least square of errors (LS-SVM) adds to the complexity. Various methods have been proposed for tuning parameters, but no article puts the SVM and LS-SVM side by side to discuss the issue using a large real-world dataset, which could be problematic for existing parameter tuning methods. We investigated both the SVM and LS-SVM with an artificial dataset and a dataset of more than 200,000 points used for the reconstruction of the global surface ocean CO2 concentration. The results reveal that: (1) the two models are most sensitive to the parameter of the kernel function, which lies in a narrow range for scaled input data; (2) the optimal values of other parameters do not change much for different datasets; and (3) the LS-SVM performs better than the SVM in general. The LS-SVM is recommended, as it has fewer parameters to be tuned and yields a smaller bias. Nevertheless, the SVM has the advantages of consuming fewer computer resources and taking less time to train. The results suggest initial parameter guesses for using the models.
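
To make the "least square of errors" variant concrete, the following is a minimal sketch of LS-SVM regression with an RBF kernel, assuming the standard dual linear system in which equality constraints replace the inequality constraints of the SVM. The names `gamma` (regularization) and `sigma` (kernel width), the toy sine data, and the parameter values are illustrative assumptions, not values taken from the paper.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the diagonal of K
        A.append(row)
    sol = solve_linear(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, sigma, x):
    """Evaluate f(x) = b + sum_i alpha_i * K(x_i, x)."""
    return b + sum(a * rbf_kernel(xi, x, sigma) for a, xi in zip(alpha, X_train))

# Toy data on inputs scaled to [0, 1] (illustrative only).
X = [[i / 10.0] for i in range(11)]
y = [math.sin(2.0 * math.pi * x[0]) for x in X]
gamma, sigma = 100.0, 0.3
b, alpha = lssvm_fit(X, y, gamma, sigma)
```

Only two values need tuning in this sketch (`gamma` and `sigma`), and training reduces to one dense linear solve over all samples. That is one way to see why an LS-SVM can need more memory and time on large datasets than an SVM, whose solution is typically sparse.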

An Improved Multiple Features and Machine Learning-Based Approach for Detecting Clickbait News on Social Networks

Applied Sciences ◽ 10.3390/app11209487 ◽ 2021 ◽ Vol 11 (20) ◽ pp. 9487

Author(s): Mohammed Al-Sarem ◽ Faisal Saeed ◽ Zeyad Ghaleb Al-Mekhlafi ◽ Badiea Abdulkarem Mohammed ◽ Mohammed Hadwan ◽ ...

Keyword(s): Machine Learning ◽ Social Networks ◽ Online Social Networks ◽ Parameter Tuning ◽ Arabic Language ◽ Support Vector ◽ Detection Accuracy ◽ F Test ◽ Model Training ◽ Tuning Methods

The widespread usage of social media has led to the increasing popularity of online advertisements, which have been accompanied by a disturbing spread of clickbait headlines. Clickbait dissatisfies users because the article content does not match their expectation. Detecting clickbait posts in online social networks is an important task to fight this issue. Clickbait posts use phrases intended mainly to attract a user’s attention so that they click on a specific fake link/website. That means clickbait headlines utilize misleading titles, which could carry hidden important information from the target website. It is very difficult to recognize these clickbait headlines manually. Therefore, there is a need for an intelligent method to detect clickbait and fake advertisements on social networks. Several machine learning methods have been applied for this detection purpose. However, the obtained performance (accuracy) only reached 87% and still needs to be improved. In addition, most of the existing studies were conducted on English headlines and contents. Few studies focused specifically on detecting clickbait headlines in Arabic. Therefore, this study constructed the first Arabic clickbait headline news dataset and presents an improved multiple feature-based approach for detecting clickbait news on social networks in the Arabic language. The proposed approach includes three main phases: data collection, data preparation, and machine learning model training and testing. The collected dataset included 54,893 Arabic news items from Twitter (after pre-processing). Among these news items, 23,981 were clickbait news (43.69%) and 30,912 were legitimate news (56.31%). This dataset was pre-processed and then the most important features were selected using the ANOVA F-test. Several machine learning (ML) methods were then applied with hyper-parameter tuning methods to find the optimal settings. Finally, the ML models were evaluated, and the overall performance is reported in this paper. The experimental results show that the Support Vector Machine (SVM) with the top 10% of ANOVA F-test features (user-based features (UFs) and content-based features (CFs)) obtained the best performance and achieved a detection accuracy of 92.16%.
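
The feature-selection step can be illustrated with a small sketch of the one-way ANOVA F statistic, followed by keeping the top-scoring fraction of features. The function names `anova_f` and `top_fraction` and the toy data are hypothetical. The sketch assumes every feature has nonzero within-class variance and does not reproduce the paper's pipeline.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n_total = len(values)
    grand_mean = sum(values) / n_total
    # Variation of the class means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own class mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def top_fraction(X, labels, fraction=0.10):
    """Return indices of the highest-F features (at least one) and all scores."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_features)]
    k = max(1, round(fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[1, 1], [2, 5], [3, 3], [4, 2], [5, 4], [6, 3]]
labels = [0, 0, 0, 1, 1, 1]
selected, scores = top_fraction(X, labels, fraction=0.10)
```

A classifier such as an SVM would then be trained only on the columns listed in `selected`.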

Research on Nonlinear Modeling Method of Support Vector Machine with Wavelet Derivation Kernel Function

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.687-691.1408 ◽ 2014 ◽ Vol 687-691 ◽ pp. 1408-1411

Author(s): Ping An Wang ◽ Xu Sheng Gan ◽ Wen Ming Gao

Keyword(s): Support Vector Machine ◽ Kernel Function ◽ Nonlinear Modeling ◽ Nonlinear Function ◽ Modeling Method ◽ Good Effect ◽ Support Vector ◽ Satisfactory Approximation ◽ Wavelet Kernel Function ◽ Nonlinear Function Approximation

The modeling capability of a Support Vector Machine (SVM) relies on the selection of the kernel function. To improve SVM modeling in applications, a wavelet kernel function that satisfies the Mercer condition is introduced as the kernel function of the SVM, achieving a good effect. In the paper, on the basis of the wavelet kernel function, a wavelet derivation kernel function is proposed for use in the SVM to achieve higher accuracy. An actual example of nonlinear function approximation shows that the SVM regression model has a satisfactory approximation effect and provides an effective nonlinear modeling method.
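
For orientation, here is a sketch of one commonly cited translation-invariant wavelet kernel, built from the Morlet-type mother wavelet h(u) = cos(1.75u)·exp(-u²/2). This is an assumed example of a wavelet kernel. The abstract does not specify which wavelet the paper uses, and the "wavelet derivation kernel" it proposes is not reproduced here. The dilation parameter `a` is a hypothetical name.

```python
import math

def morlet_wavelet_kernel(x, z, a=1.0):
    """Product over dimensions of h((x_i - z_i) / a) with a Morlet-type wavelet h."""
    k = 1.0
    for xi, zi in zip(x, z):
        u = (xi - zi) / a
        k *= math.cos(1.75 * u) * math.exp(-0.5 * u * u)
    return k

# Gram matrix on a few toy points (illustrative only).
points = [[0.0, 0.0], [0.5, 1.0], [1.0, -0.5]]
gram = [[morlet_wavelet_kernel(p, q, a=1.0) for q in points] for p in points]
```

Such a function can be passed to any kernel method that accepts a precomputed Gram matrix or a callable kernel.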

Development of Machine Learning Model Using Least Square-Support Vector Machine, Differential Evolution and Back Propagation Neural Network to Detect Breast Cancer

Smart Computing Techniques and Applications - Smart Innovation, Systems and Technologies ◽ 10.1007/978-981-16-1502-3_7 ◽ 2021 ◽ pp. 51-66

Author(s): Madhura D. Vankar ◽ G. A. Patil

Keyword(s): Breast Cancer ◽ Neural Network ◽ Machine Learning ◽ Support Vector Machine ◽ Back Propagation ◽ Back Propagation Neural Network ◽ Least Square ◽ Support Vector ◽ Machine Learning Model ◽ Detect Breast Cancer

Regularized least square support vector machines for order and structure selection of LPV-ARX models

2016 European Control Conference (ECC) ◽ 10.1109/ecc.2016.7810527 ◽ 2016 ◽ Cited By ~ 4

Author(s): Manas Mejari ◽ Dario Piga ◽ Alberto Bemporad

Keyword(s): Support Vector Machines ◽ Least Square ◽ Support Vector ◽ Structure Selection ◽ Vector Machines ◽ Selection Of

Managing Student Performance: A Predictive Analytics using Imbalanced Data

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.e7008.038620 ◽ 2020 ◽ Vol 8 (6) ◽ pp. 2277-2283

Keyword(s): Random Forest ◽ Student Performance ◽ Predictive Models ◽ Predictive Analytics ◽ Random Search ◽ Parameter Tuning ◽ Knowledge Discovery In Databases ◽ Recursive Feature Elimination ◽ Support Vector ◽ Selection Of

Big data has revolutionized every field of life, including human learning. The field of education has progressed in the past couple of decades, and in addition, rapid growth in the number of educational institutions has created tough competition. The massive accumulation of data in the educational sector has created great scope for EDM (Educational Data Mining) with the support of robust predictive models. It is necessary to regularly examine the performance of students to help them perform better, which in turn helps to maintain the reputation of the institution. This study proposed a predictive model through which the performance of a student can be forecast from various characteristics. The KDD (Knowledge Discovery in Databases) methodology was followed stepwise in this study for developing predictive models to predict student performance. Data balancing techniques such as SMOTE (Synthetic Minority Over-sampling Technique) and ADASYN (Adaptive Synthetic Sampling) were employed to handle the effect of unbalanced data, which causes biased predictions. Also, for the selection of significant features, the techniques FCBF (Fast Correlation Based Feature selection) and RFE (Recursive Feature Elimination) were used. The EDM algorithms Random Forest (RF), Support Vector Machine (SVM) and Artificial Neural Network (ANN) were utilized for predicting student performance with suitable hyper-parameter tuning using random search to enhance the performance of the model. The results obtained were cross-validated using Ensemble Method and benchmarked with previous studies. The random forest model achieved the highest accuracy of 86% after data balancing and careful selection of significant features.
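
The idea behind SMOTE-style balancing can be sketched as interpolation between a minority-class sample and one of its nearest minority neighbours. The function below is a simplified illustration under that assumption, with a hypothetical name `smote_like` and toy data. It is not the reference SMOTE implementation and does not cover ADASYN.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class with two features (illustrative only).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]]
new_points = smote_like(minority, n_new=5)
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class.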

A Hybrid Method of Least Square Support Vector Machine and Bacterial Foraging Optimization Algorithm for Medium Term Electricity Price Forecasting

International Journal of Integrated Engineering ◽ 10.30880/ijie.2019.11.03.024 ◽ 2019 ◽ Vol 11 (3)

Author(s): Intan Azmira Wan Abdul Razak ◽ Nik Nur Atira Nik Ibrahim ◽ Izham Zainal Abidin ◽ Yap Keem Siah ◽ ...

Keyword(s): Support Vector Machine ◽ Optimization Algorithm ◽ Least Square ◽ Support Vector ◽ Price Forecasting ◽ Medium Term ◽ Bacterial Foraging Optimization ◽ Electricity Price Forecasting ◽ Method Of Least Square ◽ Bacterial Foraging Optimization Algorithm

Application of nonlinear feature extraction and least square support vector machines for fault diagnosis of chemical process

Journal of Computer Applications ◽ 10.3724/sp.j.1087.2010.00236 ◽ 2010 ◽ Vol 30 (1) ◽ pp. 236-239 ◽ Cited By ~ 1

Author(s): Liang XU

Keyword(s): Feature Extraction ◽ Fault Diagnosis ◽ Support Vector Machines ◽ Chemical Process ◽ Least Square ◽ Support Vector ◽ Vector Machines ◽ Nonlinear Feature Extraction ◽ Nonlinear Feature

A Novel Amino Acid Sequence-based Computational Approach to Predicting Cell-penetrating Peptides

Current Computer - Aided Drug Design ◽ 10.2174/1573409914666180925100355 ◽ 2019 ◽ Vol 15 (3) ◽ pp. 206-211 ◽ Cited By ~ 2

Author(s): Jihui Tang ◽ Jie Ning ◽ Xiaoyan Liu ◽ Baoming Wu ◽ Rongfeng Hu

Keyword(s): Machine Learning ◽ Amino Acid ◽ Amino Acid Position ◽ Cell Penetrating Peptides ◽ Support Vector ◽ Cell Penetration ◽ Drug Candidates ◽ Machine Learning Model ◽ Cell Penetrating ◽ Novel Method

Introduction: Machine learning is a useful tool for the prediction of cell-penetrating compounds as drug candidates. Materials and Methods: In this study, we developed a novel method for predicting the membrane-penetrating capability of Cell-Penetrating Peptides (CPPs). For this, we used orthogonal encoding to encode amino acids, with each amino acid position as one variable. Then IBM SPSS Modeler software and a dataset including 533 CPPs were used for model screening. Results: The results indicated that the Support Vector Machine (SVM) machine learning model was suitable for predicting membrane-penetrating capability. For improvement, the three CPPs with the longest lengths were used to predict CPPs. The penetration capability can be predicted with an accuracy of close to 95%. Conclusion: All the results indicated that using amino acid position as a variable can be a prospective method for predicting the membrane-penetrating capability of CPPs.
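
One possible reading of "orthogonal encoding" with position-wise variables is a one-hot vector per sequence position, sketched below. The 20-letter alphabet ordering, the zero padding to `max_len`, and the function name `encode_peptide` are assumptions for illustration. The abstract does not state how sequences of different lengths were handled.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def encode_peptide(seq, max_len):
    """One-hot encode each position; positions beyond len(seq) stay all-zero."""
    width = len(AMINO_ACIDS)
    vec = [0] * (max_len * width)
    for pos, residue in enumerate(seq[:max_len]):
        # str.index raises ValueError for non-standard residues.
        vec[pos * width + AMINO_ACIDS.index(residue)] = 1
    return vec

# Toy example: a two-residue sequence padded to three positions.
v = encode_peptide("AC", max_len=3)
```

Each position contributes a block of 20 indicator values, so any two different residues at the same position map to orthogonal blocks.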

Rapid Determining Contents of the Rhubarb Anthraquinones Compounds by Support Vector Machines Modeling based on Near Infrared Spectra

Current Analytical Chemistry ◽ 10.2174/1573411016666200317111412 ◽ 2020 ◽ Vol 16

Author(s): Linqi Liu ◽ Jinhua Luo ◽ Chenxi Zhao ◽ Bingxue Zhang ◽ Wei Fan ◽ ...

Keyword(s): Infrared Spectra ◽ Near Infrared ◽ Mean Squared Error ◽ Rapid Determination ◽ Partial Least Square ◽ Least Square ◽ Support Vector ◽ Near Infrared Spectra ◽ Aloe Emodin ◽ Relative Differences

BACKGROUND: Measuring medicinal compounds to evaluate their quality and efficacy has been recognized as a useful approach in treatment. Rhubarb anthraquinone compounds (mainly aloe-emodin, rhein, emodin, chrysophanol and physcion) are the main active components of rhubarb as a purgative drug. In the current Chinese Pharmacopoeia, the total anthraquinone content is designated as its quantitative quality control index, while the content of each compound has not been specified. METHODS: On the basis of forty rhubarb samples, correlation models between the near infrared spectra and UPLC analysis data were constructed using support vector machine (SVM) and partial least square (PLS) methods, with the Kennard and Stone algorithm used to divide the calibration/prediction datasets. Good models have high correlation coefficients (R2) and low root mean squared error of prediction (RMSEP) values. RESULTS: The models constructed by SVM perform much better than those constructed by PLS. The SVM models have high R2 values of 0.8951, 0.9738, 0.9849, 0.9779, 0.9411 and 0.9862 for the aloe-emodin, rhein, emodin, chrysophanol, physcion and total anthraquinone contents, respectively. The corresponding RMSEPs are 0.3592, 0.4182, 0.4508, 0.7121, 0.8365 and 1.7910, respectively. 75% of the predicted results have relative differences lower than 10%. For rhein and total anthraquinones, all of the predicted results have relative differences lower than 10%. CONCLUSION: The nonlinear models constructed by SVM performed well, with predicted values close to the experimental values. This approach enables rapid determination of the main medicinal ingredients in rhubarb medicinal materials.
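
The calibration/prediction split and the error metric mentioned above can be sketched as follows. This is a minimal version of the Kennard and Stone selection, assuming Euclidean distance on the raw inputs and a requested count between 2 and the number of samples, together with the usual RMSEP formula. The function names and toy data are illustrative and are not taken from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two samples given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen to cover the input space evenly."""
    n = len(X)
    # Start with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(X[pair[0]], X[pair[1]]),
    )
    selected = [i0, j0]
    while len(selected) < n_select:
        remaining = [i for i in range(n) if i not in selected]
        # Add the sample whose nearest already-selected sample is farthest away.
        nxt = max(remaining, key=lambda i: min(dist(X[i], X[s]) for s in selected))
        selected.append(nxt)
    return selected

def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy one-dimensional samples (illustrative only).
X = [[0.0], [1.0], [2.0], [10.0]]
calibration_idx = kennard_stone(X, n_select=3)
error = rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Samples not returned by `kennard_stone` would form the prediction set, on which `rmsep` is then evaluated for each fitted model.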
