Integrated Feature Selection of ARIMA with Computational Intelligence Approaches for Food Crop Price Prediction

Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-17 ◽  
Author(s):  
Yuehjen E. Shao ◽  
Jun-Ting Dai

Because of global climate change, lack of arable land, and rapid population growth, the supplies of three major food crops (i.e., rice, wheat, and corn) have been gradually decreasing worldwide. The rapid increase in demand for food has contributed to a continuous rise in food prices, which directly threatens the lives of over 800 million people around the world who are reported to be chronically undernourished. Consequently, food crop price prediction has attracted considerable attention in recent years. Recent integrated forecasting models have developed various feature selection methods (FSMs) to capture fewer, but more important, explanatory variables. However, one major problem is that the future values of these important explanatory variables are not available. Thus, predictions based on these variables are not actually possible. Because an autoregressive integrated moving average (ARIMA) can extract important self-predictor variables with future values that can be calculated, this study incorporates an ARIMA as the FSM for computational intelligence (CI) models to predict three major food crop (i.e., rice, wheat, and corn) prices. Other than the ARIMA, the components of the proposed integrated forecasting models include artificial neural networks (ANNs), support vector regression (SVR), and multivariate adaptive regression splines (MARS). The predictive accuracies of ARIMA, ANN, SVR, MARS, and the proposed integrated model are compared and discussed. Experimental results reveal that the proposed integrated model achieves superior forecasting performance for predicting food crop prices.
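The core idea above — letting ARIMA pick significant self-predictor lags whose future values are computable, then feeding them to a CI model — can be sketched minimally. This is an illustrative stand-in, not the paper's implementation: the synthetic price series, the candidate lag range, and ranking lags by autocorrelation strength (a rough proxy for ARIMA's significant AR terms) are all assumptions.

```python
import math
import random

random.seed(0)
# synthetic monthly "price" series with a strong 12-month seasonal component
prices = [100 + 10 * math.sin(2 * math.pi * t / 12) + random.gauss(0, 1)
          for t in range(120)]

def autocorr(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var

# rank candidate lags 1..24 as stand-ins for ARIMA's significant AR terms
ranked = sorted(range(1, 25), key=lambda k: abs(autocorr(prices, k)),
                reverse=True)
selected_lags = ranked[:3]
print(selected_lags)  # lag 12 should rank near the top

# the lagged values at the selected lags form the input matrix for the
# downstream CI model (ANN, SVR, or MARS in the paper)
X = [[prices[t - k] for k in selected_lags]
     for t in range(max(selected_lags), len(prices))]
```

Because each row of `X` consists of past values only, every input needed for a future forecast is already known — which is exactly the advantage the abstract claims over external explanatory variables.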

2020 ◽  
Vol 8 (3) ◽  
pp. 49
Author(s):  
Vasilios Plakandaras ◽  
Periklis Gogas ◽  
Theophilos Papadimitriou ◽  
Efterpi Doumpa ◽  
Maria Stefanidou

The aim of this study is to forecast credit ratings of E.U. banking institutions, as dictated by Credit Rating Agencies (CRAs). To do so, we developed alternative forecasting models that determine the non-disclosed criteria used in rating. We compiled a sample of 112 E.U. banking institutions, including their Fitch-assigned ratings for 2017 and the publicly available information from their corresponding financial statements spanning the period 2013 to 2016 that led to those ratings. Our assessment is based on identifying the financial variables that are relevant to forecasting the ratings and the rating methodology used. In the empirical section, we employed a rigorous variable selection scheme prior to training both Probit and Support Vector Machine (SVM) models, given that the latter originates from the area of machine learning and is gaining popularity among economists and CRAs. Our results show that the most accurate model, in terms of in-sample forecasting, is an SVM coupled with the nonlinear RBF kernel, which correctly identifies 91.07% of the banks’ ratings using only 8 explanatory variables. Our findings suggest that a forecasting model based solely on publicly available financial information can adhere closely to the official ratings produced by Fitch. This provides evidence that the actual assessment procedures of the Credit Rating Agencies can be fairly accurately proxied by forecasting models based on freely available data, and that undisclosed information is of lower importance.
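The nonlinear RBF kernel behind the best-performing SVM is simple to state. The sketch below shows the kernel computation itself; the two 8-variable "bank" feature vectors and the gamma value are hypothetical stand-ins, not the paper's data or tuned parameters.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2): the nonlinear similarity an
    RBF-kernel SVM uses in place of a plain dot product."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# two hypothetical banks described by 8 selected financial ratios
bank_a = [0.12, 1.8, 0.05, 0.9, 2.1, 0.3, 1.1, 0.07]
bank_b = [0.10, 1.7, 0.06, 1.0, 2.0, 0.4, 1.0, 0.08]

print(rbf_kernel(bank_a, bank_a))  # identical inputs -> similarity 1.0
print(rbf_kernel(bank_a, bank_b))  # nearby banks -> similarity close to 1
```

The kernel maps banks with similar financial profiles to high similarity values, which is what lets the SVM draw nonlinear rating boundaries in the original 8-variable space.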


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Yuehjen E. Shao

Excess body fat often leads to obesity. Obesity is typically associated with serious medical conditions, such as cancer, heart disease, and diabetes. Accordingly, knowing one's body fat is extremely important, since it affects everyone's health. Although there are several ways to measure body fat percentage (BFP), the accurate methods are often associated with hassle and/or high costs. Traditional single-stage approaches may use certain body measurements or explanatory variables to predict the BFP. Diverging from existing approaches, this study proposes new intelligent hybrid approaches that require fewer explanatory variables, and the proposed forecasting models are able to effectively predict the BFP. The proposed hybrid models consist of multiple regression (MR), artificial neural network (ANN), multivariate adaptive regression splines (MARS), and support vector regression (SVR) techniques. The first stage of the modeling uses MR and MARS to obtain fewer but more important sets of explanatory variables. In the second stage, the remaining important variables serve as inputs for the other forecasting methods. A real dataset was used to demonstrate the development of the proposed hybrid models. The prediction results revealed that the proposed hybrid schemes outperformed the typical single-stage forecasting models.
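A minimal two-stage sketch of the hybrid idea: stage 1 screens explanatory variables (here by absolute correlation with BFP, a loose stand-in for the paper's MR/MARS screening), and stage 2 fits a model on the survivors (a one-variable least-squares line stands in for the ANN/SVR second stage). The body measurements, synthetic BFP target, and the 0.5 threshold are all illustrative assumptions.

```python
import random

random.seed(1)
n = 50
abdomen = [random.uniform(70, 120) for _ in range(n)]   # hypothetical measure
wrist = [random.uniform(15, 20) for _ in range(n)]      # hypothetical measure
bfp = [0.6 * a - 30 + random.gauss(0, 3) for a in abdomen]  # synthetic target

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# stage 1: keep only variables strongly correlated with the target
candidates = {"abdomen": abdomen, "wrist": wrist}
selected = {name: xs for name, xs in candidates.items()
            if abs(pearson(xs, bfp)) > 0.5}
print(sorted(selected))  # abdomen, whose signal drives bfp, survives the screen

# stage 2: fit the second-stage model on a selected variable
xs = selected["abdomen"]
mx, my = sum(xs) / n, sum(bfp) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, bfp))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx
```

The payoff of the two stages is the same as in the abstract: the second-stage model sees fewer, more informative inputs instead of every raw measurement.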


Energies ◽  
2018 ◽  
Vol 11 (7) ◽  
pp. 1848 ◽  
Author(s):  
Yuehjen Shao ◽  
Yi-Shan Tsai

Electricity is important because it is the most common energy source that we consume and depend on in our everyday lives. Consequently, forecasting electricity sales is essential. Typical forecasting approaches generate electricity sales forecasts from certain explanatory variables, but are limited by the fact that future values of those variables are unknown. To improve forecasting accuracy, recent hybrid forecasting approaches have developed different feature selection techniques (FSTs) to obtain fewer but more significant explanatory variables. However, these significant explanatory variables will still not be available in the future, despite being screened by effective FSTs. This study proposes the autoregressive integrated moving average (ARIMA) technique to serve as the FST for hybrid forecasting models. Aside from the ARIMA element, the proposed hybrid models also include artificial neural networks (ANN) and multivariate adaptive regression splines (MARS) because of their efficient, fast algorithms and effective forecasting performance. ARIMA can identify significant self-predictor variables whose future values will be available; these self-predictor variables can then serve as the inputs for the ANN and MARS models. Such hybrid approaches have seldom been investigated for electricity sales forecasting. This study proposes several forecasting models that do not require explanatory variables to forecast industrial, residential, and commercial electricity sales in Taiwan. The experimental results reveal that the significant self-predictor variables obtained from ARIMA can improve the forecasting accuracy of the ANN and MARS models.
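The hand-off from ARIMA to the second-stage model amounts to building a supervised training set out of self-lags. In this sketch the monthly sales series and the selected lags {1, 12} are illustrative assumptions, standing in for whichever AR terms ARIMA would actually flag as significant.

```python
import math

# hypothetical monthly electricity sales with a trend and yearly seasonality
sales = [200 + 1.5 * t + 20 * math.sin(2 * math.pi * t / 12)
         for t in range(60)]

selected_lags = [1, 12]      # pretend ARIMA flagged these AR terms
horizon = max(selected_lags)

# each row holds the lagged sales values; the target is the current month
X = [[sales[t - k] for k in selected_lags]
     for t in range(horizon, len(sales))]
y = [sales[t] for t in range(horizon, len(sales))]

print(len(X), len(X[0]))  # 48 training rows, 2 self-predictor features
```

`X` and `y` are exactly the inputs and targets an ANN or MARS model would be trained on, and since every feature is a past observation, the same construction works at forecast time.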


2014 ◽  
Vol 22 (1) ◽  
pp. 1-45 ◽  
Author(s):  
Nicolás García-Pedrajas ◽  
Aida de Haro-García ◽  
Javier Pérez-Rodríguez

Instance selection is becoming increasingly relevant due to the huge amount of data that is constantly produced in many fields of research. At the same time, most of the recent pattern recognition problems involve highly complex datasets with a large number of possible explanatory variables. For many reasons, this abundance of variables significantly harms classification or recognition tasks. There are efficiency issues, too, because the speed of many classification algorithms is largely improved when the complexity of the data is reduced. One of the approaches to address problems that have too many features or instances is feature or instance selection, respectively. Although most methods address instance and feature selection separately, both problems are interwoven, and benefits are expected from facing these two tasks jointly. This paper proposes a new memetic algorithm for dealing with many instances and many features simultaneously by performing joint instance and feature selection. The proposed method performs four different local search procedures with the aim of obtaining the most relevant subsets of instances and features to perform an accurate classification. A new fitness function is also proposed that enforces instance selection but avoids putting too much pressure on removing features. We prove experimentally that this fitness function improves the results in terms of testing error. Regarding the scalability of the method, an extension of the stratification approach is developed for simultaneous instance and feature selection. This extension allows the application of the proposed algorithm to large datasets. An extensive comparison using 55 medium to large datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 30 large problems, with very good results. The accuracy of the method for class-imbalanced problems in a set of 40 datasets is shown. 
The usefulness of the method is also tested using decision trees and support vector machines as classification methods.
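The fitness function the paper argues for — strong pressure toward instance reduction, weak pressure toward feature reduction — can be sketched as follows. The weights (0.5 / 0.45 / 0.05) and the tiny 1-NN evaluator are illustrative choices, not the paper's exact formulation.

```python
def one_nn_accuracy(train, labels, test, test_labels, feat_mask):
    """1-NN accuracy using only the features switched on in feat_mask."""
    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i, on in enumerate(feat_mask) if on)
    correct = 0
    for x, y in zip(test, test_labels):
        nearest = min(range(len(train)), key=lambda j: dist(x, train[j]))
        correct += labels[nearest] == y
    return correct / len(test)

def fitness(inst_mask, feat_mask, data, labels, test, test_labels):
    kept = [i for i, on in enumerate(inst_mask) if on]
    acc = one_nn_accuracy([data[i] for i in kept], [labels[i] for i in kept],
                          test, test_labels, feat_mask)
    inst_red = 1 - sum(inst_mask) / len(inst_mask)
    feat_red = 1 - sum(feat_mask) / len(feat_mask)
    # strong weight on instance reduction, deliberately weak on feature removal
    return 0.5 * acc + 0.45 * inst_red + 0.05 * feat_red

data = [[0, 0], [0, 1], [5, 5], [5, 6]]
labels = [0, 0, 1, 1]
test, test_labels = [[0, 0.5], [5, 5.5]], [0, 1]

# keeping one prototype per class scores higher than keeping everything
full = fitness([1, 1, 1, 1], [1, 1], data, labels, test, test_labels)
pruned = fitness([1, 0, 1, 0], [1, 1], data, labels, test, test_labels)
print(pruned > full)  # True
```

A memetic algorithm would evolve the two binary masks (with local search refining them), using a fitness of this shape so that pruning redundant instances is rewarded without pushing useful features out of the subset.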


2020 ◽  
Vol 39 (5) ◽  
pp. 6783-6800
Author(s):  
Siva Sankari Subbiah ◽  
Jayakumar Chinnappan

Load forecasting is a significant task carried out by electricity utility companies to estimate future electricity load. Proper planning, scheduling, operation, and maintenance of the power system rely on accurate forecasting of the electricity load. In this paper, clustering-based filter feature selection is proposed to help forecasting models improve short-term load forecasting performance. A recurrent-neural-network-based Long Short-Term Memory (LSTM) model is developed for forecasting the short-term load and compared against Multilayer Perceptron (MLP), Radial Basis Function (RBF), Support Vector Regression (SVR), and Random Forest (RF) models. The performance of the forecasting model is improved by reducing the curse of dimensionality using filter feature selection methods such as Fast Correlation Based Filter (FCBF), Mutual Information (MI), and RReliefF. Clustering is used to group similar load patterns and eliminate outliers, and feature selection identifies the features relevant to the load by taking samples from each cluster. To show its generality, the proposed model is evaluated on two different datasets from European countries. The results show that the forecasting models with selected features perform better; in particular, LSTM with RReliefF outperformed the other models.
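The clustering-then-filter step can be sketched compactly: group similar load values with a tiny 2-means, sample from each cluster, then rank candidate features by absolute correlation with load on those samples (a simple stand-in for FCBF/MI/RReliefF). The synthetic weather features, the two-cluster choice, and the sample sizes are illustrative assumptions.

```python
import random

random.seed(2)
n = 200
temperature = [random.uniform(-5, 30) for _ in range(n)]  # relevant feature
humidity = [random.uniform(20, 90) for _ in range(n)]     # irrelevant feature
load = [500 + 8 * t + random.gauss(0, 10) for t in temperature]

# 2-means on the load values: a crude way to separate low/high load regimes
centroids = [min(load), max(load)]
for _ in range(10):
    groups = [[], []]
    for i, v in enumerate(load):
        groups[abs(v - centroids[0]) > abs(v - centroids[1])].append(i)
    centroids = [sum(load[i] for i in g) / len(g) for g in groups]

# sample evenly from each cluster so both regimes inform the filter
sample = groups[0][:40] + groups[1][:40]

def abs_corr(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (vx * vy))

features = {"temperature": temperature, "humidity": humidity}
scores = {name: abs_corr([xs[i] for i in sample], [load[i] for i in sample])
          for name, xs in features.items()}
print(max(scores, key=scores.get))  # temperature drives the synthetic load
```

Sampling from every cluster keeps both load regimes represented when the filter scores features, which is the point of combining clustering with filter feature selection.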


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's difficult finances are factors behind disapproval of relocating the national capital. Twitter, one of the most popular social media platforms, is used by the public to express these opinions. The goals of this study are to determine the tendency of community responses to the relocation of the national capital and to perform sentiment analysis of public opinion on the relocation using the Naive Bayes algorithm and Support Vector Machine with feature selection, seeking the highest accuracy. The sentiment analysis data were crawled from Indonesian-language tweets on Twitter; the search terms used were #IbuKotaBaru and #PindahIbuKota. The research stages consisted of collecting data from Twitter, polarity labeling, and preprocessing, which comprises transform case, cleansing, tokenizing, filtering, and stemming. Feature selection was then applied to increase accuracy, and predetermined ratios were used to split the data into training and testing sets. Finally, the Support Vector Machine and Naive Bayes methods were compared to determine which is more accurate. In the period studied, 24.26% of tweets expressed positive sentiment and 75.74% negative sentiment toward the new capital. Using RapidMiner, the best accuracy for Naive Bayes with feature selection was 88.24% at a 9:1 ratio, while the best accuracy for Support Vector Machine with feature selection was 78.77% at a 5:5 ratio.
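The pipeline stages named above (transform case, cleansing, tokenizing, filtering) feeding a Naive Bayes classifier can be sketched end-to-end. The stopword list, the four training tweets, and their labels are tiny illustrative stand-ins for the crawled #IbuKotaBaru / #PindahIbuKota data, and the classifier is a plain multinomial Naive Bayes with Laplace smoothing.

```python
import math
import re
from collections import Counter

STOPWORDS = {"yang", "dan", "di", "ke"}  # assumed Indonesian stopword list

def preprocess(tweet):
    tweet = tweet.lower()                              # transform case
    tweet = re.sub(r"[^a-z\s]", " ", tweet)            # cleansing
    tokens = tweet.split()                             # tokenizing
    return [t for t in tokens if t not in STOPWORDS]   # filtering

train = [("pindah ibu kota bagus untuk pemerataan", "positive"),
         ("dukung ibu kota baru", "positive"),
         ("kemiskinan masih tinggi jangan pindah", "negative"),
         ("keuangan negara sulit tolak pindah", "negative")]

counts = {"positive": Counter(), "negative": Counter()}
priors = Counter(label for _, label in train)
for text, label in train:
    counts[label].update(preprocess(text))

def classify(tweet):
    tokens = preprocess(tweet)
    vocab = set().union(*counts.values())
    best, best_lp = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values())
        lp = math.log(priors[label] / len(train))
        for t in tokens:
            lp += math.log((c[t] + 1) / (total + len(vocab)))  # Laplace
        if lp > best_lp:
            best, best_lp = label, lp
    return best

print(classify("kemiskinan masih tinggi tolak pindah"))  # negative
```

A real run would add stemming after filtering and train on thousands of labeled tweets, but the log-probability comparison is the same mechanism the study's Naive Bayes model uses.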


2020 ◽  
Vol 20 ◽  
Author(s):  
Hongwei Zhang ◽  
Steven Wang ◽  
Tao Huang

Aims: We aim to identify biomarkers for chronic hypersensitivity pneumonitis (CHP) and facilitate precise gene therapy of CHP. Background: Chronic hypersensitivity pneumonitis (CHP) is an interstitial lung disease caused by hypersensitive reactions to inhaled antigens. Clinically, differentiating between CHP and other interstitial lung diseases, especially idiopathic pulmonary fibrosis (IPF), is challenging. Objective: In this study, we analyzed publicly available gene expression profiles of 82 CHP patients, 103 IPF patients, and 103 control samples to identify CHP biomarkers. Method: The CHP biomarkers were selected with advanced feature selection methods: Monte Carlo Feature Selection (MCFS) and Incremental Feature Selection (IFS). A Support Vector Machine (SVM) classifier was built. We then analyzed these CHP biomarkers through functional enrichment analysis and differential co-expression analysis. Result: 674 CHP biomarkers were identified. The co-expression network of these biomarkers in CHP included more negative regulations, and its structure was quite different from the networks of IPF and control. Conclusion: The SVM classifier may serve as an important clinical tool for the challenging task of differentiating between CHP and IPF. Many of the biomarker genes on the differential co-expression network show great promise in revealing the underlying mechanisms of CHP.
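The differential co-expression comparison — building a correlation "network" per group and comparing edge signs — can be sketched on toy data. The 3-gene expression matrices below are tiny synthetic stand-ins for the CHP and control profiles, and the 0.7 correlation threshold is an illustrative choice.

```python
def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def coexpression_edges(samples_by_gene, threshold=0.7):
    """Return {(gene_i, gene_j): sign} for strongly correlated gene pairs."""
    genes = sorted(samples_by_gene)
    edges = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(samples_by_gene[g1], samples_by_gene[g2])
            if abs(r) >= threshold:
                edges[(g1, g2)] = 1 if r > 0 else -1
    return edges

# hypothetical expression values across 5 samples per group
chp = {"A": [1, 2, 3, 4, 5], "B": [5, 4, 3, 2, 1], "C": [2, 2, 3, 2, 2]}
ctrl = {"A": [1, 2, 3, 4, 5], "B": [1, 2, 3, 4, 5], "C": [2, 3, 2, 3, 2]}

chp_edges = coexpression_edges(chp)
ctrl_edges = coexpression_edges(ctrl)
negative_in_chp = sum(s < 0 for s in chp_edges.values())
print(negative_in_chp)  # the CHP network carries the negative edge
```

In this toy example gene pair (A, B) flips from positive co-expression in controls to negative co-expression in CHP, which is the kind of regulatory sign change the abstract reports as characteristic of the CHP network.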


Author(s):  
Midde Venkateswarlu Naik ◽  
D. Vasumathi ◽  
A.P. Siva Kumar

Aims: The proposed research work presents an evolutionarily enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role for the current generation in extracting valid decision points about any aspect, such as movie ratings, educational institute or political ratings, etc. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification through a Support Vector Machine (SVM). Its performance is evaluated with statistical measures, such as precision, recall, sensitivity, and specificity, and compared with existing approaches. Earlier authors have achieved sentiment-classifier accuracy on English text of up to 94%. In the proposed scheme, the sentiment classifier reached an average accuracy of 99% across the distinct datasets by tuning various SVM parameters, such as the constant c value and kernel gamma value, in association with the PSO optimization technique. The proposed method used three publicly available datasets: airline sentiment, weather, and global warming. The experiments produced results trained and tested with 10-fold cross-validation (FCV) and a confusion matrix for estimating sentiment classifier accuracy. Background: Sentiment Analysis (SA), or opinion mining, has become a fascinating research domain. The key area is sentiment classification of semi-structured or unstructured data in various languages, which has become a major research aspect. User-Generated Content (UGC) from diverse sources has grown significantly with the rapid growth of the web.
The huge volume of user-generated data on social media provides substantial value for discovering hidden knowledge, correlations, patterns, trends, and sentiment about any specific entity. SA is a computational analysis to determine the actual opinion of an entity expressed as text; it can also be described as the computation of emotional polarity expressed on social media as natural text in miscellaneous languages. Usually, an effective automatic sentiment classifier model depends on feature selection and classification algorithms. Methods: The proposed work used a Support Vector Machine as the classification technique and Particle Swarm Optimization for feature selection. In this methodology, we tuned various permutations and combinations of parameters, with and without a kernel, to obtain the desired results for sentiment classification on three datasets (airline, global warming, and weather sentiment) that are freely hosted for research practice. Results: The proposed method outperformed other machine learning techniques, with 99.2% average accuracy in classifying sentiment on the different datasets. The high accuracy attained in classifying sentiment or opinion in review text demonstrates superior effectiveness over existing sentiment classifiers. The experiments produced results trained and tested with 10-fold cross-validation (FCV) and a confusion matrix for estimating sentiment classifier accuracy. Conclusion: Sentiment classifier accuracy was increased with the help of a kernel-based Support Vector Machine (SVM) with parameter optimization, and the optimal feature selection for classifying sentiment in review documents was determined with a particle swarm optimization approach.
The proposed method used three freely available datasets to simulate the results: airline sentiment data, weather sentiment data, and global warming data.
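The PSO search over SVM parameters (the constant c and kernel gamma) can be sketched with a bare-bones particle swarm. The true objective would be the SVM's cross-validation error; here a smooth stand-in with a known minimum at C=10, gamma=0.1 keeps the sketch self-contained, and the inertia/acceleration constants are conventional choices, not the paper's settings.

```python
import random

random.seed(3)

def objective(c, gamma):
    """Stand-in for 10-fold CV error of an SVM at these hyperparameters."""
    return (c - 10) ** 2 + 100 * (gamma - 0.1) ** 2

n_particles, n_iters = 20, 60
pos = [[random.uniform(0.1, 100), random.uniform(0.001, 1)]
       for _ in range(n_particles)]
vel = [[0.0, 0.0] for _ in range(n_particles)]
pbest = [p[:] for p in pos]                      # personal best positions
gbest = min(pbest, key=lambda p: objective(*p))  # global best position

for _ in range(n_iters):
    for i in range(n_particles):
        for d in range(2):
            r1, r2 = random.random(), random.random()
            # inertia + cognitive pull + social pull
            vel[i][d] = (0.7 * vel[i][d]
                         + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                         + 1.5 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        if objective(*pos[i]) < objective(*pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=lambda p: objective(*p))

print(gbest)  # converges near C=10, gamma=0.1
```

Swapping `objective` for an actual cross-validated SVM evaluation turns this loop into the hyperparameter tuner the study describes; the swarm mechanics stay the same.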


Author(s):  
B. Venkatesh ◽  
J. Anuradha

In microarray data, achieving high classification accuracy is complicated by high dimensionality and by irrelevant, noisy data; such data also contain many gene expression values but few samples. To increase the classification accuracy and the processing speed of the model, an optimal set of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, filter and wrapper. In the filter phase, an ensemble technique aggregates the feature ranks of the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods, using fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) selects the optimal features, with an RBF-kernel Support Vector Machine (SVM) classifier as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using various performance metrics such as accuracy, recall, precision, and F1-score. The experimental results show that the proposed method outperforms the other feature selection methods.
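The filter-phase rank aggregation can be sketched as follows: each filter produces a rank per gene, and a Gaussian membership over rank positions converts ranks into weights before summing. The three rank lists, the gene names, and the spread sigma are illustrative assumptions, not the paper's data or its exact fuzzy ordering.

```python
import math

# hypothetical ranks (1 = best) from the three filters for five genes,
# ordered as (Relief, mRMR, FC)
ranks = {
    "g1": [1, 2, 1],
    "g2": [2, 1, 3],
    "g3": [5, 5, 4],
    "g4": [3, 4, 2],
    "g5": [4, 3, 5],
}

def gaussian_membership(rank, sigma=2.0):
    """Map a rank to (0, 1]: rank 1 -> 1.0, decaying with distance."""
    return math.exp(-((rank - 1) ** 2) / (2 * sigma ** 2))

# sum the membership weights each gene earns across the three filters
aggregated = {gene: sum(gaussian_membership(r) for r in rs)
              for gene, rs in ranks.items()}
consensus = sorted(aggregated, key=aggregated.get, reverse=True)
print(consensus[0])  # g1 tops every filter, so it tops the consensus
```

The consensus ordering then seeds the wrapper phase, where IBPSO searches feature subsets and the RBF-kernel SVM scores each candidate subset.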

