Training and Testing Data Division Influence on Hybrid Machine Learning Model Process: Application of River Flow Forecasting

Complexity ◽

10.1155/2020/8844367 ◽

2020 ◽

Vol 2020 ◽

pp. 1-22

Author(s):

Hai Tao ◽

Ali Omran Al-Sulttani ◽

Ameen Mohammed Salih Ameen ◽

Zainab Hasan Ali ◽

Nadhir Al-Ansari ◽

...

Keyword(s):

Machine Learning ◽

River Flow ◽

The Other ◽

Semiarid Region ◽

Support Vector ◽

Hydrological Process ◽

Support Vector Regression Model ◽

River Flow Forecasting ◽

Data Division ◽

Complex Phenomena

The hydrological process has a dynamic nature characterised by randomness and complex phenomena. The application of machine learning (ML) models in forecasting river flow has grown rapidly, owing to their capacity to simulate the complex phenomena associated with hydrological and environmental processes. Four different ML models were developed for forecasting river flow in a semiarid region of Iraq. The influence of data division on the ML modeling process was investigated. Three data division scenarios were inspected: 70%–30%, 80%–20%, and 90%–10%. Several statistical indicators were computed to verify the performance of the models. The results revealed the potential of the hybridized support vector regression model with a genetic algorithm (SVR-GA) over the other ML forecasting models for monthly river flow forecasting using the 90%–10% data division. In addition, it was found to improve the accuracy in forecasting high flow events. The architecture of the developed SVR-GA provides a robust learning process because the GA optimizer tunes the internal parameters of the SVR model. This has made it more efficient in forecasting stochastic river flow behaviour compared to the other developed hybrid models.
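The three data division scenarios can be illustrated with a minimal sketch. It assumes a chronological split (the earliest months for training, the latest for testing), which the abstract does not state explicitly; the function name `chronological_split` and the 120-value placeholder series are hypothetical and stand in for the actual river flow record.

```python
def chronological_split(series, train_fraction):
    """Split a time-ordered series into a leading training part and a trailing test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


# Hypothetical placeholder record: 120 monthly values (not data from the study).
flows = [float(i) for i in range(120)]

# Record the train/test sizes for the 70%-30%, 80%-20%, and 90%-10% scenarios.
splits = {}
for fraction in (0.70, 0.80, 0.90):
    train, test = chronological_split(flows, fraction)
    splits[fraction] = (len(train), len(test))
```

Keeping the test block at the end of the record avoids training on months that come after the ones being forecast; a random shuffle would be a different design choice and could leak future information into training.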

Analysis of the Nosema Cells Identification for Microscopic Images

Sensors ◽

10.3390/s21093068 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3068

Author(s):

Soumaya Dghim ◽

Carlos M. Travieso-González ◽

Radim Burget

Keyword(s):

Neural Network ◽

Machine Learning ◽

Image Processing ◽

Deep Learning ◽

The Other ◽

Support Vector ◽

Learning Approaches ◽

Microscopic Images ◽

Trained Neural Network ◽

Nosema Disease

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of the Nosema disease, which is considered to be one of the most economically significant diseases today. This work shows a solution for recognizing and identifying Nosema cells among the other existing objects in the microscopic image. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images. Then, machine learning methods, such as an artificial neural network (ANN) and a support vector machine (SVM), are applied for detecting and classifying the Nosema disease cells. The second strategy explores deep learning and transfer learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several methods of transfer learning (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to identify the Nosema images from the other object images. The best accuracy, 96.25%, was reached by the VGG-16 pre-trained neural network.

Application of Various Machine Learning Techniques in Predicting Total Organic Carbon from Well Logs

Computational Intelligence and Neuroscience ◽

10.1155/2021/7390055 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Osama Siddig ◽

Ahmed Farid Ibrahim ◽

Salaheldin Elkatatny

Keyword(s):

Machine Learning ◽

Organic Carbon ◽

Total Organic Carbon ◽

The Other ◽

Well Logs ◽

Machine Learning Techniques ◽

Percentage Error ◽

Average Error ◽

Support Vector ◽

Empirical Correlations

Unconventional resources have recently gained a lot of attention, and as a consequence, there has been an increase in research interest in predicting total organic carbon (TOC) as a crucial quality indicator. TOC is commonly measured experimentally; however, due to sampling restrictions, obtaining continuous data on TOC is difficult. Therefore, different empirical correlations for TOC have been presented. However, there are concerns about the generalization and accuracy of these correlations. In this paper, different machine learning (ML) techniques were utilized to develop models that predict TOC from well logs, including formation resistivity (FR), spontaneous potential (SP), sonic transit time (Δt), bulk density (RHOB), neutron porosity (CNP), gamma ray (GR), and spectrum logs of thorium (Th), uranium (Ur), and potassium (K). Over 1250 data points from the Devonian Duvernay shale were utilized to create and validate the model. These datasets were obtained from three wells; the first was used to train the models, while the datasets from the other two wells were utilized to test and validate them. Support vector machine (SVM), random forest (RF), and decision tree (DT) were the ML approaches tested, and their predictions were contrasted with three empirical correlations. Various parameters of the AI methods were tested to ensure the best possible accuracy in terms of correlation coefficient (R) and average absolute percentage error (AAPE) between the actual and predicted TOC. The three ML methods yielded good matches; however, the RF-based model had the best performance. The RF model was able to predict the TOC for the different datasets with R values ranging between 0.93 and 0.99 and AAPE values less than 14%. In terms of average error, the ML-based models outperformed the three empirical correlations. This study shows the capability and robustness of ML models to predict the total organic carbon from readily available logging data without the need for core analysis or additional well interventions.
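For readers unfamiliar with AAPE, the following minimal sketch uses its common definition, the mean of |actual − predicted| / |actual| expressed as a percentage. The abstract does not give the exact formula the authors used, so this definition is an assumption, and the function name `aape` and the sample values are hypothetical.

```python
def aape(actual, predicted):
    """Average absolute percentage error, in percent (assumed common definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("AAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical TOC values in wt% (not data from the study).
toc_actual = [2.0, 4.0]
toc_predicted = [1.0, 5.0]
toc_error = aape(toc_actual, toc_predicted)
```

Because each error is scaled by the measured value, AAPE weights misses on low-TOC samples more heavily than an absolute error in the same units would.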

Monthly river flow forecasting using artificial neural network and support vector regression models coupled with wavelet transform

Computers & Geosciences ◽

10.1016/j.cageo.2012.11.015 ◽

2013 ◽

Vol 54 ◽

pp. 1-8 ◽

Cited By ~ 103

Author(s):

Aman Mohammad Kalteh

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Wavelet Transform ◽

Support Vector Regression ◽

Regression Models ◽

River Flow ◽

Support Vector ◽

River Flow Forecasting ◽

Artificial Neural

Hydrologic data exploration and river flow forecasting using self-organizing map and support vector regression

The Fourth International Workshop on Advanced Computational Intelligence ◽

10.1109/iwaci.2011.6160029 ◽

2011 ◽

Cited By ~ 1

Author(s):

Mutao Huang ◽

Yong Tian

Keyword(s):

Support Vector Regression ◽

River Flow ◽

Data Exploration ◽

Support Vector ◽

Self Organizing Map ◽

River Flow Forecasting ◽

Hydrologic Data ◽

Self Organizing

Support Vector Regression Integrated with Fruit Fly Optimization Algorithm for River Flow Forecasting in Lake Urmia Basin

10.20944/preprints201905.0320.v1 ◽

2019 ◽

Author(s):

Saeed Samadianfard ◽

Salar Jarhan ◽

Ely Salwana ◽

Amir Mosavi ◽

Shahaboddin Shamshirband ◽

...

Keyword(s):

Optimization Algorithm ◽

River Flow ◽

Mean Squared Error ◽

Fruit Fly ◽

Support Vector ◽

Fruit Fly Optimization Algorithm ◽

Lake Urmia ◽

River Flows ◽

Fruit Fly Optimization ◽

River Flow Forecasting

Adequate knowledge about the development and operation of the components of water systems is of high importance in order to optimize them. For this reason, forecasting future events is essential for making appropriate decisions. Moreover, operational river management depends heavily on accurate and reliable flow forecasts. In this regard, the current study inspects the accuracy of support vector regression (SVR), SVR regulated with the fruit fly optimization algorithm (FOASVR), and the M5 model tree (M5) in river flow forecasting. Monthly river flow data from two stations of the Lake Urmia Basin (Vaniar and Babarud stations on the Aji Chay and the Barandouz Rivers) were utilized in the current research. Additionally, the influence of periodicity (π) on the forecasting performance was examined. To assess the performance of the mentioned models, different statistical metrics were used, including root mean squared error (RMSE), mean absolute error (MAE), correlation coefficient (R), and Bayesian information criterion (BIC). Results showed that the FOASVR, with RMSE (4.36 and 6.33 m³/s), MAE (2.40 and 3.71 m³/s), and R (0.82 and 0.81) values, had the best performance in forecasting river flows at the Babarud and Vaniar stations, respectively. Also, according to the BIC, Q(t-1) and π were selected as parsimonious inputs for predicting river flow one month ahead. Overall findings indicated that, although both FOASVR and M5 predicted the river flows in suitable accordance with observed river flows, the performance of FOASVR was moderately better than that of M5, and periodicity noticeably increased the performance of the models; consequently, FOASVR can be suggested as an accurate method for forecasting river flows.
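Three of the metrics named in this abstract (RMSE, MAE, and R) can be sketched with their textbook definitions. This is an illustration only, not the authors' code; the function names and the sample flows are hypothetical, and R is taken to be the Pearson correlation coefficient, which the abstract does not spell out.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical monthly flows in m³/s (not data from the study).
q_observed = [1.0, 2.0, 3.0, 4.0]
q_predicted = [2.0, 3.0, 4.0, 5.0]
scores = (rmse(q_observed, q_predicted),
          mae(q_observed, q_predicted),
          pearson_r(q_observed, q_predicted))
```

The sample forecast is offset from the observations by a constant 1 m³/s, so the error metrics are nonzero while the correlation is perfect. This is why R is usually reported alongside RMSE and MAE rather than on its own.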

Calibration of SWAT and three data-driven models for monthly stream flow simulation in Maharlu Lake Basin

Water Science & Technology Water Supply ◽

10.2166/ws.2021.175 ◽

2021 ◽

Author(s):

Fatemeh Moazami Goudarzi ◽

Amirpouya Sarraf ◽

Hassan Ahmadi

Keyword(s):

Machine Learning ◽

Support Vector Regression ◽

Computational Intelligence ◽

Stream Flow ◽

River Flow ◽

Swat Model ◽

Flow Simulation ◽

Support Vector ◽

Lake Basin ◽

Computational Intelligence Methods

In this study, the performance of the SWAT hydrological model and three computational intelligence methods used to simulate river flow is investigated. After collecting the data required for all models used, the calibration and validation stages were performed. Using the SWAT model and three methods, the Extreme Machine Learning (EML), the Support Vector Regression (SVR), and the Least Squares Support Vector Regression (LSSVR), Maharlu Lake Basin stream flow was simulated and the results obtained at Shiraz station were used for this study. A noise reduction filter was employed to improve the results from the computational intelligence methods, and the SUFI-2 algorithm was used to analyze the uncertainty of the SWAT model. Finally, in order to evaluate the models developed and the SWAT model, three statistics were used: RMSE, R², and the Nash-Sutcliffe (NS) coefficient. The results indicated that the SWAT model and the machine learning models were generally appropriate tools for daily flow modeling, but the LSSVR model showed fewer errors in both learning and testing, with the coefficients NS = 0.997 and R² = 0.997 in the calibration stage and NS = 0.994 and R² = 0.994 in the validation stage, which indicates its better performance compared to the other methods and the SWAT model.
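The NS coefficient reported here is commonly computed as one minus the ratio of the squared simulation errors to the variance of the observations about their mean. The sketch below follows that standard definition; the abstract does not confirm the exact form the authors used, and the function name `nash_sutcliffe` and the sample values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    mean_obs = sum(observed) / len(observed)
    denominator = sum((o - mean_obs) ** 2 for o in observed)
    if denominator == 0:
        raise ValueError("NS is undefined when all observations are identical")
    numerator = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return 1.0 - numerator / denominator


# Hypothetical flows (not data from the study).
obs = [1.0, 2.0, 3.0]
ns_perfect = nash_sutcliffe(obs, [1.0, 2.0, 3.0])   # simulation equals observation
ns_mean = nash_sutcliffe(obs, [2.0, 2.0, 2.0])      # simulation equals the observed mean
```

A value of 1 indicates a perfect match, 0 means the model does no better than predicting the observed mean, and negative values mean it does worse.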

A Systematic Methodology to Evaluate Prediction Models for Driving Style Classification

Sensors ◽

10.3390/s20061692 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1692 ◽

Cited By ~ 6

Author(s):

Iván Silva ◽

José Eugenio Naranjo

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Prediction Models ◽

Statistical Tests ◽

Area Under The Curve ◽

The Other ◽

Support Vector ◽

Classification Models ◽

K Nearest Neighbor

Identifying driving styles using classification models with in-vehicle data can provide automated feedback to drivers on their driving behavior, particularly if they are driving safely. Although several classification models have been developed for this purpose, there is no consensus on which classifier performs better at identifying driving styles. Therefore, more research is needed to evaluate classification models by comparing performance metrics. In this paper, a data-driven machine-learning methodology for classifying driving styles is introduced. This methodology is grounded in well-established machine-learning (ML) methods and literature related to driving-styles research. The methodology is illustrated through a study involving data collected from 50 drivers from two different cities in a naturalistic setting. Five features were extracted from the raw data. Fifteen experts were involved in the data labeling to derive the ground truth of the dataset. The dataset fed five different models (Support Vector Machines (SVM), Artificial Neural Networks (ANN), fuzzy logic, k-Nearest Neighbor (kNN), and Random Forests (RF)). These models were evaluated in terms of a set of performance metrics and statistical tests. The experimental results from performance metrics showed that SVM outperformed the other four models, achieving an average accuracy of 0.96, F1-Score of 0.9595, Area Under the Curve (AUC) of 0.9730, and Kappa of 0.9375. In addition, Wilcoxon tests indicated that ANN predicts differently to the other four models. These promising results demonstrate that the proposed methodology may support researchers in making informed decisions about which ML model performs better for driving-styles classification.
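Three of the reported metrics (accuracy, F1-Score, and Kappa) can be sketched as follows. The sketch assumes a binary labeling for simplicity, whereas the study may have used more than two driving-style classes and an averaged F1; Kappa is taken to be Cohen's kappa. The function names and the labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_observed = accuracy(y_true, y_pred)
    p_chance = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical labels: 1 = safe driving, 0 = unsafe driving (not data from the study).
truth = [1, 1, 0, 0]
guess = [1, 0, 0, 0]
results = (accuracy(truth, guess), f1_score(truth, guess), cohen_kappa(truth, guess))
```

Kappa discounts the agreement a classifier would reach by chance given the label frequencies, which is why it can be much lower than accuracy on the same predictions.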

A Comparison of Machine Learning Methods for the Prediction of Traffic Speed in Urban Places

Sustainability ◽

10.3390/su12010142 ◽

2019 ◽

Vol 12 (1) ◽

pp. 142 ◽

Cited By ~ 5

Author(s):

Charalampos Bratsas ◽

Kleanthis Koupidis ◽

Josep-Maria Salanova ◽

Konstantinos Giannakopoulos ◽

Aristeidis Kaloudis ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Regression ◽

Multilayer Perceptron ◽

Intelligent Transportation Systems ◽

Traffic Congestion ◽

Transportation Systems ◽

Support Vector ◽

Support Vector Regression Model ◽

Traffic Lights ◽

Stable Conditions

Rising interest in the field of Intelligent Transportation Systems combined with the increased availability of collected data allows the study of different methods for prevention of traffic congestion in cities. A common need in all of these methods is the use of traffic predictions for supporting planning and operation of the traffic lights and traffic management schemes. This paper focuses on comparing the forecasting effectiveness of three machine learning models, namely Random Forests, Support Vector Regression, and Multilayer Perceptron—in addition to Multiple Linear Regression—using probe data collected from the road network of Thessaloniki, Greece. The comparison was conducted with multiple tests clustered in three types of scenarios. The first scenario tests the algorithms on specific randomly selected dates on different randomly selected roads. The second scenario tests the algorithms on randomly selected roads over eight consecutive 15 min intervals; the third scenario tests the algorithms on random roads for the duration of a whole day. The experimental results show that while the Support Vector Regression model performs best at stable conditions with minor variations, the Multilayer Perceptron model adapts better to circumstances with greater variations, in addition to having the most near-zero errors.

MULTIPLE AUTHORS DETECTION: A QUANTITATIVE ANALYSIS OF DREAM OF THE RED CHAMBER

Advances in Adaptive Data Analysis ◽

10.1142/s1793536914500125 ◽

2014 ◽

Vol 06 (04) ◽

pp. 1450012 ◽

Cited By ~ 4

Author(s):

XIANFENG HU ◽

YANG WANG ◽

QIANG WU

Keyword(s):

Machine Learning ◽

Quantitative Analysis ◽

Relative Frequency ◽

Strong Support ◽

The Other ◽

Authorship Attribution ◽

Support Vector ◽

Feature Ranking ◽

Vector Machines ◽

Cao Xueqin

Inspired by the authorship controversy of Dream of the Red Chamber and the application of machine learning in the study of literary stylometry, we develop a rigorous new method for the mathematical analysis of authorship by testing for a so-called chrono-divide in writing styles. Our method incorporates some of the latest advances in the study of authorship attribution, particularly techniques from support vector machines. By introducing the notion of relative frequency as a feature ranking metric, our method proves to be highly effective and robust. Applying our method to the Cheng–Gao version of Dream of the Red Chamber has led to convincing if not irrefutable evidence that the first 80 chapters and the last 40 chapters of the book were written by two different authors. Furthermore, our analysis has unexpectedly provided strong support to the hypothesis that Chapter 67 was not the work of Cao Xueqin either. We have also tested our method to the other three Great Classical Novels in Chinese. As expected no chrono-divides have been found. This provides further evidence of the robustness of our method.

Spam text classification using LSTM Recurrent Neural Network

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2021/11992021 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1271-1275

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Language Processing ◽

Text Classification ◽

Short Term Memory ◽

Experimental Studies ◽

The Other ◽

Support Vector ◽

Data Points ◽

Class Labels

Sequence classification is one of the in-demand research topics in the field of Natural Language Processing (NLP). Classifying a set of images or text into an appropriate category or class is a complex task that many Machine Learning (ML) models fail to accomplish accurately, ending up under-fitting the given dataset. Some of the ML algorithms used in text classification are KNN, Naïve Bayes, Support Vector Machines, Convolutional Neural Networks (CNNs), Recursive CNNs, Recurrent Neural Networks (RNNs), Long Short Term Memory (LSTM), etc. For this experimental study, LSTM and a few other algorithms were chosen for comparison. The dataset used is the SMS Spam Collection Dataset from Kaggle, to which 150 more entries were added from different sources. The two possible class labels for the data points are spam and ham. Each entry consists of the class label and a few sentences of text, followed by a few useless features that are eliminated. After converting the text to the required format, the models are run and then evaluated using various metrics. In experimental studies, the LSTM gives much better classification accuracy than the other machine learning models. F1-Scores in the high nineties were achieved using LSTM for classifying the text. The other models showed very low F1-Scores and Cosine Similarities, indicating that they had underperformed on the dataset. Another interesting observation is that the LSTM produced fewer false positives and false negatives than any other model.
