Training and Testing Data Division Influence on Hybrid Machine Learning Model Process: Application of River Flow Forecasting

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-22
Author(s):  
Hai Tao ◽  
Ali Omran Al-Sulttani ◽  
Ameen Mohammed Salih Ameen ◽  
Zainab Hasan Ali ◽  
Nadhir Al-Ansari ◽  
...  

The hydrological process has a dynamic nature characterised by randomness and complex phenomena. The application of machine learning (ML) models to river flow forecasting has grown rapidly, owing to their capacity to simulate the complex phenomena associated with hydrological and environmental processes. Four different ML models were developed to forecast the flow of a river located in a semiarid region of Iraq, and the influence of the training/testing data division on the ML modeling process was investigated. Three data division scenarios were examined: 70%–30%, 80%–20%, and 90%–10%. Several statistical indicators were computed to verify the performance of the models. The results showed that the support vector regression model hybridized with a genetic algorithm (SVR-GA) outperformed the other ML forecasting models for monthly river flow forecasting under the 90%–10% data division. SVR-GA was also found to improve accuracy in forecasting high flow events. Because the GA optimizer tunes the internal parameters of the SVR model, the SVR-GA architecture provides a robust learning process, making it more efficient in forecasting stochastic river flow behaviour than the other developed hybrid models.
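The three data division scenarios amount to cutting the monthly flow record into training and testing portions at different ratios. The sketch below assumes a chronological cut (earliest months for training, most recent months for testing), which is common for time series but is not stated in the abstract; the function `chronological_split` and the placeholder series are hypothetical.

```python
def chronological_split(series, train_percent):
    """Split a time-ordered sequence into (train, test) without shuffling.

    The first `train_percent` percent of observations form the training set
    and the rest form the testing set, so every test month comes after the
    training period.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    cut = len(series) * train_percent // 100  # integer arithmetic avoids float rounding
    return series[:cut], series[cut:]


# Placeholder record of 120 monthly flows (not data from the study).
monthly_flow = [float(month) for month in range(120)]

# The three division scenarios examined in the study.
for train_percent in (70, 80, 90):
    train, test = chronological_split(monthly_flow, train_percent)
    print(f"{train_percent}%-{100 - train_percent}%: "
          f"{len(train)} training months, {len(test)} testing months")
```

With 120 months this prints 84/36, 96/24, and 108/12 training/testing months for the three scenarios. Shuffling before the cut would leak future information into training, which is why the sketch keeps the original order.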

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3068
Author(s):  
Soumaya Dghim ◽  
Carlos M. Travieso-González ◽  
Radim Burget

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper addresses the detection of Nosema disease, which is considered one of the most economically significant diseases today. This work presents a solution for recognizing and identifying Nosema cells among the other objects present in microscopic images. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images; machine learning methods, such as an artificial neural network (ANN) and a support vector machine (SVM), are then applied to detect and classify the Nosema cells. The second strategy explores deep learning and transfer learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several pretrained networks for transfer learning (AlexNet, VGG-16, and VGG-19), which were fine-tuned and applied to the object sub-images to distinguish Nosema images from the other object images. The best accuracy, 96.25%, was achieved by the pretrained VGG-16 network.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Osama Siddig ◽  
Ahmed Farid Ibrahim ◽  
Salaheldin Elkatatny

Unconventional resources have recently gained a lot of attention, and as a consequence, research interest in predicting total organic carbon (TOC), a crucial quality indicator, has increased. TOC is commonly measured experimentally; however, due to sampling restrictions, obtaining continuous TOC data is difficult. Therefore, various empirical correlations for TOC have been presented, although there are concerns about their generalization and accuracy. In this paper, different machine learning (ML) techniques were used to develop models that predict TOC from well logs, including formation resistivity (FR), spontaneous potential (SP), sonic transit time (Δt), bulk density (RHOB), neutron porosity (CNP), gamma ray (GR), and spectral logs of thorium (Th), uranium (Ur), and potassium (K). Over 1250 data points from the Devonian Duvernay shale were used to create and validate the models. These data were obtained from three wells: data from the first well were used to train the models, while data from the other two wells were used to test and validate them. Support vector machine (SVM), random forest (RF), and decision tree (DT) were the ML approaches tested, and their predictions were compared with three empirical correlations. The parameters of each ML method were tuned to achieve the best possible accuracy in terms of the correlation coefficient (R) and average absolute percentage error (AAPE) between the actual and predicted TOC. All three ML methods yielded good matches, but the RF-based model performed best, predicting TOC for the different datasets with R values between 0.93 and 0.99 and AAPE values below 14%. In terms of average error, the ML-based models outperformed the three empirical correlations.
This study shows the capability and robustness of ML models to predict the total organic carbon from readily available logging data without the need for core analysis or additional well interventions.
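The two accuracy measures named above can be computed directly from paired actual and predicted TOC values. A minimal sketch in plain Python, assuming the usual definition of AAPE as the mean of |actual − predicted| / |actual| × 100; the sample values are invented for illustration and are not data from the study.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def aape(actual, predicted):
    """Average absolute percentage error, in percent.

    Assumes no actual value is zero (measured TOC values are positive).
    """
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Illustrative TOC values (wt%), not data from the study.
toc_actual = [2.0, 3.5, 4.0, 5.0]
toc_predicted = [2.2, 3.3, 4.1, 4.8]

print(f"R = {pearson_r(toc_actual, toc_predicted):.3f}")
print(f"AAPE = {aape(toc_actual, toc_predicted):.2f}%")
```

R rewards predictions that follow the trend of the measurements, while AAPE penalizes relative deviations directly, so reporting both, as the study does, guards against a model that correlates well but is systematically offset.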


Author(s):  
Saeed Samadianfard ◽  
Salar Jarhan ◽  
Ely Salwana ◽  
Amir Mosavi ◽  
Shahaboddin Shamshirband ◽  
...  

Adequate knowledge about the development and operation of the components of water systems is of high importance for optimizing them. Forecasting future events is therefore highly significant for making appropriate decisions, and operational river management depends heavily on accurate and reliable flow forecasts. In this regard, the current study examines the accuracy of support vector regression (SVR), SVR tuned with the fruit fly optimization algorithm (FOASVR), and the M5 model tree (M5) in river flow forecasting. Monthly river flow data from two stations in the Lake Urmia Basin (Vaniar and Babarud stations on the Aji Chay and Barandouz Rivers) were used in this research. Additionally, the influence of periodicity (π) on forecasting performance was examined. To assess the performance of the models, several statistical metrics were used, including root mean squared error (RMSE), mean absolute error (MAE), correlation coefficient (R), and Bayesian information criterion (BIC). Results showed that FOASVR performed best in forecasting river flows at the Babarud and Vaniar stations, with RMSE values of 4.36 and 6.33 m³/s, MAE values of 2.40 and 3.71 m³/s, and R values of 0.82 and 0.81, respectively. Based on BIC, Qt-1 and π were selected as parsimonious inputs for predicting river flow one month ahead. Overall, the findings indicated that although both FOASVR and M5 predicted river flows in good agreement with observed flows, FOASVR performed moderately better than M5, and periodicity noticeably improved the performance of the models; consequently, FOASVR can be recommended as an accurate method for forecasting river flows.
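The inputs and scores mentioned above can be illustrated with a short sketch: it pairs each month's flow with the one-month lag Qt-1 and a periodicity term, then scores a forecast with RMSE, MAE, and BIC. The abstract does not define π, so encoding it as the calendar month (1 to 12) is an assumption, as is the BIC form n·ln(MSE) + k·ln(n), which is one common variant. All function names and flow values are hypothetical.

```python
import math


def build_inputs(flows, start_month=1):
    """Pair each flow Q_t with the inputs (Q_{t-1}, month of Q_t).

    The month index (1-12) stands in for the periodicity term pi; this
    encoding is an assumption. The first record is dropped because it has
    no lagged value.
    """
    rows = []
    for t in range(1, len(flows)):
        month = (start_month - 1 + t) % 12 + 1  # calendar month of flows[t]
        rows.append(((flows[t - 1], month), flows[t]))
    return rows


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def bic(obs, pred, n_params):
    """BIC = n*ln(MSE) + k*ln(n), one common form; lower favours parsimony.

    Requires a nonzero MSE, since ln(0) is undefined.
    """
    n = len(obs)
    mse = sum((o - p) ** 2 for o, p in zip(obs, pred)) / n
    return n * math.log(mse) + n_params * math.log(n)


# Invented monthly flows (m3/s) and a naive persistence forecast Q_t = Q_{t-1}.
flows = [10.0, 12.0, 15.0, 11.0]
rows = build_inputs(flows)
observed = [target for _, target in rows]
forecast = [inputs[0] for inputs, _ in rows]
print(rows)
print(rmse(observed, forecast), mae(observed, forecast))
```

The k·ln(n) term is what lets BIC favour a small input set such as (Qt-1, π): an extra input must reduce the error enough to pay for its penalty.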


Author(s):  
Fatemeh Moazami Goudarzi ◽  
Amirpouya Sarraf ◽  
Hassan Ahmadi

In this study, the performance of the SWAT hydrological model and three computational intelligence methods for simulating river flow is investigated. After the data required by all models were collected, the calibration and validation stages were performed. Streamflow in the Maharlu Lake Basin was simulated with the SWAT model and with three methods: Extreme Learning Machine (ELM), Support Vector Regression (SVR), and Least Squares Support Vector Regression (LSSVR). The results obtained at Shiraz station were used for this study. A noise reduction filter was employed to improve the results of the computational intelligence methods, and the SUFI-2 algorithm was used to analyze the uncertainty of the SWAT model. Finally, three statistics were used to evaluate the developed models and the SWAT model: root mean square error (RMSE), the coefficient of determination (R²), and the Nash–Sutcliffe (NS) efficiency coefficient. The results indicated that the SWAT model and the machine learning models were generally appropriate tools for daily flow modeling, but the LSSVR model showed smaller errors in both the learning and testing stages, with NS = 0.997 and R² = 0.997 in the calibration stage and NS = 0.994 and R² = 0.994 in the validation stage, demonstrating its better performance compared with the other methods and the SWAT model.
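The NS coefficient and R² used above measure different things, and a short sketch makes the distinction concrete. It computes NS as 1 − Σ(o − s)² / Σ(o − ō)² and R² as the squared Pearson correlation, which is one of several R² definitions in use (the abstract does not say which one the study applied). The flow values are invented.

```python
def nash_sutcliffe(obs, sim):
    """NS = 1 - sum((o - s)^2) / sum((o - mean(o))^2); 1 is a perfect fit."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs, sim):
    """Square of the Pearson correlation between observed and simulated flows."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


# Invented daily flows and a simulation that is shifted upward by 1.
observed = [1.0, 2.0, 3.0]
biased = [2.0, 3.0, 4.0]
print(r_squared(observed, biased))       # 1.0
print(nash_sutcliffe(observed, biased))  # -0.5
```

The biased simulation tracks the observed pattern perfectly, so R² stays at 1.0, while NS drops to -0.5 because it penalizes the systematic offset. A model scoring high on both, as LSSVR does here, is therefore matching both pattern and magnitude.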


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1692 ◽  
Author(s):  
Iván Silva ◽  
José Eugenio Naranjo

Identifying driving styles with classification models applied to in-vehicle data can provide drivers with automated feedback on their driving behavior, particularly on whether they are driving safely. Although several classification models have been developed for this purpose, there is no consensus on which classifier performs best at identifying driving styles. More research is therefore needed to evaluate classification models by comparing performance metrics. In this paper, a data-driven machine learning methodology for classifying driving styles is introduced. This methodology is grounded in well-established machine learning (ML) methods and the literature on driving-styles research. The methodology is illustrated through a study involving data collected from 50 drivers in two different cities in a naturalistic setting. Five features were extracted from the raw data, and fifteen experts were involved in labeling the data to derive the ground truth of the dataset. The dataset was used with five different models: Support Vector Machines (SVM), Artificial Neural Networks (ANN), fuzzy logic, k-Nearest Neighbor (kNN), and Random Forests (RF). These models were evaluated using a set of performance metrics and statistical tests. The performance metrics showed that SVM outperformed the other four models, achieving an average accuracy of 0.96, an F1-score of 0.9595, an Area Under the Curve (AUC) of 0.9730, and a Kappa of 0.9375. In addition, Wilcoxon tests indicated that ANN's predictions differ from those of the other four models. These promising results suggest that the proposed methodology may support researchers in making informed decisions about which ML model performs best for driving-styles classification.
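Accuracy, F1-score, and Cohen's kappa can all be derived from the confusion counts of a classifier. A minimal sketch for the binary case; the study's actual driving-style classes and its averaging scheme across classes are not given in the abstract, so the two labels below ("aggressive", "calm") and the function name are hypothetical.

```python
def classification_scores(y_true, y_pred, positive):
    """Accuracy, F1 for the `positive` label, and Cohen's kappa (binary case).

    Kappa is undefined (division by zero) when every label in both sequences
    is the positive class; this sketch does not handle that edge case.
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = n - tp - fp - fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Agreement expected by chance, from how often each label appears
    # in the true labels and in the predictions.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((tn + fp) / n) * ((tn + fn) / n)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, f1, kappa


# Hypothetical labels for four trips.
truth = ["aggressive", "aggressive", "calm", "calm"]
preds = ["aggressive", "calm", "calm", "calm"]
print(classification_scores(truth, preds, positive="aggressive"))
```

Kappa is lower than accuracy here (0.5 versus 0.75) because half of the agreement would be expected by chance alone; that is why it is reported alongside accuracy.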


2019 ◽  
Vol 12 (1) ◽  
pp. 142 ◽  
Author(s):  
Charalampos Bratsas ◽  
Kleanthis Koupidis ◽  
Josep-Maria Salanova ◽  
Konstantinos Giannakopoulos ◽  
Aristeidis Kaloudis ◽  
...  

Rising interest in Intelligent Transportation Systems, combined with the increased availability of collected data, enables the study of different methods for preventing traffic congestion in cities. A common need of all these methods is traffic prediction to support the planning and operation of traffic lights and traffic management schemes. This paper compares the forecasting effectiveness of three machine learning models (Random Forests, Support Vector Regression, and Multilayer Perceptron), in addition to Multiple Linear Regression, using probe data collected from the road network of Thessaloniki, Greece. The comparison was conducted with multiple tests grouped into three types of scenarios. The first scenario tests the algorithms on randomly selected dates on different randomly selected roads. The second scenario tests the algorithms on randomly selected roads over eight consecutive 15 min intervals, and the third scenario tests the algorithms on random roads for the duration of a whole day. The experimental results show that while the Support Vector Regression model performs best under stable conditions with minor variations, the Multilayer Perceptron model adapts better to circumstances with greater variations and also produces the most near-zero errors.
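The second test scenario can be made concrete by sampling a road and a block of eight consecutive 15 min intervals (two hours) within one day. This is a hypothetical sketch: the abstract does not describe how test windows were drawn, and the road identifiers, seed, and function names are invented.

```python
import random

INTERVALS_PER_DAY = 24 * 60 // 15  # 96 fifteen-minute intervals per day


def sample_window(road_ids, n_intervals=8, rng=random):
    """Pick a random road and a run of consecutive 15 min interval indices."""
    road = rng.choice(road_ids)
    # Latest valid start keeps the whole window inside the same day.
    start = rng.randrange(INTERVALS_PER_DAY - n_intervals + 1)
    return road, list(range(start, start + n_intervals))


def interval_label(index):
    """Convert an interval index (0-95) to its start time as HH:MM."""
    minutes = index * 15
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


rng = random.Random(42)
roads = ["road_A", "road_B", "road_C"]  # hypothetical road identifiers
road, window = sample_window(roads, rng=rng)
print(road, "intervals starting", interval_label(window[0]),
      "through", interval_label(window[-1]))
```

Passing a seeded `random.Random` instance makes the sampled scenarios reproducible, which matters when several models must be compared on exactly the same windows.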


2014 ◽  
Vol 06 (04) ◽  
pp. 1450012 ◽  
Author(s):  
Xianfeng Hu ◽
Yang Wang ◽
Qiang Wu

Inspired by the authorship controversy of Dream of the Red Chamber and the application of machine learning in the study of literary stylometry, we develop a rigorous new method for the mathematical analysis of authorship by testing for a so-called chrono-divide in writing styles. Our method incorporates some of the latest advances in the study of authorship attribution, particularly techniques from support vector machines. By introducing the notion of relative frequency as a feature ranking metric, our method proves to be highly effective and robust. Applying our method to the Cheng–Gao version of Dream of the Red Chamber has led to convincing, if not irrefutable, evidence that the first 80 chapters and the last 40 chapters of the book were written by two different authors. Furthermore, our analysis has unexpectedly provided strong support for the hypothesis that Chapter 67 was not the work of Cao Xueqin either. We have also applied our method to the other three Great Classical Novels in Chinese. As expected, no chrono-divides were found, which provides further evidence of the robustness of our method.
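The idea of ranking features by relative frequency can be illustrated with a simplified, hypothetical sketch: compute each token's relative frequency in two groups of chapters and rank tokens by how strongly those frequencies differ. The paper's exact ranking metric and its SVM-based chrono-divide test are not specified in the abstract, so the ratio score below is an assumption; the toy texts are invented and whitespace-tokenized, whereas Chinese text would need character-level or word-segmented tokens.

```python
from collections import Counter


def relative_frequencies(texts):
    """Relative frequency of each token across a group of texts."""
    counts = Counter(token for text in texts for token in text.split())
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


def rank_features(group_a, group_b, top_k=5):
    """Rank tokens by the ratio of their larger to smaller relative frequency.

    A small floor avoids division by zero for tokens absent from one group.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    freq_a = relative_frequencies(group_a)
    freq_b = relative_frequencies(group_b)
    floor = 1e-6
    scores = {}
    for token in set(freq_a) | set(freq_b):
        fa = freq_a.get(token, 0.0) + floor
        fb = freq_b.get(token, 0.0) + floor
        scores[token] = max(fa, fb) / min(fa, fb)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy "early" and "late" chapter groups (invented text).
early = ["the cat sat", "the cat ran"]
late = ["a dog sat", "a dog ran"]
print(rank_features(early, late, top_k=4))
```

Tokens used equally by both groups ("sat", "ran") score exactly 1 and fall to the bottom, so the top-ranked features are those most likely to separate the two writing styles, which a classifier such as an SVM could then use.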


Sequence classification is one of the in-demand research areas in Natural Language Processing (NLP). Classifying a set of images or texts into the appropriate category or class is a complex task that many Machine Learning (ML) models fail to accomplish accurately, ending up underfitting the given dataset. ML algorithms used in text classification include KNN, Naïve Bayes, Support Vector Machines, Convolutional Neural Networks (CNNs), Recursive CNNs, Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks. For this experimental study, LSTM and a few other algorithms were chosen for comparison. The dataset used is the SMS Spam Collection Dataset from Kaggle, supplemented with 150 additional entries from other sources. Each data point has one of two class labels: spam or ham. Each entry consists of the class label and a few sentences of text, followed by a few irrelevant features that were removed. After the text was converted to the required format, the models were run and then evaluated using various metrics. In the experiments, LSTM gave much better classification accuracy than the other machine learning models, achieving F1-scores in the high nineties. The other models showed very low F1-scores and cosine similarities, indicating that they underperformed on the dataset. Another interesting observation is that LSTM produced fewer false positives and false negatives than any other model.
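The abstract reports cosine similarities and false positive/negative counts without defining how cosine similarity was applied. One plausible reading, assumed here, is to encode the true labels and the predictions as 0/1 vectors (spam = 1) and compare them. The function names and labels below are hypothetical.

```python
import math


def encode(labels, positive="spam"):
    """Map class labels to a 0/1 vector, with 1 for the positive class."""
    return [1 if label == positive else 0 for label in labels]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def error_counts(y_true, y_pred, positive="spam"):
    """Return (false positives, false negatives) for the positive class."""
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return fp, fn


y_true = ["ham", "spam", "ham", "spam", "ham"]
y_pred = ["ham", "spam", "spam", "ham", "ham"]
print(cosine_similarity(encode(y_true), encode(y_pred)))
print(error_counts(y_true, y_pred))  # (1, 1)
```

Under this encoding, ham/ham agreements contribute nothing to the dot product, so the measure, like F1 for the spam class, focuses on how well spam messages are caught rather than on the majority ham class.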

