Ensemble modeling approach for rainfall/groundwater balancing

D. Laucelli; O. Giustolisi; V. Babovic; M. Keijzer

doi:10.2166/hydro.2007.102

Ensemble modeling approach for rainfall/groundwater balancing

Journal of Hydroinformatics ◽

10.2166/hydro.2007.102 ◽

2007 ◽

Vol 9 (2) ◽

pp. 95-106 ◽

Cited By ~ 6

Author(s):

D. Laucelli ◽

O. Giustolisi ◽

V. Babovic ◽

M. Keijzer

Keyword(s):

Machine Learning ◽

Genetic Programming ◽

Empirical Evidence ◽

Averaging Method ◽

Total Error ◽

Ensemble Methods ◽

Real Data ◽

Symbolic Regression ◽

Ensemble Modeling ◽

Physical Phenomena

This paper introduces an application of machine learning, on real data. It deals with Ensemble Modeling, a simple averaging method for obtaining more reliable approximations using symbolic regression. Considerations on the contribution of bias and variance to the total error, and ensemble methods to reduce errors due to variance, have been tackled together with a specific application of ensemble modeling to hydrological forecasts. This work provides empirical evidence that genetic programming can greatly benefit from this approach in forecasting and simulating physical phenomena. Further considerations have been taken into account, such as the influence of Genetic Programming parameter settings on the model's performance.

Download Full-text

Control Synthesis as Machine Learning Control by Symbolic Regression Methods

Applied Sciences ◽

10.3390/app11125468 ◽

2021 ◽

Vol 11 (12) ◽

pp. 5468

Author(s):

Elizaveta Shmalko ◽

Askhat Diveev

Keyword(s):

Machine Learning ◽

Genetic Programming ◽

Mathematical Formulation ◽

Small Variation ◽

Learning Control ◽

Symbolic Regression ◽

Control Technique ◽

Control Synthesis ◽

Basic Solution ◽

Regression Methods

The problem of control synthesis is considered as machine learning control. The paper proposes a mathematical formulation of machine learning control, discusses approaches of supervised and unsupervised learning by symbolic regression methods. The principle of small variation of the basic solution is presented to set up the neighbourhood of the search and to increase search efficiency of symbolic regression methods. Different symbolic regression methods such as genetic programming, network operator, Cartesian and binary genetic programming are presented in details. It is shown on the computational example the possibilities of symbolic regression methods as unsupervised machine learning control technique to the solution of MLC problem of control synthesis for obtaining the stabilization system for a mobile robot.

Download Full-text

A cluster-genetic programming approach for detecting pulmonary tuberculosis

Ethiopian Journal of Science and Technology ◽

10.4314/ejst.v14i1.5 ◽

2021 ◽

Vol 14 (1) ◽

pp. 71-88

Author(s):

Adane Nega Tarekegn ◽

Tamir Anteneh Alemu ◽

Alemu Kumlachew Tegegne

Keyword(s):

Machine Learning ◽

Cluster Analysis ◽

Genetic Programming ◽

A Priori ◽

Real Data ◽

Health Concern ◽

Programming Approach ◽

Data Set ◽

Middle Income ◽

Gp Model

Tuberculosis (TB) remains a global health concern. It commonly spreads through the air and attacks low immune bodies. TB is the most common and known health problem in low and middle-income countries. Genetic programming (GP) is a machine learning model for discovering useful relationships among the variables in complex clinical data. It is more appropriate in a circumstance when the form of the solution model is unknown a priori. The main objective of this study is to develop a model that can detect positive cases of TB suspected patients using genetic programming approach. In this paper, Genetic Programming (GP) is exploited to identify the presence of positive cases of tuberculosis from the real data set of TB suspects and hospitalized patients. First, the dataset is pre-processed, and target variables are identified using cluster analysis. This data-driven cluster analysis identifies two distinct clusters of patients, representing TB positive and TB negative. Then, GP is trained using the training datasets to construct a prediction model and tested with a separate new dataset. With the 30 runs, the median performance of GP on test data was good (sensitivity=0.78, specificity=0.95, accuracy=0.89, AUC=0.91). We find that GP shows better performance in predicting TB compared to other machine learning models. The study demonstrates that the GP model might be used to support clinicians to screen TB patients.

Download Full-text

Improving genetic programming based symbolic regression using deterministic machine learning

2013 IEEE Congress on Evolutionary Computation ◽

10.1109/cec.2013.6557774 ◽

2013 ◽

Cited By ~ 20

Author(s):

Ilknur Icke ◽

Joshua C. Bongard

Keyword(s):

Machine Learning ◽

Genetic Programming ◽

Symbolic Regression

Download Full-text

A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data

Soft Computing ◽

10.1007/s00500-021-05590-y ◽

2021 ◽

Author(s):

Baligh Al-Helali ◽

Qi Chen ◽

Bing Xue ◽

Mengjie Zhang

Keyword(s):

Genetic Programming ◽

Incomplete Data ◽

Symbolic Regression ◽

Imputation Method

Download Full-text

Multi-objective genetic programming for symbolic regression with the adaptive weighted splines representation

Proceedings of the Genetic and Evolutionary Computation Conference Companion ◽

10.1145/3449726.3459461 ◽

2021 ◽

Author(s):

Christian Raymond ◽

Qi Chen ◽

Bing Xue ◽

Mengjie Zhang

Keyword(s):

Genetic Programming ◽

Symbolic Regression ◽

Multi Objective

Download Full-text

Empirical analysis of variance for genetic programming based symbolic regression

Proceedings of the Genetic and Evolutionary Computation Conference Companion ◽

10.1145/3449726.3459486 ◽

2021 ◽

Author(s):

Lukas Kammerer ◽

Gabriel Kronberger ◽

Stephan Winkler

Keyword(s):

Genetic Programming ◽

Empirical Analysis ◽

Analysis Of Variance ◽

Symbolic Regression

Download Full-text

Machine Learning for the Dynamic Positioning of UAVs for Extended Connectivity

Sensors ◽

10.3390/s21134618 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4618

Author(s):

Francisco Oliveira ◽

Miguel Luís ◽

Susana Sargento

Keyword(s):

Machine Learning ◽

Cellular Networks ◽

Real Data ◽

Emerging Technology ◽

Machine Learning Algorithms ◽

Base Stations ◽

Aerial Vehicle ◽

Positioning Algorithm ◽

The Military ◽

Better Than

Unmanned Aerial Vehicle (UAV) networks are an emerging technology, useful not only for the military, but also for public and civil purposes. Their versatility provides advantages in situations where an existing network cannot support all requirements of its users, either because of an exceptionally big number of users, or because of the failure of one or more ground base stations. Networks of UAVs can reinforce these cellular networks where needed, redirecting the traffic to available ground stations. Using machine learning algorithms to predict overloaded traffic areas, we propose a UAV positioning algorithm responsible for determining suitable positions for the UAVs, with the objective of a more balanced redistribution of traffic, to avoid saturated base stations and decrease the number of users without a connection. The tests performed with real data of user connections through base stations show that, in less restrictive network conditions, the algorithm to dynamically place the UAVs performs significantly better than in more restrictive conditions, reducing significantly the number of users without a connection. We also conclude that the accuracy of the prediction is a very important factor, not only in the reduction of users without a connection, but also on the number of UAVs deployed.

Download Full-text

Application of a Rough Set-Based Inductive Learning System

Fundamenta Informaticae ◽

10.3233/fi-1993-182-409 ◽

1993 ◽

Vol 18 (2-4) ◽

pp. 209-220

Author(s):

Michael Hadjimichael ◽

Anita Wasilewska

Keyword(s):

Machine Learning ◽

Rough Set ◽

Presidential Election ◽

Predictive Accuracy ◽

Learning Algorithm ◽

Inductive Learning ◽

Real Data ◽

Semantic Content ◽

Learning System ◽

Voter Preferences

We present here an application of Rough Set formalism to Machine Learning. The resulting Inductive Learning algorithm is described, and its application to a set of real data is examined. The data consists of a survey of voter preferences taken during the 1988 presidential election in the U.S.A. Results include an analysis of the predictive accuracy of the generated rules, and an analysis of the semantic content of the rules.

Download Full-text

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) Bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of three attempts. It was found that GBM performed the best among the ensemble models with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.

Download Full-text

Kernel Based Data-Adaptive Support Vector Machines for Multi-Class Classification

Mathematics ◽

10.3390/math9090936 ◽

2021 ◽

Vol 9 (9) ◽

pp. 936

Author(s):

Jianli Shao ◽

Xin Liu ◽

Wenqing He

Keyword(s):

Machine Learning ◽

Spatial Association ◽

Class Imbalance ◽

Imbalanced Data ◽

Real Data ◽

Kernel Functions ◽

Support Vector ◽

Classification Problems ◽

Rare Class ◽

Data Adaptive

Imbalanced data exist in many classification problems. The classification of imbalanced data has remarkable challenges in machine learning. The support vector machine (SVM) and its variants are popularly used in machine learning among different classifiers thanks to their flexibility and interpretability. However, the performance of SVMs is impacted when the data are imbalanced, which is a typical data structure in the multi-category classification problem. In this paper, we employ the data-adaptive SVM with scaled kernel functions to classify instances for a multi-class population. We propose a multi-class data-dependent kernel function for the SVM by considering class imbalance and the spatial association among instances so that the classification accuracy is enhanced. Simulation studies demonstrate the superb performance of the proposed method, and a real multi-class prostate cancer image dataset is employed as an illustration. Not only does the proposed method outperform the competitor methods in terms of the commonly used accuracy measures such as the F-score and G-means, but also successfully detects more than 60% of instances from the rare class in the real data, while the competitors can only detect less than 20% of the rare class instances. The proposed method will benefit other scientific research fields, such as multiple region boundary detection.

Download Full-text