Comparison of three data-driven techniques in modelling the evapotranspiration process

I. El-Baroudy; A. Elshorbagy; S. K. Carey; O. Giustolisi; D. Savic

doi:10.2166/hydro.2010.029

Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – Part 1: Concepts and methodology

Hydrology and Earth System Sciences Discussions, 2009, Vol 6 (6), pp. 7055-7093

doi:10.5194/hessd-6-7055-2009

Cited By ~ 4

Author(s): A. Elshorbagy, G. Corzo, S. Srinivasulu, D. P. Solomatine

Keyword(s): Polynomial Regression, Predictive Accuracy, Lower Layer, Data Driven, Support Vector, K Nearest Neighbors, Evolutionary Polynomial Regression, Modeling Techniques, Modeling Experiment, Data Driven Modeling

Abstract. A comprehensive data driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely, neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and rainfall-runoff processes are described and proposed for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both predictive accuracy and uncertainty of the modeling techniques can be evaluated. The implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.


Crisp discharge forecasts and grey uncertainty bands using data-driven models

Hydrology Research, 2012, Vol 43 (5), pp. 589-602

doi:10.2166/nh.2012.121

Cited By ~ 9

Author(s): S. Alvisi, E. Creaco, M. Franchini

Keyword(s): Polynomial Regression, Standard Technique, Least Square, Data Driven, Lead Times, Ann Model, Total Uncertainty, Evolutionary Polynomial Regression, Similar Accuracy, Comparison Of The Results

A data-driven artificial neural network (ANN) model and a data-driven evolutionary polynomial regression (EPR) model are here used to set up two real-time crisp discharge forecasting models whose crisp parameters are estimated through the least-square criterion. In order to represent the total uncertainty of each model in performing the forecast, their parameters are then considered as grey numbers. Comparison of the results obtained through the application of the two models to a real case study shows that the crisp models based on ANN and EPR provide similar accuracy for short forecasting lead times; for long forecasting lead times, the performance of the EPR model deteriorates with respect to that of the ANN model. As regards the uncertainty bands produced by the grey formulation of the two data-driven models, it is shown that, in the ANN case, these bands are on average narrower than those obtained by using a standard technique such as the Box–Cox transformation of the errors; in the EPR case, these bands are on average larger. These results therefore suggest that the performance of a grey data-driven model depends on its inner structure and that, for the specific models here considered, the ANN is to be preferred.


Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application

Hydrology and Earth System Sciences, 2010, Vol 14 (10), pp. 1943-1961

doi:10.5194/hess-14-1943-2010

Cited By ~ 107

Author(s): A. Elshorbagy, G. Corzo, S. Srinivasulu, D. P. Solomatine

Keyword(s): Soil Moisture, Case Studies, Data Driven, Modeling Technique, Actual Evapotranspiration, Support Vector, Rainfall Runoff, Highly Nonlinear, Modeling Techniques, Data Driven Modeling

Abstract. In this second part of the two-part paper, the data driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The Gamma test is used as a guide to assist in selecting the appropriate modeling technique. Unlike the two nonlinear soil moisture case studies, the results of the experiment conducted in this research study show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice and, if the kernel is appropriately selected, the performance of SVM can improve. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should not be ignored as a potential modeling technique for hydrological applications.
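The realization procedure described in the abstract (twelve groups drawn by random sampling without replacement, each divided into training, cross-validation, and testing subsets as stated in Part 1) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `make_realizations`, the 50/25/25 split fractions, the seed, and the record count of 200 are all assumptions, since the abstract does not give them.

```python
import random

def make_realizations(n_records, n_groups=12, fractions=(0.5, 0.25, 0.25), seed=0):
    """Return n_groups dicts of record indices split into three disjoint subsets.

    The split fractions and the seed are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    n_train = int(fractions[0] * n_records)
    n_cv = int(fractions[1] * n_records)
    groups = []
    for _ in range(n_groups):
        # sample() draws without replacement, so no index repeats within a group
        order = rng.sample(range(n_records), n_records)
        groups.append({
            "training": order[:n_train],
            "cross_validation": order[n_train:n_train + n_cv],
            "testing": order[n_train + n_cv:],
        })
    return groups

# Hypothetical dataset of 200 records
realizations = make_realizations(n_records=200)
```

Fitting one model per realization and comparing the spread of the test errors across the twelve groups is one way to read the abstract's distinction between predictive accuracy and uncertainty.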


Selection of relevant input variables in storm water quality modeling by multiobjective evolutionary polynomial regression paradigm

Water Resources Research, 2016, Vol 52 (4), pp. 2403-2419

doi:10.1002/2015wr017971

Cited By ~ 12

Author(s): E. Creaco, L. Berardi, Siao Sun, O. Giustolisi, D. Savic

Keyword(s): Water Quality, Polynomial Regression, Water Quality Modeling, Storm Water, Evolutionary Polynomial Regression, Quality Modeling, Input Variables, Selection Of


A symbolic data-driven technique based on evolutionary polynomial regression

Journal of Hydroinformatics, 2006, Vol 8 (3), pp. 207-222

doi:10.2166/hydro.2006.020b

Cited By ~ 174

Author(s): Orazio Giustolisi, Dragan A. Savic

Keyword(s): Polynomial Regression, Computing Methodology, Resistance Coefficient, Regression Method, Data Driven, Evolutionary Polynomial Regression, Symbolic Data, Computational Performance, Regression Techniques, Physical Insight

This paper describes a new hybrid regression method that combines the best features of conventional numerical regression techniques with the genetic programming (GP) symbolic regression technique. The key idea is to employ an evolutionary computing methodology to search for a model of the system/process being modelled and to employ parameter estimation to obtain constants using least squares. The new technique, termed Evolutionary Polynomial Regression (EPR), overcomes shortcomings in the GP process, such as computational performance, the number of evolutionary parameters to tune, and the complexity of the symbolic models. Similarly, it alleviates issues arising from numerical regression, including difficulties in using physical insight and over-fitting problems. This paper demonstrates that EPR performs well both in interpolating data and in scientific knowledge discovery. As an illustration, EPR is used to identify polynomial formulæ with progressively increasing levels of noise, to interpolate the Colebrook-White formula for a pipe resistance coefficient and to discover a formula for a resistance coefficient from experimental data.
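The two-stage idea in this abstract (search over model structures, then estimate the constants by least squares) can be illustrated with a toy sketch. It is not the EPR algorithm itself: the evolutionary search is replaced by exhaustive enumeration over a small set of candidate exponents, the model is restricted to the assumed single-term form y = a0 + a1 * x1^e1 * x2^e2, and the function names and synthetic data are hypothetical.

```python
from itertools import product

def fit_constants(z, y):
    """Least-squares fit of y = a0 + a1*z.

    Returns (a0, a1, sse), or None when z is constant and the fit is undefined.
    """
    n = len(z)
    mean_z, mean_y = sum(z) / n, sum(y) / n
    var_z = sum((zi - mean_z) ** 2 for zi in z)
    if var_z == 0.0:
        return None
    a1 = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y)) / var_z
    a0 = mean_y - a1 * mean_z
    sse = sum((yi - a0 - a1 * zi) ** 2 for zi, yi in zip(z, y))
    return a0, a1, sse

def search_structure(x1, x2, y, exponents=(-1, 0, 1, 2)):
    """Try every exponent pair and keep the structure with the lowest squared error.

    Exhaustive enumeration stands in for the evolutionary search (an assumption).
    """
    best = None
    for e1, e2 in product(exponents, repeat=2):
        z = [a ** e1 * b ** e2 for a, b in zip(x1, x2)]
        fit = fit_constants(z, y)
        if fit is not None and (best is None or fit[2] < best[0]):
            best = (fit[2], e1, e2, fit[0], fit[1])
    return best

# Synthetic, noise-free data generated from y = 2 + 3 * x1**2 / x2
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0, 2.5]
y = [2.0 + 3.0 * a ** 2 / b for a, b in zip(x1, x2)]
sse, e1, e2, a0, a1 = search_structure(x1, x2, y)
```

Separating the structure search from the constant fitting keeps the inner step a linear least-squares problem, which is the property the abstract credits for avoiding some of the tuning burden of genetic programming.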


Multi-objective evolutionary polynomial regression-based prediction of energy consumption probing

Water Science & Technology, 2017, Vol 75 (12), pp. 2791-2799

doi:10.2166/wst.2017.158

Cited By ~ 6

Author(s): Hossein Bonakdari, Isa Ebtehaj, Azam Akhbari

Keyword(s): Energy Consumption, Polynomial Regression, Treatment Time, Synthetic Wastewater, Process Conditions, Multi Objective, Evolutionary Polynomial Regression, Sensitivity Analysis Method, Initial Ph, Input Variables

Electrocoagulation (EC) is employed to investigate the energy consumption (EnC) of synthetic wastewater. In order to find the best process conditions, the influence of various parameters including initial pH, initial dye concentration, applied voltage, initial electrolyte concentration, and treatment time are investigated in this study. EnC is considered the main criterion of process evaluation in investigating the effect of the independent variables on the EC process and determining the optimum condition. Evolutionary polynomial regression is combined with a multi-objective genetic algorithm (EPR-MOGA) to present a new, simple and accurate equation for estimating EnC to overcome existing method weaknesses. To survey the influence of the effective variables, six different input combinations are considered. According to the results, EPR-MOGA Model 1 is the most accurate compared to other models, as it has the lowest error indices in predicting EnC (MARE = 0.35, RMSE = 2.33, SI = 0.23 and R2 = 0.98). A comparison of EPR-MOGA with reduced quadratic multiple regression methods in terms of feasibility confirms that EPR-MOGA is an effective alternative method. Moreover, the partial derivative sensitivity analysis method is employed to analyze the EnC variation trend according to input variables.
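The abstract reports four error indices without defining them. The sketch below computes them under commonly used definitions, which are assumptions here: MARE as the mean absolute relative error, RMSE as the root mean square error, SI (scatter index) as RMSE divided by the mean observation, and R2 as the coefficient of determination (some studies use the squared correlation coefficient instead). The function name and the sample values are hypothetical.

```python
import math

def error_indices(observed, predicted):
    """Compute MARE, RMSE, SI and R2 under the assumed definitions stated above."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sq_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    # Mean absolute relative error; requires nonzero observations
    mare = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum(sq_errors) / n)
    # Scatter index: RMSE normalised by the mean observed value
    si = rmse / mean_obs
    # Coefficient of determination: 1 - residual / total sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sum(sq_errors) / ss_tot
    return {"MARE": mare, "RMSE": rmse, "SI": si, "R2": r2}

# Hypothetical observed and predicted values, for illustration only
indices = error_indices([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 33.0, 38.0])
```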


Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 1: Concepts and methodology

Hydrology and Earth System Sciences, 2010, Vol 14 (10), pp. 1931-1941

doi:10.5194/hess-14-1931-2010

Cited By ~ 120

Author(s): A. Elshorbagy, G. Corzo, S. Srinivasulu, D. P. Solomatine

Keyword(s): Polynomial Regression, Lower Layer, Data Driven, Support Vector, K Nearest Neighbors, Evolutionary Polynomial Regression, Vector Machines, Modeling Techniques, Modeling Experiment, Data Driven Modeling

Abstract. A comprehensive data driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely, neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and rainfall-runoff processes are described and proposed, in the second paper, for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both prediction accuracy and uncertainty of the modeling techniques can be evaluated. The description of the datasets, the implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.


Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – Part 2: Application

Hydrology and Earth System Sciences Discussions, 2009, Vol 6 (6), pp. 7095-7142

doi:10.5194/hessd-6-7095-2009

Cited By ~ 6

Author(s): A. Elshorbagy, G. Corzo, S. Srinivasulu, D. P. Solomatine

Keyword(s): Soil Moisture, Case Studies, Data Driven, Modeling Technique, Actual Evapotranspiration, Support Vector, Rainfall Runoff, Highly Nonlinear, Modeling Techniques, Data Driven Modeling

Abstract. In this second part of the two-part paper, the data driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The Gamma test is used as a guide to assist in selecting the appropriate modeling technique. Unlike the two nonlinear soil moisture case studies, the results of the experiment conducted in this research study show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice and, if the kernel is appropriately selected, the performance of SVM can improve. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should not be ignored as a potential modeling technique for hydrological applications.


Comparison of data-driven methods for downscaling ensemble weather forecasts

Hydrology and Earth System Sciences Discussions, 2007, Vol 4 (1), pp. 189-210

doi:10.5194/hessd-4-189-2007

Cited By ~ 3

Author(s): X. Liu, P. Coulibaly, N. Evora

Keyword(s): Polynomial Regression, Daily Precipitation, Temperature Series, Data Driven, Daily Maximum, Ensemble Forecasts, Weather Forecasts, Evolutionary Polynomial Regression, Medium Range Forecast, Precipitation And Temperature

Abstract. This study investigates dynamically different data-driven methods, specifically a statistical downscaling model (SDSM), a time lagged feedforward neural network (TLFN), and an evolutionary polynomial regression (EPR) technique for downscaling numerical weather ensemble forecasts generated by a medium range forecast (MRF) model. Given the coarse resolution (about 200-km grid spacing) of the MRF model, an optimal use of the weather forecasts at the local or watershed scale requires appropriate downscaling techniques. The selected methods are applied for downscaling ensemble daily precipitation and temperature series for the Chute-du-Diable basin located in northeastern Canada. The downscaling results show that the TLFN and EPR have similar performance in downscaling ensemble daily precipitation as well as daily maximum and minimum temperature series, regardless of the season. Both the TLFN and EPR are more efficient downscaling techniques than SDSM for both the ensemble daily precipitation and temperature.


Comparison of data-driven methods for downscaling ensemble weather forecasts

Hydrology and Earth System Sciences, 2008, Vol 12 (2), pp. 615-624

doi:10.5194/hess-12-615-2008

Cited By ~ 14

Author(s): X. Liu, P. Coulibaly, N. Evora

Keyword(s): Polynomial Regression, Daily Precipitation, Temperature Series, Data Driven, Daily Maximum, Ensemble Forecasts, Weather Forecasts, Evolutionary Polynomial Regression, Medium Range Forecast, Precipitation And Temperature

Abstract. This study investigates dynamically different data-driven methods, specifically a statistical downscaling model (SDSM), a time lagged feedforward neural network (TLFN), and an evolutionary polynomial regression (EPR) technique for downscaling numerical weather ensemble forecasts generated by a medium range forecast (MRF) model. Given the coarse resolution (about 200-km grid spacing) of the MRF model, an optimal use of the weather forecasts at the local or watershed scale requires appropriate downscaling techniques. The selected methods are applied for downscaling ensemble daily precipitation and temperature series for the Chute-du-Diable basin located in northeastern Canada. The downscaling results show that the TLFN and EPR have similar performance in downscaling ensemble daily precipitation as well as daily maximum and minimum temperature series, regardless of the season. Both the TLFN and EPR are more efficient downscaling techniques than SDSM for both the ensemble daily precipitation and temperature.
