Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion

Storm water quality models are useful tools in storm water management. Interest has been growing in analyzing existing data to develop models for urban storm water quality evaluation. When many candidate explanatory variables are available, it is important to select appropriate model inputs. Model calibration and verification are essential steps in any storm water quality modeling. This study investigates input variable selection and calibration data selection in storm water quality regression models. The two selection problems interact with each other. A procedure is developed to carry out the two selection tasks in sequence. The procedure first selects model input variables using a cross-validation method. An appropriate number of variables is identified as model inputs to ensure that a model is neither overfitted nor underfitted. Calibration data selection is then studied on the basis of the input selection results. The uncertainty in model performance due to calibration data selection is investigated with a random selection method. An approach using the cluster method is applied to improve model calibration practice, based on the principle of selecting representative data for calibration. The comparison between the cluster selection method and random selection shows that the former can significantly improve the performance of calibrated models. The information content of the calibration data is found to matter in addition to its size.
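
The idea of selecting representative calibration data by clustering can be illustrated with a minimal Python sketch. It is not the authors' implementation: the k-means variant, the farthest-point initialisation, the choice of one event per cluster, and the event descriptors (hypothetical rainfall depth and peak intensity values) are all assumptions made for illustration.

```python
import math


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, n_iter=50):
    """Plain k-means on a list of equal-length tuples; returns labels and centroids."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each event to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (unchanged if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def select_calibration(points, k):
    """Return sorted indices of one representative event per cluster (nearest to its centroid)."""
    labels, centroids = kmeans(points, k)
    chosen = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            chosen.append(min(idx, key=lambda i: math.dist(points[i], centroids[c])))
    return sorted(chosen)


# Hypothetical events: (rainfall depth, peak intensity), three visibly distinct groups.
events = [(2.0, 1.0), (2.5, 1.2), (3.0, 0.9),
          (20.0, 10.0), (21.0, 11.0), (19.5, 9.5),
          (50.0, 30.0), (52.0, 31.0), (49.0, 29.0)]
calibration_idx = select_calibration(events, k=3)
print(calibration_idx)
```

A random selection of three events could draw all of them from the same group, whereas the cluster-based choice covers each group once, which is one way to read the abstract's point about information content versus size.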

On calibration data selection: The case of stormwater quality regression models

Environmental Modelling & Software, 2012, Vol 35, pp. 61-73

DOI: 10.1016/j.envsoft.2012.02.007

Cited By ~ 10

Author(s): Siao Sun, Jean-Luc Bertrand-Krajewski

Keyword(s): Regression Models, Data Selection, Calibration Data, Stormwater Quality

Calibration and validation of multiple regression models for stormwater quality prediction: data partitioning, effect of dataset size and characteristics

Water Science & Technology, 2005, Vol 52 (3), pp. 45-52

DOI: 10.2166/wst.2005.0060

Cited By ~ 31

Author(s): M. Mourad, J.-L. Bertrand-Krajewski, G. Chebbo

Keyword(s): Multiple Regression, Regression Models, Data Partitioning, Data Sets, Calibration Data, Stormwater Quality, Calibration And Validation, Multiple Regression Models, Dataset Size, Few Data

Two main issues regarding stormwater quality models have been investigated: i) the effect of calibration dataset size and characteristics on calibration and validation results; ii) the optimal split of available data into calibration and validation subsets. Data from 13 catchments have been used for three pollutants: BOD, COD and SS. Three multiple regression models were calibrated and validated. Using different data sets and different models makes it possible to identify general trends. The main finding is that multiple regression models are sensitive to the calibration data. Using few data for calibration leads to poor predictions despite good calibration results. It was also found that randomly splitting the available data into halves for calibration and validation is not optimal: more data should be allocated to calibration. The proportion of data to be used for validation increases with the number of available data (N) and reaches about 35% for N around 55 measured events.
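
The split described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's procedure: the function names are hypothetical, the data are synthetic, and a single-variable least-squares line stands in for the multiple regression models of the paper. Only the figures N = 55 and a validation share of about 35% come from the abstract.

```python
import random


def split_events(n_events, validation_fraction, seed=0):
    """Randomly split event indices into (calibration, validation) index lists."""
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)
    n_val = round(validation_fraction * n_events)
    return sorted(indices[n_val:]), sorted(indices[:n_val])


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with one explanatory variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def rmse(x, y, a, b):
    """Root mean square error of the fitted line on the given events."""
    return (sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)) ** 0.5


# Synthetic events (assumed): load grows linearly with rainfall depth, plus noise.
rng = random.Random(1)
depth = [rng.uniform(1.0, 40.0) for _ in range(55)]
load = [5.0 + 2.0 * d + rng.gauss(0.0, 3.0) for d in depth]

# About 35% of N = 55 events reserved for validation, the rest for calibration.
cal, val = split_events(55, 0.35)
a, b = fit_line([depth[i] for i in cal], [load[i] for i in cal])
print(len(cal), len(val))
print(rmse([depth[i] for i in val], [load[i] for i in val], a, b))
```

Repeating the split with different seeds and smaller calibration subsets is a simple way to observe how validation error varies with calibration dataset size.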
