Evaluating Hydrological Model Performance using Information Theory-based Metrics

Author(s):  
Yakov A. Pachepsky ◽  
Gonzalo Martinez ◽  
Feng Pan ◽  
Thorsten Wagener ◽  
Thomas Nicholson

Abstract. Accuracy-based model performance metrics do not necessarily reflect the qualitative correspondence between simulated and measured streamflow time series. The objective of this work was to evaluate whether information theory-based metrics can serve as a complementary tool for hydrologic model evaluation and selection. We simulated 10-year streamflow time series in five watersheds located in Texas, North Carolina, Mississippi, and West Virginia. Eight models of different complexity were applied. The information theory-based metrics were obtained after representing the time series as strings of symbols, where each symbol of the alphabet corresponded to a different quantile of the probability distribution of streamflow. Three metrics were computed for those strings: mean information gain, which measures the randomness of the signal; effective measure complexity, which characterizes predictability; and fluctuation complexity, which characterizes the presence of a pattern in the signal. The observed streamflow time series had smaller information content and larger complexity metrics than the precipitation time series. Watersheds thus acted as information filters in the hydrologic conversion of precipitation to streamflow: streamflow time series were less random and more complex than those of precipitation. The Nash–Sutcliffe efficiency increased with model complexity, but in many cases several models had efficiency values that were not statistically distinguishable from one another. In such cases, ranking models by the closeness of the information theory-based metrics of simulated and measured streamflow time series can provide an additional criterion for the evaluation of hydrologic model performance.
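The symbolization and mean-information-gain steps described above can be illustrated with a minimal sketch. This assumes a simple empirical quantile binning and a pair-based entropy difference; the paper's exact binning scheme and block lengths may differ.

```python
from collections import Counter
from math import log2

def symbolize(series, n_symbols=4):
    """Map each value to a quantile-based symbol in {0, ..., n_symbols-1}."""
    ranked = sorted(series)
    # empirical quantile edges (illustrative; the paper's binning may differ)
    edges = [ranked[int(len(ranked) * k / n_symbols)] for k in range(1, n_symbols)]
    return [sum(v >= e for e in edges) for v in series]

def entropy(blocks):
    """Shannon entropy (bits) of the empirical block distribution."""
    counts = Counter(blocks)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

def mean_information_gain(symbols):
    """H(pairs) - H(singles): average information gained from the next symbol."""
    pairs = list(zip(symbols, symbols[1:]))
    return entropy(pairs) - entropy(symbols[:-1])
```

A perfectly periodic signal yields zero gain, while an i.i.d. uniform symbol stream approaches log2(n_symbols), matching the interpretation of mean information gain as a randomness measure.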

2021 ◽  
Author(s):  
Sophia Eugeni ◽  
Eric Vaags ◽  
Steven V. Weijs

Accurate hydrologic modelling is critical to effective water resource management. Because catchment attributes strongly influence hydrologic behavior in an area, they can be used to inform hydrologic models and better predict discharge in a basin. Some basins may be harder to predict accurately than others, and this difficulty may be related to the complexity of the discharge signal. The study establishes the relationship between a catchment's static attributes and hydrologic model performance in those catchments, and also investigates the link to complexity, which we quantify with measures of compressibility based on information theory.

The project analyzes a large national dataset comprising catchment attributes for basins across the United States, paired with established performance metrics for corresponding hydrologic models. Principal Component Analysis (PCA) was performed on the catchment attribute data to determine the strongest modes in the input. The basins were clustered according to their catchment attributes, and performance within the clusters was compared.

Significant differences in model performance emerged between the clusters of basins. For the complexity analysis, details of the implementation and technical challenges will be discussed, as well as preliminary results.
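The PCA step described above can be sketched with a singular value decomposition of the centred attribute matrix. This is a generic illustration; the study's preprocessing (e.g. standardization of attributes) may differ.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project rows of X (basins x attributes) onto the leading principal
    components; the scores can then be fed to a clustering algorithm."""
    Xc = X - X.mean(axis=0)              # centre each attribute
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T      # principal-component scores
```

The first score column captures the strongest mode of variation across basins, which is the "strongest modes in the input" mentioned above.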


2013 ◽  
Vol 10 (2) ◽  
pp. 2029-2065 ◽  
Author(s):  
S. V. Weijs ◽  
N. van de Giesen ◽  
M. B. Parlange

Abstract. When inferring models from hydrological data or calibrating hydrological models, we might be interested in the information content of those data to quantify how much can potentially be learned from them. In this work we take a perspective from (algorithmic) information theory (AIT) to discuss some underlying issues regarding this question. In the information-theoretical framework, there is a strong link between information content and data compression. We exploit this by using data compression performance as a time series analysis tool and highlight the analogy to information content, prediction, and learning (understanding is compression). The analysis is performed on time series of a set of catchments, searching for the mechanisms behind compressibility. We discuss the deeper foundations from algorithmic information theory, some practical results, and the inherent difficulties in answering the question: "How much information is contained in this data?". The conclusion is that the answer can only be given once the following counter-questions have been answered: (1) Information about which unknown quantities? (2) What is your current state of knowledge/beliefs about those quantities? Quantifying the information content of hydrological data is closely linked to the question of separating aleatoric and epistemic uncertainty and quantifying maximum possible model performance, as addressed in the current hydrological literature. The AIT perspective teaches us that it is impossible to answer this question objectively without specifying prior beliefs. These beliefs are related to the maximum complexity one is willing to accept as a law and what is considered as random.
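The link between compressibility and information content can be demonstrated with an off-the-shelf compressor. This is a toy sketch using zlib and naive uniform binning, not the paper's actual procedure; the choice of compressor and discretisation is exactly the kind of prior belief the abstract discusses.

```python
import zlib

def compression_ratio(series, n_symbols=8):
    """Discretise a series and report compressed/original size: a rough,
    prior-dependent proxy for its information content."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_symbols or 1.0   # guard against a constant series
    symbols = bytes(min(int((v - lo) / width), n_symbols - 1) for v in series)
    return len(zlib.compress(symbols, 9)) / len(symbols)
```

A smooth or repetitive record compresses well (low ratio) while a noisy one compresses poorly, which is the sense in which "understanding is compression".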


2017 ◽  
Author(s):  
Karthik Kumarasamy ◽  
Patrick Belmont

Abstract. Watershed-scale models simulating hydrology and water quality have advanced rapidly in sophistication, process representation, flexibility in model structure, and input data. Given the importance of these models in supporting decision-making for a wide range of environmental issues, the hydrology community is compelled to improve the metrics used to evaluate model performance. More targeted and comprehensive metrics will facilitate better and more efficient calibration and will help demonstrate that a model is useful for its intended purpose. Here we introduce a suite of new tools for model evaluation, packaged as an open-source Hydrologic Model Evaluation (HydroME) Toolbox. Specifically, we demonstrate the use of box plots to illustrate the full distribution of common model performance metrics such as R2, and the use of Euclidean distance, empirical quantile-quantile (Q-Q) plots, and flow duration curves as simple metrics to identify and localize errors in model simulations. Further, we demonstrate the use of magnitude-squared coherence to compare the frequency content of observed and modeled streamflow, and wavelet coherence to localize frequency mismatches in time. We provide a rationale for a hierarchical selection of parameters to adjust during calibration and recommend that modelers progress from parameters with the most uncertainty to the least, namely starting with pure calibration parameters, followed by derived parameters, and finally measured parameters. We apply these techniques in the calibration and evaluation of models of two watersheds, the Le Sueur River Basin (2880 km2) and the Root River Basin (4300 km2) in southern Minnesota, USA.
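A flow-duration-curve comparison of the kind described above can be sketched as follows. This is a minimal illustration with a Weibull plotting position and a simple mean relative bias, not the HydroME Toolbox implementation.

```python
def flow_duration_curve(flows):
    """Pair each flow with its exceedance probability (Weibull plotting position)."""
    q = sorted(flows, reverse=True)
    n = len(q)
    return [((i + 1) / (n + 1), v) for i, v in enumerate(q)]

def fdc_bias(observed, simulated):
    """Mean relative difference between simulated and observed FDCs --
    a simple way to localise magnitude errors across flow regimes."""
    obs = [v for _, v in flow_duration_curve(observed)]
    sim = [v for _, v in flow_duration_curve(simulated)]
    return sum((s - o) / o for o, s in zip(obs, sim)) / len(obs)
```

Because the curves are sorted by magnitude, comparing them segment by segment (high-flow vs low-flow quantiles) localises errors in a way an aggregate metric such as R2 cannot.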


2018 ◽  
Vol 11 (5) ◽  
pp. 1873-1886 ◽  
Author(s):  
Julian Koch ◽  
Mehmet Cüneyd Demirel ◽  
Simon Stisen

Abstract. The process of model evaluation is not only an integral part of model development and calibration but also of paramount importance when communicating modelling results to the scientific community and stakeholders. The modelling community has a large and well-tested toolbox of metrics for evaluating temporal model performance. In contrast, spatial performance evaluation has not kept pace with the wide availability of spatial observations or with the sophisticated model codes that simulate the spatial variability of complex hydrological processes. This study makes a contribution towards advancing spatial-pattern-oriented model calibration by rigorously testing a multiple-component performance metric. The promoted SPAtial EFficiency (SPAEF) metric reflects three equally weighted components: correlation, coefficient of variation and histogram overlap. This multiple-component approach is found to be advantageous for the complex task of comparing spatial patterns. SPAEF, its three components individually, and two alternative spatial performance metrics, i.e. connectivity analysis and fractions skill score, are applied in a spatial-pattern-oriented model calibration of a catchment model in Denmark. Results suggest the importance of multiple-component metrics, because stand-alone metrics tend to fail to provide holistic pattern information. The three SPAEF components are found to be independent, which allows them to complement each other in a meaningful way. To optimally exploit spatial observations made available by remote sensing platforms, this study suggests applying bias-insensitive metrics, which further allow for a comparison of variables that are related but may differ in unit. This study applies SPAEF in the hydrological context using the mesoscale Hydrologic Model (mHM; version 5.8), but we see great potential across disciplines related to spatially distributed earth system modelling.
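A minimal SPAEF implementation along the lines described above combines correlation, the ratio of coefficients of variation, and histogram overlap of z-scored fields; the bin count and shared-edge choice here are illustrative assumptions.

```python
import numpy as np

def spaef(obs, sim, bins=20):
    """SPAtial EFficiency: 1 - sqrt((alpha-1)^2 + (beta-1)^2 + (gamma-1)^2)."""
    obs, sim = np.ravel(obs), np.ravel(sim)
    alpha = np.corrcoef(obs, sim)[0, 1]                                  # pattern correlation
    beta = (np.std(sim) / np.mean(sim)) / (np.std(obs) / np.mean(obs))   # CV ratio
    zo = (obs - obs.mean()) / obs.std()                                  # z-scoring removes bias
    zs = (sim - sim.mean()) / sim.std()
    edges = np.histogram_bin_edges(np.concatenate([zo, zs]), bins=bins)
    ho, _ = np.histogram(zo, bins=edges)
    hs, _ = np.histogram(zs, bins=edges)
    gamma = np.minimum(ho, hs).sum() / ho.sum()                          # histogram overlap
    return 1 - np.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
```

Note the bias insensitivity mentioned in the abstract: rescaling the simulated field leaves all three components unchanged, which is what permits comparing related variables that differ in unit.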


Author(s):  
Adam Schreiner-McGraw ◽  
Hoori Ajami ◽  
Ray Anderson ◽  
Dong Wang

Accurate simulation of plant water use across agricultural ecosystems is essential for various applications, including precision agriculture, quantifying groundwater recharge, and optimizing irrigation rates. Previous approaches to integrating plant water use data into hydrologic models have relied on evapotranspiration (ET) observations. Recently, the flux variance similarity approach has been developed to partition ET to transpiration (T) and evaporation, providing an opportunity to use T data to parameterize models. To explore the value of T/ET data in improving hydrologic model performance, we examined multiple approaches to incorporate these observations for vegetation parameterization. We used ET observations from 5 eddy covariance towers located in the San Joaquin Valley, California, to parameterize orchard crops in an integrated land surface – groundwater model. We find that a simple approach of selecting the best parameter sets based on ET and T performance metrics works best at these study sites. Selecting parameters based on performance relative to observed ET creates an uncertainty of 27% relative to the observed value. When parameters are selected using both T and ET data, this uncertainty drops to 24%. Similarly, the uncertainty in potential groundwater recharge drops from 63% to 58% when parameters are selected with ET or T and ET data, respectively. Additionally, using crop type parameters results in similar levels of simulated ET as using site-specific parameters. Different irrigation schemes create high amounts of uncertainty and highlight the need for accurate estimates of irrigation when performing water budget studies.
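The "select the best parameter sets based on ET and T performance metrics" approach can be sketched as a simple ranking. The dictionary keys and the RMSE-sum criterion here are illustrative assumptions, not the authors' exact procedure.

```python
def select_parameters(candidates, obs_et, obs_t):
    """Pick the candidate parameter set with the lowest combined ET + T error.
    Each candidate is a dict with simulated 'et' and 't' series (hypothetical layout)."""
    def rmse(sim, obs):
        return (sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)) ** 0.5
    return min(candidates, key=lambda c: rmse(c["et"], obs_et) + rmse(c["t"], obs_t))
```

Adding the T term to the selection criterion is what tightens the uncertainty from 27% to 24% in the study's results: parameter sets that match total ET but mispartition it are penalised.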


Author(s):  
Charles A. Doan ◽  
Ronaldo Vigo

Abstract. Several empirical investigations have explored whether observers prefer to sort sets of multidimensional stimuli into groups by employing one-dimensional or family-resemblance strategies. Although one-dimensional sorting strategies have been the prevalent finding in these unsupervised classification paradigms, several researchers have provided evidence that the choice of strategy may depend on the particular demands of the task. To account for this disparity, we propose that observers extract relational patterns from stimulus sets that facilitate the development of optimal classification strategies for determining category membership. We conducted a novel constrained categorization experiment to empirically test this hypothesis by instructing participants to either add or remove objects from presented categorical stimuli. We employed generalized representational information theory (GRIT; Vigo, 2011b, 2013a, 2014) and its associated formal models to predict and explain how human beings chose to modify these categorical stimuli. Additionally, we compared model performance to predictions made by a leading prototypicality measure in the literature.


1991 ◽  
Vol 56 (3) ◽  
pp. 505-559 ◽  
Author(s):  
Karel Eckschlager

In this review, analysis is treated as a process of gaining information on chemical composition that takes place in a stochastic system. A model of this system is outlined, and a survey of the measures and methods of information theory is presented to the extent useful for qualitative (identification), quantitative, trace, and multicomponent analysis. A distinction is drawn between the information content of an analytical signal and the information gain, or amount of information, obtained by the analysis, and their interrelation is demonstrated. Some notions of analytical chemistry are quantified from the information theory and systems theory point of view; it is also demonstrated that the use of fuzzy set theory can be suitable. The review sums up the principal results of a series of 25 papers published in this journal since 1971.


2019 ◽  
Vol 23 (10) ◽  
pp. 4323-4331 ◽  
Author(s):  
Wouter J. M. Knoben ◽  
Jim E. Freer ◽  
Ross A. Woods

Abstract. A traditional metric used in hydrology to summarize model performance is the Nash–Sutcliffe efficiency (NSE). Increasingly an alternative metric, the Kling–Gupta efficiency (KGE), is used instead. When NSE is used, NSE = 0 corresponds to using the mean flow as a benchmark predictor. The same reasoning is applied in various studies that use KGE as a metric: negative KGE values are viewed as bad model performance, and only positive values are seen as good model performance. Here we show that using the mean flow as a predictor does not result in KGE = 0, but instead KGE = 1 − √2 ≈ −0.41. Thus, KGE values greater than −0.41 indicate that a model improves upon the mean flow benchmark – even if the model's KGE value is negative. NSE and KGE values cannot be directly compared, because their relationship is non-unique and depends in part on the coefficient of variation of the observed time series. Therefore, modellers who use the KGE metric should not let their understanding of NSE values guide them in interpreting KGE values and instead develop new understanding based on the constitutive parts of the KGE metric and the explicit use of benchmark values to compare KGE scores against. More generally, a strong case can be made for moving away from ad hoc use of aggregated efficiency metrics and towards a framework based on purpose-dependent evaluation metrics and benchmarks that allows for more robust model adequacy assessment.
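The mean-flow benchmark value of 1 − √2 can be verified directly from the constitutive parts of KGE. Taking r = 0 for a constant simulation follows the reasoning above (standard correlation is undefined when the simulated series has zero variance).

```python
import numpy as np

def kge(obs, sim):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    # a constant simulation has undefined correlation; take r = 0
    r = 0.0 if np.std(sim) == 0 else np.corrcoef(obs, sim)[0, 1]
    alpha = np.std(sim) / np.std(obs)    # variability ratio
    beta = np.mean(sim) / np.mean(obs)   # bias ratio
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Predicting the mean flow gives alpha = 0, beta = 1, r = 0, hence KGE = 1 − √2 ≈ −0.41 rather than 0.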


2021 ◽  
Vol 11 (14) ◽  
pp. 6594
Author(s):  
Yu-Chia Hsu

The interdisciplinary nature of sports and the presence of various systemic and non-systemic factors introduce challenges in predicting sports match outcomes using a single disciplinary approach. In contrast to previous studies that use sports performance metrics and statistical models, this study is the first to apply a deep learning approach from financial time series modeling to predict sports match outcomes. The proposed approach has two main components: a convolutional neural network (CNN) classifier for implicit pattern recognition and a logistic regression model for match outcome judgment. First, the raw data used in the prediction are derived from the betting market odds and actual scores of each game, which are transformed into sports candlesticks. Second, the CNN is used to classify the candlestick time series on a graphical basis. To this end, the original 1D time series are encoded into 2D matrix images using the Gramian angular field and are then fed into the CNN classifier. In this way, the winning probability of each matchup team can be derived based on historically implied behavioral patterns. Third, to further consider the differences between strong and weak teams, the CNN classifier adjusts the probability of winning the match by using the logistic regression model and then makes a final judgment regarding the match outcome. We empirically test this approach using data from 18,944 National Football League games spanning 32 years and find that using the individual historical data of each team in the CNN classifier for pattern recognition is better than using the data of all teams. The CNN in conjunction with the logistic regression judgment model outperforms the CNN in conjunction with SVM, Naïve Bayes, Adaboost, J48, and random forest, and its accuracy surpasses that of betting market prediction.
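The Gramian angular field encoding mentioned above can be sketched as follows (the summation-field variant; the study's preprocessing of candlestick series may differ).

```python
import numpy as np

def gramian_angular_field(series):
    """Encode a 1-D series as a 2-D Gramian angular summation field image."""
    x = np.asarray(series, float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))            # polar-coordinate angle
    return np.cos(phi[:, None] + phi[None, :])        # pairwise angular sums
```

The resulting n × n image preserves temporal dependency along its diagonal, which is what makes it a suitable 2D input for a CNN classifier.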

