Assessing the performance of a suite of machine learning models for daily river water temperature prediction

PeerJ, 2019, Vol 7, pp. e7065
Author(s): Senlin Zhu, Emmanuel Karlo Nyarko, Marijana Hadzima-Nyarko, Salim Heddam, Shiqiang Wu

In this study, different versions of feedforward neural network (FFNN), Gaussian process regression (GPR), and decision tree (DT) models were developed to estimate daily river water temperature using air temperature (Ta), flow discharge (Q), and the day of year (DOY) as predictors. The proposed models were assessed using observed data from eight river stations, and the modelling results were compared with those of the air2stream model. Model performance was evaluated using four indicators: the coefficient of correlation (R), the Willmott index of agreement (d), the root mean squared error (RMSE), and the mean absolute error (MAE). Results indicated that the three machine learning models performed similarly when only Ta was used as the predictor. When the day of year was included as a model input, the performance of all three machine learning models improved dramatically. Including flow discharge instead of the day of year as an additional predictor yielded a smaller gain in model accuracy, indicating the relatively minor role of flow discharge in river water temperature prediction. However, the relative importance of flow discharge increased for stations with high-altitude catchments (Rhône, Dischmabach and Cedar), which are influenced by cold water releases from hydropower or by snowmelt, suggesting that the role of flow discharge depends on the hydrological characteristics of such rivers. The air2stream model outperformed the three machine learning models for most of the studied rivers, except for the cases where including flow discharge as a predictor provided the greatest benefit. The DT model outperformed the FFNN and GPR models in the calibration phase; however, its performance decreased slightly in the validation phase. In general, the FFNN model performed slightly better than the GPR model. Overall, the modelling results showed that all three machine learning models performed well for river water temperature modelling.
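The four evaluation indicators listed above can be computed from paired observed and simulated series. The following is a minimal sketch using only the Python standard library; the function names and the sample values are illustrative assumptions, not taken from the study, and d follows the commonly used form of Willmott's index of agreement.

```python
import math
from statistics import mean


def pearson_r(obs, sim):
    """Coefficient of correlation R between observed and simulated series."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def willmott_d(obs, sim):
    """Willmott index of agreement d (1 means perfect agreement)."""
    mo = mean(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den


def rmse(obs, sim):
    """Root mean squared error, in the units of the data (here °C)."""
    return math.sqrt(mean((s - o) ** 2 for o, s in zip(obs, sim)))


def mae(obs, sim):
    """Mean absolute error, in the units of the data (here °C)."""
    return mean(abs(s - o) for o, s in zip(obs, sim))


obs = [4.0, 6.5, 9.0, 12.5, 15.0]  # hypothetical observed daily water temperature (°C)
sim = [4.5, 6.0, 9.5, 12.0, 15.5]  # hypothetical model output (°C)
print(f"R={pearson_r(obs, sim):.3f} d={willmott_d(obs, sim):.3f} "
      f"RMSE={rmse(obs, sim):.3f} MAE={mae(obs, sim):.3f}")
```

R and d are dimensionless (higher is better), while RMSE and MAE carry the units of temperature (lower is better); RMSE penalises large individual errors more heavily than MAE.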

2021
Author(s): Moritz Feigl, Katharina Lebiedzinski, Mathew Herrnegger, Karsten Schulz

Abstract. Water temperature in rivers is a crucial environmental factor with the ability to alter hydro-ecological as well as socio-economic conditions within a catchment. The development of modelling concepts for predicting river water temperature is and will be essential for effective integrated water management and the development of adaptation strategies to future global changes (e.g. climate change). This study tests the performance of 6 different machine learning models: step-wise linear regression, random forest, eXtreme Gradient Boosting (XGBoost), feedforward neural networks (FNN), and two types of recurrent neural networks (RNN). All models are applied using different data inputs for daily water temperature prediction in 10 Austrian catchments ranging from 200 km² to 96 000 km² and exhibiting a wide range of physiographic characteristics. The evaluated input data sets include combinations of daily means of air temperature, runoff, precipitation and global radiation. Bayesian optimization is applied to optimize the hyperparameters of all applied machine learning models. To make the results comparable to previous studies, two widely used benchmark models are applied additionally: linear regression and air2stream. With a mean root mean squared error (RMSE) of 0.55 °C, the tested models could significantly improve water temperature prediction compared to linear regression (1.55 °C) and air2stream (0.98 °C). In general, the results show a very similar performance of the tested machine learning models, with a median RMSE difference of 0.08 °C between the models. Of the 6 tested machine learning models, both FNNs and XGBoost performed best in 4 of the 10 catchments. RNNs are the best performing models in the largest catchment, indicating that RNNs mainly perform well when processes with long-term dependencies are important.
Furthermore, a wide range of performance was observed for different hyperparameter sets for the tested models, showing the importance of hyperparameter optimization. The FNN model results in particular showed an extremely large RMSE standard deviation of 1.60 °C due to the chosen hyperparameters. This study evaluates different sets of input variables, machine learning models and training characteristics for daily stream water temperature prediction, acting as a basis for future development of regional multi-catchment water temperature prediction models. All preprocessing steps and models are implemented in the open-source R package wateRtemp, to provide easy access to these modelling approaches and facilitate further research.


2021, Vol 25 (5), pp. 2951-2977
Author(s): Moritz Feigl, Katharina Lebiedzinski, Mathew Herrnegger, Karsten Schulz

Abstract. Water temperature in rivers is a crucial environmental factor with the ability to alter hydro-ecological as well as socio-economic conditions within a catchment. The development of modelling concepts for predicting river water temperature is and will be essential for effective integrated water management and the development of adaptation strategies to future global changes (e.g. climate change). This study tests the performance of six different machine-learning models: step-wise linear regression, random forest, eXtreme Gradient Boosting (XGBoost), feed-forward neural networks (FNNs), and two types of recurrent neural networks (RNNs). All models are applied using different data inputs for daily water temperature prediction in 10 Austrian catchments ranging from 200 to 96 000 km² and exhibiting a wide range of physiographic characteristics. The evaluated input data sets include combinations of daily means of air temperature, runoff, precipitation and global radiation. Bayesian optimization is applied to optimize the hyperparameters of all applied machine-learning models. To make the results comparable to previous studies, two widely used benchmark models are applied additionally: linear regression and air2stream. With a mean root mean squared error (RMSE) of 0.55 °C, the tested models could significantly improve water temperature prediction compared to linear regression (1.55 °C) and air2stream (0.98 °C). In general, the results show a very similar performance of the tested machine-learning models, with a median RMSE difference of 0.08 °C between the models. Of the six tested machine-learning models, both FNNs and XGBoost performed best in 4 of the 10 catchments. RNNs are the best-performing models in the largest catchment, indicating that RNNs mainly perform well when processes with long-term dependencies are important.
Furthermore, a wide range of performance was observed for different hyperparameter sets for the tested models, showing the importance of hyperparameter optimization. The FNN model results in particular showed an extremely large RMSE standard deviation of 1.60 °C due to the chosen hyperparameters. This study evaluates different sets of input variables, machine-learning models and training characteristics for daily stream water temperature prediction, acting as a basis for future development of regional multi-catchment water temperature prediction models. All preprocessing steps and models are implemented in the open-source R package wateRtemp to provide easy access to these modelling approaches and facilitate further research.
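To make the linear-regression benchmark concrete, the sketch below regresses daily water temperature on air temperature alone by ordinary least squares and scores it by RMSE on a held-out period. It is a hedged illustration only: air temperature as the sole predictor, the calibration/validation split and all numbers are assumptions, and it does not reproduce the benchmark as implemented in wateRtemp.

```python
import math
from statistics import mean


def fit_linear(ta, tw):
    """Ordinary least squares fit of tw = a + b * ta; returns (a, b)."""
    mx, my = mean(ta), mean(tw)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ta, tw))
    sxx = sum((x - mx) ** 2 for x in ta)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def rmse(obs, sim):
    """Root mean squared error in °C."""
    return math.sqrt(mean((s - o) ** 2 for o, s in zip(obs, sim)))


# Hypothetical daily means (°C) for a calibration and a validation period.
ta_cal = [-2.0, 3.0, 8.0, 14.0, 19.0, 22.0]
tw_cal = [1.5, 4.0, 7.5, 11.0, 14.5, 16.0]
ta_val = [0.0, 10.0, 20.0]
tw_val = [2.5, 9.0, 15.0]

a, b = fit_linear(ta_cal, tw_cal)
pred = [a + b * x for x in ta_val]
print(f"Tw = {a:.2f} + {b:.2f} * Ta, validation RMSE = {rmse(tw_val, pred):.2f} °C")
```

A benchmark this simple cannot represent thermal inertia or snowmelt effects, which is one reason more flexible models can reduce RMSE substantially relative to it.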


Author(s): M. Rajesh, S. Rehana

Abstract. Machine learning (ML) has been increasingly adopted due to its ability to model the complex, non-linear relationships between river water temperature (RWT) and its predictors (e.g., air temperature, AT). Most of these ML approaches have been applied using average AT without any detailed sensitivity analysis of other forms of AT (e.g., maximum and minimum). The present study demonstrates how new ML approaches, such as ridge regression (RR), the K-nearest neighbors (KNN) regressor, the random forest (RF) regressor, and support vector regression (SVR), can be coupled with Sobol’ global sensitivity analysis (GSA) to produce accurate RWT estimates with the most appropriate form of AT. Furthermore, the proposed ML approaches have been combined with the Ensemble Kalman Filter (EnKF), a data assimilation (DA) technique, to improve the predicted values based on the measured data. The effectiveness of the proposed modelling framework is demonstrated using the Tunga-Bhadra River, a tropical river system in India, as a case study. SVR was found to be the most robust ML model for predicting RWT at the monthly time scale, compared with the daily and seasonal time scales. The study demonstrates how ML methods can be coupled with global sensitivity analysis and DA techniques to generate accurate RWT predictions in river water quality modelling.
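The EnKF correction step mentioned above can be illustrated for a single scalar state, such as one day's RWT. This is a minimal sketch of a generic perturbed-observation EnKF analysis step, not the authors' implementation; the ensemble size, the observation-error variance and the way the forecast ensemble is generated are all assumptions.

```python
import random
from statistics import mean, variance


def enkf_update(forecast, obs, obs_var, rng):
    """Scalar EnKF analysis step with perturbed observations.

    forecast: ensemble of model-predicted RWT values (°C)
    obs: measured RWT (°C); obs_var: observation-error variance (°C^2)
    """
    p_f = variance(forecast)        # forecast error variance estimated from the ensemble spread
    gain = p_f / (p_f + obs_var)    # Kalman gain for a directly observed scalar state
    noise = [rng.gauss(0.0, obs_var ** 0.5) for _ in forecast]
    noise_mean = mean(noise)
    noise = [e - noise_mean for e in noise]  # centre perturbations so the ensemble-mean update is exact
    return [x + gain * (obs + e - x) for x, e in zip(forecast, noise)]


rng = random.Random(42)
# Hypothetical ensemble of ML predictions for one day (e.g. from perturbed inputs).
forecast = [rng.gauss(24.0, 1.0) for _ in range(50)]
analysis = enkf_update(forecast, obs=25.5, obs_var=0.25, rng=rng)
print(f"prior mean {mean(forecast):.2f} °C -> analysis mean {mean(analysis):.2f} °C")
```

Because the gain lies between 0 and 1, the analysis mean is pulled from the model forecast toward the measurement, more strongly when the ensemble spread is large relative to the observation error.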


Molecules, 2021, Vol 26 (20), pp. 6279
Author(s): Alessio Ragno, Anna Baldisserotto, Lorenzo Antonini, Manuela Sabatino, Filippo Sapienza, ...

Scientific investigation of essential oil composition and the related biological profiles is continuously growing. Nevertheless, only a few studies have examined the relationships between chemical composition and biological data. Herein, the investigation of 61 assayed essential oils is reported, focusing on their inhibitory activity against Microsporum spp. and including the development of machine learning models aimed at highlighting the chemical components most likely related to inhibitory potency. Machine learning and deep learning techniques have been applied successfully for predictive and descriptive purposes in many fields. Quantitative composition–activity relationship machine learning models were developed for the 61 essential oils tested as Microsporum spp. growth modulators. The models were built with in-house Python scripts implementing data augmentation, with the purpose of achieving a smoother flow between the essential oils’ chemical compositions and biological data. High values of the statistical coefficients (accuracy, Matthews correlation coefficient and F1 score) were obtained, and model inspection made it possible to detect possible specific roles of some essential oil constituents. Machine learning models built with data augmentation proved more robust and far more useful than models derived from raw data. To the best of the authors’ knowledge, this is the first report using data augmentation to highlight the role of the components of complex mixtures. In particular, a first application of these data will be the development of ingredients for the dermo-cosmetic field, investigating microbial species in light of the urgent need for natural preservatives and antimicrobial agents.
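The three statistics quoted above can all be derived from a confusion matrix. The sketch below assumes a binary labelling (for example, active versus inactive oils), which is an assumption on our part, and uses invented counts; it is not the authors' in-house script.

```python
import math


def classification_scores(tp, tn, fp, fn):
    """Accuracy, F1 score and Matthews correlation coefficient (MCC) from a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return accuracy, f1, mcc


# Hypothetical counts: 5 true positives, 3 true negatives, 1 false positive, 1 false negative.
acc, f1, mcc = classification_scores(tp=5, tn=3, fp=1, fn=1)
print(f"accuracy={acc:.3f} F1={f1:.3f} MCC={mcc:.3f}")  # accuracy=0.800 F1=0.833 MCC=0.583
```

MCC uses all four cells of the matrix, so it stays informative when the classes are imbalanced, whereas accuracy alone can look high simply because one class dominates.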


2021
Author(s): Saul Justin Newman, Robert T Furbank

Abstract. Four species of grass generate half of all human-consumed calories¹. However, abundant biological data on the species that produce our food remain largely inaccessible, imposing direct barriers to understanding crop yield and fitness traits. Here, we assemble and analyse a continent-wide database of field experiments spanning ten years and hundreds of thousands of machine-phenotyped populations of ten major crop species. Training an ensemble of machine learning models, using thousands of variables capturing weather, ground-sensor, soil, chemical and fertiliser dosage, management, and satellite data, produces robust cross-continent yield models with prediction accuracy exceeding R² = 0.8. In contrast to ‘black box’ analytics, detailed interrogation of these models reveals fundamental drivers of crop behaviour and complex interactions predicting yield and agronomic traits. These results demonstrate the capacity of machine learning models to build unified, interpretable, and explainable models of crop behaviour, and highlight the powerful role of data in the future of food.
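The two quantitative ideas in this abstract, an ensemble of models and an R² accuracy threshold, can be illustrated together. In this hedged sketch, unweighted averaging is assumed as the ensembling rule (the abstract does not state how the models were combined), and the yields and model outputs are invented.

```python
from statistics import mean


def ensemble_mean(predictions):
    """Average the predictions of several models element-wise (one list per model)."""
    return [mean(values) for values in zip(*predictions)]


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mo = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed yields (t/ha) and the outputs of three hypothetical models.
observed = [2.1, 3.4, 4.0, 5.2, 6.3]
model_outputs = [
    [2.4, 3.1, 4.3, 5.0, 6.0],
    [1.9, 3.6, 3.8, 5.5, 6.5],
    [2.2, 3.3, 4.1, 5.1, 6.2],
]
combined = ensemble_mean(model_outputs)
print(f"ensemble R^2 = {r_squared(observed, combined):.3f}")
```

Averaging tends to cancel errors that individual models make in opposite directions, which is why an ensemble can exceed the accuracy of its members; R² = 0.8 means the predictions explain 80 % of the variance of the observations.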


PeerJ, 2018, Vol 6, pp. e4894
Author(s): Senlin Zhu, Emmanuel Karlo Nyarko, Marijana Hadzima-Nyarko

The biochemical and physical characteristics of a river are directly affected by water temperature, which thereby affects the overall health of aquatic ecosystems. Accurately estimating water temperature is a complex problem. Modelling of river water temperature is usually based on a suitable mathematical model and field measurements of various atmospheric factors. In this article, the air–water temperature relationship of the Missouri River is investigated by developing three different machine learning models: Artificial Neural Network (ANN), Gaussian Process Regression (GPR), and Bootstrap Aggregated Decision Trees (BA-DT). Standard models (linear regression, non-linear regression, and stochastic models) are also developed and compared with the machine learning models. Among the three standard models, the stochastic model clearly outperforms the linear and non-linear models. All three machine learning models produce comparable results and outperform the stochastic model, with GPR performing slightly better at stations No. 2 and 3, and BA-DT performing slightly better at station No. 1. The machine learning models are very effective tools for predicting daily river water temperature.
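The BA-DT model combines bootstrap resampling with decision trees. The sketch below illustrates the bagging idea in pure Python with air temperature as the single predictor. To keep it short it uses depth-1 regression trees (stumps) instead of the full trees a BA-DT model would normally grow; the helper names, ensemble size and data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: the single split on x that minimises squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, ml, mr)
    if best is None:  # all x identical: fall back to a constant prediction
        m = mean(ys)
        return lambda x: m
    _, threshold, ml, mr = best
    return lambda x: ml if x <= threshold else mr


def fit_bagged_stumps(xs, ys, n_models=25, seed=0):
    """Bootstrap aggregation: fit each stump on a resample drawn with replacement, then average."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


# Hypothetical daily air (Ta) and water (Tw) temperatures in °C.
ta = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
tw = [1.0, 2.0, 3.1, 4.0, 5.2, 6.1, 7.0, 8.2, 9.1, 10.0, 11.2, 12.0]
predict = fit_bagged_stumps(ta, tw)
print(f"Tw(5 °C) ≈ {predict(5.0):.2f} °C, Tw(15 °C) ≈ {predict(15.0):.2f} °C")
```

Each stump is a coarse step function, but averaging many stumps fitted to different bootstrap samples places their split points at different temperatures, so the aggregate prediction becomes a smoother, lower-variance curve.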

