The Impact of Training Data Sequence on the Performance of Neuro-Fuzzy Rainfall-Runoff Models with Online Learning

Water ◽  
2018 ◽  
Vol 11 (1) ◽  
pp. 52 ◽  
Author(s):  
Tak Chang ◽  
Amin Talei ◽  
Lloyd Chua ◽  
Sina Alaghmand

The learning algorithms in many conventional Neuro-Fuzzy Systems (NFS) are based on batch or global learning, where all parameters of the fuzzy system are optimized off-line. Although these models have frequently been used, they suffer from reduced architectural flexibility, as the number of rules needs to be predefined by the user. This study uses a Dynamic Evolving Neural Fuzzy Inference System (DENFIS) in which an evolving, online clustering algorithm, the Evolving Clustering Method (ECM), is implemented. This study focused on evaluating the performance of this model in capturing the rainfall-runoff process and the rainfall-water level relationship. The two selected study catchments are located in an urban tropical and in a semi-urbanized area, respectively. The first catchment, Sungai Kayu Ara (23.22 km2), is located in Malaysia, with 10-min rainfall-runoff time-series from which 30 major events are used. The second catchment, Dandenong (272 km2), is located in Victoria, Australia, with daily rainfall and river stage (water level) data from which 11 years of data are used. DENFIS results were then compared with two groups of benchmark models: a regression-based data-driven model known as the Autoregressive Model with Exogenous Inputs (ARX) for both study sites, and the physical models Hydrologic Engineering Center–Hydrologic Modelling System (HEC–HMS) and Storm Water Management Model (SWMM) for the Sungai Kayu Ara and Dandenong catchments, respectively. DENFIS significantly outperformed the ARX model at both study sites. Moreover, DENFIS was found comparable, if not superior, to HEC–HMS and SWMM in the Sungai Kayu Ara and Dandenong catchments, respectively. A sensitivity analysis was then conducted on DENFIS to assess the impact of training data sequence on its performance. Results showed that starting the training with datasets that include high peaks can improve model performance. Moreover, datasets with more contrasting values that cover a wide range of low to high values can also improve the DENFIS model performance.
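As a rough sketch of how an evolving, online clustering step avoids a predefined rule count, the following one-pass routine creates or grows clusters as samples arrive. The update rule is a simplified illustration, not the exact ECM/DENFIS implementation.

```python
# Simplified sketch of an evolving clustering step in the spirit of ECM:
# clusters are created on the fly, so the number of rules is not fixed
# in advance. The growth/update rule is illustrative only.
import math

def ecm(samples, dthr):
    """One-pass evolving clustering: each cluster is (centre, radius)."""
    clusters = []
    for x in samples:
        if not clusters:
            clusters.append((list(x), 0.0))
            continue
        # distance of x to each existing cluster centre
        dists = [math.dist(x, c) for c, _ in clusters]
        j = min(range(len(clusters)), key=lambda i: dists[i])
        centre, radius = clusters[j]
        if dists[j] <= radius:
            continue                       # x already covered by cluster j
        if dists[j] + radius > 2 * dthr:   # too far from every cluster
            clusters.append((list(x), 0.0))
        else:                              # grow cluster j towards x
            new_r = (dists[j] + radius) / 2
            w = (new_r - radius) / dists[j] if dists[j] else 0.0
            new_c = [ci + w * (xi - ci) for ci, xi in zip(centre, x)]
            clusters[j] = (new_c, new_r)
    return clusters
```

Because clusters are updated incrementally, the order in which samples arrive changes the final clusters, which is exactly why the training-data sequence matters for such models.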

2017 ◽  
Vol 3 ◽  
pp. e137 ◽  
Author(s):  
Mona Alshahrani ◽  
Othman Soufan ◽  
Arturo Magana-Mora ◽  
Vladimir B. Bajic

Background: Artificial neural networks (ANNs) are a robust class of machine learning models and are a frequent choice for solving classification problems. However, determining the structure of an ANN is not trivial, as a large number of weights (connection links) may lead to overfitting the training data. Although several ANN pruning algorithms have been proposed for the simplification of ANNs, these algorithms are not able to efficiently cope with the intricate ANN structures required for complex classification problems. Methods: We developed DANNP, a web-based tool that implements parallelized versions of several ANN pruning algorithms. The DANNP tool uses a modified version of the Fast Compressed Neural Network software, implemented in C++, to considerably enhance the running time of the ANN pruning algorithms we implemented. In addition to the performance evaluation of the pruned ANNs, we systematically compared the set of features that remained in the pruned ANN with those obtained by different state-of-the-art feature selection (FS) methods. Results: Although the ANN pruning algorithms are not entirely parallelizable, DANNP was able to speed up ANN pruning by up to eight times on a 32-core machine, compared to the serial implementations. To assess the impact of pruning by the DANNP tool, we used 16 datasets from different domains. In eight of the 16 datasets, DANNP significantly reduced the number of weights by 70%–99% while maintaining competitive or better model performance compared to the unpruned ANN. Finally, we used a naïve Bayes classifier derived with the features selected as a byproduct of the ANN pruning and demonstrated that its accuracy is comparable to that obtained by classifiers trained with features selected by several state-of-the-art FS methods. The FS ranking methodology proposed in this study allows users to identify the most discriminant features of the problem at hand. To the best of our knowledge, DANNP (publicly available at www.cbrc.kaust.edu.sa/dannp) is the only available and online-accessible tool that provides multiple parallelized ANN pruning options. Datasets and DANNP code can be obtained at www.cbrc.kaust.edu.sa/dannp/data.php and https://doi.org/10.5281/zenodo.1001086.
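As a rough illustration of the simplest pruning family such a tool parallelizes, magnitude-based pruning removes the smallest-magnitude weights. This sketch is generic and is not DANNP's actual algorithms, which include more sophisticated sensitivity-based methods.

```python
# Minimal sketch of magnitude-based weight pruning: keep only the
# largest-magnitude fraction of weights and zero out the rest.
# Illustrative only; not the DANNP pruning algorithms themselves.

def prune_by_magnitude(weights, keep_fraction):
    """Zero out the smallest-magnitude weights, keeping `keep_fraction`."""
    flat = sorted((abs(w) for w in weights), reverse=True)
    k = max(1, int(len(flat) * keep_fraction))
    threshold = flat[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]
```

After pruning, inputs whose outgoing weights are all zero have effectively been deselected, which is the byproduct feature selection the abstract describes.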


2019 ◽  
Author(s):  
A Johnston ◽  
WM Hochachka ◽  
ME Strimas-Mackey ◽  
V Ruiz Gutierrez ◽  
OJ Robinson ◽  
...  

Abstract Citizen science data are valuable for addressing a wide range of ecological research questions, and there has been a rapid increase in the scope and volume of data available. However, data from large-scale citizen science projects typically present a number of challenges that can inhibit robust ecological inferences. These challenges include: species bias, spatial bias, and variation in effort. To demonstrate how to address key challenges in analysing citizen science data, we use the example of estimating species distributions with data from eBird, a large semi-structured citizen science project. We estimate two widely applied metrics of species distributions: encounter rate and occupancy probability. For each metric, we assess the impact of data processing steps that either degrade or refine the data used in the analyses. We also test whether differences in model performance are maintained at different sample sizes. Model performance improved when data processing and analytical methods addressed the challenges arising from citizen science data. The largest gains in model performance were achieved with: 1) the use of complete checklists (where observers report all the species they detect and identify); and 2) the use of covariates describing variation in effort and detectability for each checklist. Occupancy models were more robust to a lack of complete checklists and effort variables. Improvements in model performance with data refinement were more evident with larger sample sizes. Here, we describe processes to refine semi-structured citizen science data to estimate species distributions. We demonstrate the value of complete checklists, which can inform the design and adaptation of citizen science projects. We also demonstrate the value of information on effort.
The methods we have outlined are also likely to improve other forms of inference, and will enable researchers to conduct robust analyses and harness the vast ecological knowledge that exists within citizen science data.
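The data-refinement steps described above can be sketched as a simple filter over checklist records. The field names and thresholds below are hypothetical, not the actual eBird data schema.

```python
# Sketch of citizen-science data refinement: keep only complete
# checklists and attach effort covariates to each record so that the
# downstream model can account for variation in effort and detectability.
# Field names and effort cut-offs are illustrative assumptions.

def refine_checklists(checklists, max_duration_h=5, max_distance_km=5):
    refined = []
    for c in checklists:
        if not c["all_species_reported"]:
            continue                      # require complete checklists
        if c["duration_h"] > max_duration_h or c["distance_km"] > max_distance_km:
            continue                      # drop extreme-effort outliers
        refined.append({
            "species_detected": c["species_detected"],
            # effort covariates passed to the encounter-rate model
            "effort": (c["duration_h"], c["distance_km"], c["n_observers"]),
        })
    return refined
```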


2021 ◽  
Vol 21 (6) ◽  
pp. 257-264
Author(s):  
Hoseon Kang ◽  
Jaewoong Cho ◽  
Hanseung Lee ◽  
Jeonggeun Hwang ◽  
Hyejin Moon

Urban flooding occurs during heavy rains of short duration, so quick and accurate warnings of the danger of inundation are required. Previous research proposed methods to estimate statistics-based urban flood alert criteria from flood damage records and rainfall data, and developed a Neuro-Fuzzy model for predicting appropriate flood alert criteria. A variety of artificial intelligence algorithms have been applied to the prediction of urban flood alert criteria, and their usage and predictive precision have been enhanced with the recent development of artificial intelligence. Therefore, this study predicted flood alert criteria using an Artificial Neural Network (ANN) algorithm and analyzed the effect of augmenting the training data. The predictive performance of the ANN model was an RMSE of 3.39-9.80 mm; with the augmented training data it was an RMSE of 1.08-6.88 mm, an improvement of 29.8-82.6%.
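The study does not state which augmentation technique was used, so the additive-jitter scheme below is only an illustration of one common way to extend a small rainfall training set before fitting an ANN.

```python
# Hypothetical sketch of training-data augmentation for a small
# rainfall dataset: append jittered copies of each (input, target) pair.
# Gaussian jitter is an assumption, not the study's actual technique.
import random

def augment(samples, n_copies=3, noise_std=0.05, seed=42):
    """Return the original (x, y) pairs plus jittered copies."""
    rng = random.Random(seed)
    out = list(samples)
    for _ in range(n_copies):
        for x, y in samples:
            out.append((x + rng.gauss(0, noise_std), y))
    return out
```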


2021 ◽  
Vol 22 (1) ◽  
pp. 287-297
Author(s):  
Dilnoza Umurzakova

The purpose of this article is to develop high-quality combined automatic control systems (ACS) for the water level in the drum of steam boilers of thermal power plants (TPPs), which can significantly improve the quality of regulation and increase the efficiency of TPPs over a wide range of load changes. To improve the quality of water level control in the drum of steam generators of nuclear power plants with a pressurized water-cooled power reactor (PWPR), it is proposed to use a combined automatic control system based on a control loop with a correcting PI controller tuned to a symmetric optimum, with smoothing of the reference signal and device compensation of the most dangerous internal and external measurable disturbances. A technique has been developed for assessing the impact of changes in the quality characteristics of transients of the combined ACS for the water level in the drum of steam boilers and steam generators on the safety, reliability, durability, and efficiency of the thermal power equipment of TPPs. Direct quality indicators of three ACS (the typical three-pulse system, a digital system with a state observer, and the proposed combined ACS) were compared. The simulation results for transients of the proposed and the typical three-pulse ACS confirmed the advantages of the former.
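The proposed control structure can be sketched as a discrete PI controller whose set-point passes through a first-order smoothing filter. The gains, filter constant, and plant used below are illustrative, not the article's symmetric-optimum tuning.

```python
# Sketch of a PI control loop with reference smoothing, in the spirit of
# the combined ACS for drum water level. Gains and the filter constant
# are illustrative assumptions, not symmetric-optimum values.

class SmoothedPI:
    def __init__(self, kp, ki, dt, tau_ref):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.alpha = dt / (tau_ref + dt)   # reference-filter coefficient
        self.ref_f = None                  # smoothed reference
        self.integral = 0.0

    def step(self, reference, measurement):
        if self.ref_f is None:
            self.ref_f = measurement
        # first-order smoothing of the set-point to reduce overshoot
        self.ref_f += self.alpha * (reference - self.ref_f)
        error = self.ref_f - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral
```

Driving a simple integrator plant (level rate proportional to controller output) with this loop brings the level smoothly to the set-point, which is the behaviour the reference filter is meant to produce.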


2019 ◽  
Vol 80 (3) ◽  
pp. 517-528 ◽  
Author(s):  
Qing Chang ◽  
So Kazama ◽  
Yoshiya Touge ◽  
Shunsuke Aita

Abstract Selecting a proper spatial resolution for urban rainfall-runoff modeling is not a trivial issue because it can affect the model outputs. Recently, the development of remote sensing technology and increasingly available data sources have enabled the rainfall-runoff process to be modeled at detailed micro-scales. However, models with less complexity might perform equally well with less model-establishment and computation time. This study explores the impact of model spatial resolution on model performance and parameters. Models with different degrees of discretization were built on the basis of actual drainage networks, urban parcels, and specific land use. The results showed very little difference in total runoff volumes, while peak flows showed obvious scale effects of up to 30%. Generally, model calibration could compensate for the scale effect: the calibrated models with different resolutions showed similar performance. Considering the effective impervious area (EIA) as a calibration parameter marginally increased performance in the calibration period but slightly decreased performance in the validation period, which indicates the importance of detailed EIA identification.


2019 ◽  
Vol 5 (12) ◽  
pp. 2738-2746
Author(s):  
Abdul Ghani Soomro ◽  
Muhammad Munir Babar ◽  
Anila Hameem Memon ◽  
Arjumand Zehra Zaidi ◽  
Arshad Ashraf ◽  
...  

This study explores the impact of the runoff curve number (CN) on hydrological model outputs for the Morai watershed, Sindh, Pakistan, using the Soil Conservation Service Curve Number (SCS-CN) method. The SCS-CN method is an empirical technique used to estimate rainfall-runoff volume from precipitation in small watersheds, and CN is an empirically derived parameter used to calculate direct runoff from a rainfall event. CN depends on soil type, its condition, and the land use and land cover (LULC) of an area. Precise knowledge of these factors was not available for the study area; therefore, a range of values was selected to analyze the sensitivity of the model to changing CN values. Sensitivity analysis involves a methodological manipulation of model parameters to understand their impact on model outputs. A range of CN values from 40-90 was selected to determine their effects on model results at the sub-catchment level during the historic flood year of 2010. The model simulated 362 cumecs of peak discharge for CN=90; however, for CN=40, the discharge reduced substantially to 78 cumecs (a 78.46% reduction). Event-based comparison of water volumes for different pairs of CN values (90-75, 80-75, 75-70, and 90-40) showed reductions in water availability of 8.88%, 3.39%, 3.82%, and 41.81%, respectively. Although it is known that the higher the CN, the greater the direct-runoff discharge and the smaller the initial losses, the sensitivity analysis quantifies that impact and determines the discharges associated with changing CN values. The results of the case study suggest that CN is one of the most influential parameters in the simulation of direct runoff. Knowledge of accurate runoff is important in both wet (flood management) and dry (water availability) periods. The wide range of resulting water discharges highlights the importance of precise CN selection. Sensitivity analysis is an essential facet of establishing hydrological models in data-limited watersheds. The range of CNs has an enormous quantitative effect on direct runoff, whose accuracy is necessary for effective water resource planning and management. The method itself is not novel, but as proposed here it can justify investment in determining an accurate CN before initiating mega-projects involving rainfall-runoff simulations. Even a small error in the CN value may lead to serious consequences. In the current study, the sensitivity analysis tests the robustness of the model's results in the presence of ambiguity regarding the CN value.
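The standard SCS-CN relations behind such a sensitivity sweep (metric units, with the usual initial-abstraction ratio Ia = 0.2S) can be sketched as:

```python
# The standard SCS-CN direct-runoff equations in metric units:
#   S  = 25400/CN - 254       (potential maximum retention, mm)
#   Ia = 0.2 * S              (initial abstraction, mm)
#   Q  = (P - Ia)^2 / (P - Ia + S)  for P > Ia, else 0
# The 100 mm storm depth in the sweep is an illustrative value,
# not taken from the study.

def scs_runoff(p_mm, cn):
    """Direct runoff depth (mm) for rainfall depth p_mm and curve number cn."""
    s = 25400.0 / cn - 254.0        # potential maximum retention (mm)
    ia = 0.2 * s                    # initial abstraction
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

# sensitivity of runoff depth to CN for a 100 mm storm
sweep = {cn: round(scs_runoff(100.0, cn), 1) for cn in (40, 70, 75, 80, 90)}
```

Even this depth-only calculation shows the strong nonlinearity the study quantifies: for a 100 mm storm, runoff at CN=90 is tens of times larger than at CN=40.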


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0237412
Author(s):  
Louisa-Marie Krützfeldt ◽  
Max Schubach ◽  
Martin Kircher

Regulatory regions, like promoters and enhancers, cover an estimated 5–15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding and gapped k-mer support vector machines (gkm-SVMs) or convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training dataset, both learners and two approaches for negative training data are compared. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell-type, predicting cell-type specific elements, and predicting elements' relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing overall best results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs achieved and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and best results for typical training dataset sizes without the need of hyperparameter optimization.
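The two negative-set construction strategies compared above can be sketched directly on sequence strings. The mononucleotide shuffle below is the simplest variant; the paper also considers shuffles preserving higher-order composition, and real background sampling would additionally match length, GC content, and chromosome distribution.

```python
# Sketches of the two negative-set strategies: (a) shuffling positive
# sequences and (b) sampling genomic background windows. Both are
# simplified illustrations of the approaches compared in the study.
import random

def shuffle_negative(seq, seed=0):
    """Negative example by (mononucleotide) shuffling a positive sequence."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def background_negative(genome, length, seed=0):
    """Negative example drawn as a random genomic background window."""
    rng = random.Random(seed)
    start = rng.randrange(len(genome) - length + 1)
    return genome[start:start + length]
```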


2016 ◽  
Vol 24 (4) ◽  
pp. 1-7 ◽  
Author(s):  
P. Sleziak ◽  
J. Szolgay ◽  
K. Hlavčová ◽  
J. Parajka

Abstract The main objective of the paper is to understand how the model's efficiency and selected climatic indicators are related. The hydrological model applied in this study is a conceptual rainfall-runoff model (the TUW model) developed at the Vienna University of Technology. This model was calibrated over three different periods between 1981 and 2010 in three groups of Austrian catchments (snow, runoff, and soil catchments), which represent a wide range of the hydroclimatic conditions of Austria. The model's calibration was performed using a differential evolution algorithm (DEoptim). As an objective function, we used a combination of the Nash-Sutcliffe coefficient (NSE) and the logarithmic Nash-Sutcliffe coefficient (logNSE). The model's efficiency was evaluated by the volume error (VE). Subsequently, we evaluated the relationship between the model's efficiency (VE) and changes in the climatic indicators (precipitation ΔP, air temperature ΔT). The implications of the findings are discussed in the conclusion.
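The combined calibration objective can be written out directly. The equal weighting of NSE and logNSE below is an assumption, as is the small offset added before taking logarithms; the abstract only states that the two criteria are combined.

```python
# Sketch of the combined NSE/logNSE objective used for calibration.
# The 0.5/0.5 weighting and the eps offset are assumptions.
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    mean_o = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den

def combined_objective(obs, sim, eps=0.01):
    """0.5 * NSE + 0.5 * logNSE; logNSE emphasises low-flow fit."""
    log_obs = [math.log(o + eps) for o in obs]
    log_sim = [math.log(s + eps) for s in sim]
    return 0.5 * nse(obs, sim) + 0.5 * nse(log_obs, log_sim)
```

Taking logarithms compresses high flows, so the log term weights low-flow periods more heavily than plain NSE, which is why the two are combined.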


2022 ◽  
Vol 14 (1) ◽  
pp. 201
Author(s):  
Qigen Lin ◽  
Tianyu Ci ◽  
Leibin Wang ◽  
Sanjit Kumar Mondal ◽  
Huaxiang Yin ◽  
...  

The rapid assessment of building damage in earthquake-stricken areas is of paramount importance for emergency response. The development of remote sensing technology has aided in deriving reliable and precise building damage assessments of extensive areas following disasters. It is well documented that convolutional neural network methods have superior performance in earthquake building damage assessment compared with traditional machine learning methods. However, deep learning models require a large number of samples, and sufficient samples are usually not available quickly enough in newly earthquake-stricken areas. At the same time, historical samples inevitably differ from the new earthquake-affected areas due to discrepancies in regional building characteristics. For this purpose, this study proposes a data transfer algorithm for evaluating the impact of a single historical training sample on model performance. Beneficial samples are then selected to transfer knowledge from the historical data and facilitate the calibration of the new model. Four models are designed with two earthquake damage building datasets, and the performance of the models is compared and evaluated. The results show that the data transfer algorithm proposed in this work significantly improves the reliability of the building damage assessment model by filtering samples from the historical data that are suitable for the new task. The model built with the data transfer method is approximately 8% higher in overall accuracy on the test set of the new earthquake task, compared with the model trained directly on the new earthquake samples, when the training data for the new task is only 10% of the historical data and the objective is four classes of building damage. The proposed data transfer algorithm has effectively enhanced the precision of seismic building damage assessment in a data-limited context. Thus, it could be applicable to the building damage assessment of new disasters.
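The idea of scoring each historical sample by its effect on the new task can be sketched as follows; the scoring rule and the `validate` callback are illustrative stand-ins for the paper's CNN-based evaluation.

```python
# Sketch of beneficial-sample selection for data transfer: keep a
# historical sample only if adding it to the new task's training set
# does not reduce validation performance. The greedy one-at-a-time
# scoring rule is a simplifying assumption.

def select_transferable(historical, new_train, validate):
    """Keep historical samples that do not hurt validation performance.

    `validate(train_set)` is assumed to return a score (e.g. accuracy)
    for a model trained on `train_set`.
    """
    baseline = validate(new_train)
    kept = []
    for sample in historical:
        if validate(new_train + [sample]) >= baseline:
            kept.append(sample)
    return kept
```

In practice retraining per sample is expensive, which is why the paper's algorithm evaluates single-sample impact rather than exhaustively retraining on every subset.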

