Reconstruction of Multidecadal Country-Aggregated Hydro Power Generation in Europe Based on a Random Forest Model

Energies ◽

10.3390/en13071786 ◽

2020 ◽

Vol 13 (7) ◽

pp. 1786

Author(s):

Linh T. T. Ho ◽

Laurent Dubus ◽

Matteo De Felice ◽

Alberto Troccoli

Keyword(s):

Power Generation ◽

Random Forest ◽

Model Performance ◽

Absolute Error ◽

Random Forest Model ◽

Low Carbon ◽

Climate Data ◽

Hydro Power ◽

Forest Model ◽

Continental Scale

Hydro power can provide a source of dispatchable low-carbon electricity and a storage solution in a climate-dependent energy mix with high shares of wind and solar production. Therefore, understanding the effect climate has on hydro power generation is critical to ensure a stable energy supply, particularly at a continental scale. Here, we introduce a framework that uses climate data to model hydro power generation at the country level with a machine learning method, the random forest model, to produce a publicly accessible hydro power dataset from 1979 to present for twelve European countries. In addition to producing a consistent European hydro power generation dataset covering the past 40 years, the specific novelty of this approach is its focus on the lagged effect of climate variability on hydro power: multiple lagged values of temperature and precipitation are used. Overall, the model shows promising results, with correlation values ranging between 0.85 and 0.98 for run-of-river and between 0.73 and 0.90 for reservoir-based generation. Compared to the more standard optimal lag approach, the normalised mean absolute error is reduced by an average of 10.23% and 5.99%, respectively. The model was also implemented over six Italian bidding zones to test its skill at the sub-country scale. The model performance is only slightly degraded at the bidding zone level, but this also depends on the actual installed capacity, with higher capacities displaying higher performance. The framework and results presented could provide a useful reference for applications such as pan-European (continental) hydro power planning and for system adequacy and extreme events assessments.
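
The use of multiple lagged predictors described above can be illustrated with a minimal sketch. It only shows how a feature table with several lags of temperature and precipitation might be assembled before being passed to a regression model; it is not the authors' code. The function names, the `max_lag` parameter, and the choice to normalise the mean absolute error by the mean of the observations are all assumptions made for illustration.

```python
def build_lagged_features(temperature, precipitation, max_lag):
    """Build one feature row per time step t >= max_lag.

    Each row holds temperature at t, t-1, ..., t-max_lag followed by
    precipitation at the same lags (hypothetical layout).
    """
    if len(temperature) != len(precipitation):
        raise ValueError("series must have equal length")
    rows = []
    for t in range(max_lag, len(temperature)):
        temp_lags = [temperature[t - k] for k in range(max_lag + 1)]
        precip_lags = [precipitation[t - k] for k in range(max_lag + 1)]
        rows.append(temp_lags + precip_lags)
    return rows


def normalised_mae(observed, predicted):
    """Mean absolute error divided by the mean observation.

    The normalisation constant is an assumption; the abstract does not
    state which one was used.
    """
    n = len(observed)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return mae / (sum(observed) / n)
```

Each row of the resulting table would be paired with the generation value at the same time step, so a tree-based model can weigh all lags at once instead of relying on a single optimal lag.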

A novel atrial fibrillation prediction model for Chinese subjects: a nationwide cohort investigation of 682 237 study participants with random forest model

EP Europace ◽

10.1093/europace/euz036 ◽

2019 ◽

Vol 21 (9) ◽

pp. 1307-1312 ◽

Cited By ~ 6

Author(s):

Wei-Syun Hu ◽

Meng-Hsuen Hsieh ◽

Cheng-Li Lin

Keyword(s):

Atrial Fibrillation ◽

Random Forest ◽

Prediction Model ◽

Roc Curve ◽

Model Performance ◽

Weighted Average ◽

Random Forest Model ◽

Test Set ◽

Forest Model ◽

Data Points

Aims: We aimed to construct a random forest model to predict atrial fibrillation (AF) in a Chinese population. Methods and results: This study comprised 682 237 subjects with or without AF. Each subject had 19 features that included the subject's age, gender, underlying diseases, CHA2DS2-VASc score, and follow-up period. The data were split into train and test sets at an approximate 9:1 ratio: 614 013 data points were placed into the train set and 68 224 data points were placed into the test set. In this study, weighted average F1, precision, and recall values were used to measure prediction model performance. The F1, precision, and recall values were calculated across the train set, the test set, and all data. The area under the receiver operating characteristic (ROC) curve was also used to evaluate the performance of the prediction model. The prediction model achieved a k-fold cross-validation accuracy of 0.979 (k = 10). In the test set, the prediction model achieved an F1 value of 0.968, precision value of 0.958, and recall value of 0.979. The area under the ROC curve of the model was 0.948 (95% confidence interval 0.947–0.949). This model was validated with a separate dataset. Conclusions: This study showed a novel AF risk prediction scheme for Chinese individuals with random forest model methodology.
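
As a rough illustration of the weighted average metrics mentioned above, the sketch below computes per-class precision, recall, and F1 and averages them with weights proportional to each class's share of the true labels. This is the common "support-weighted" definition and is assumed here; the abstract does not spell out the exact formula. The function name `weighted_scores` is hypothetical.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (precision, recall, f1), each averaged over classes with
    weights equal to the class frequency in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        true_pos = sum(1 for t, p in zip(y_true, y_pred)
                       if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = true_pos / predicted if predicted else 0.0
        r_c = true_pos / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1
```

With support weighting, the majority class dominates the average, which matters when one outcome is far more common than the other.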

Assessment of Native Radar Reflectivity and Radar Rainfall Estimates for Discharge Forecasting in Mountain Catchments with a Random Forest Model

Remote Sensing ◽

10.3390/rs12121986 ◽

2020 ◽

Vol 12 (12) ◽

pp. 1986 ◽

Cited By ~ 1

Author(s):

Johanna Orellana-Alvear ◽

Rolando Célleri ◽

Rütger Rollenbeck ◽

Paul Muñoz ◽

Pablo Contreras ◽

...

Keyword(s):

Soil Moisture ◽

Random Forest ◽

Model Performance ◽

Radar Data ◽

Rain Gauge ◽

Random Forest Model ◽

Mountain Regions ◽

Radar Rainfall ◽

Forest Model ◽

Rainfall Estimates

Discharge forecasting is a key component of early warning systems and extremely useful for decision makers. Forecasting models require accurate rainfall estimates of high spatial resolution and other geomorphological characteristics of the catchment, which are rarely available in remote mountain regions such as the Andean highlands. While radar data are available in some mountain areas, the absence of a well-distributed rain gauge network makes it hard to obtain accurate rainfall maps. Thus, this study explored a Random Forest model and its ability to leverage native radar data (i.e., reflectivity) by providing a simplified but efficient discharge forecasting model for a representative mountain catchment in the southern Andes of Ecuador. This model was compared with another that used derived radar rainfall (i.e., rainfall depth) as input, obtained by transforming reflectivity to rainfall rate with a local Z-R relation and a rain gauge-based bias adjustment. In addition, the influence of a soil moisture proxy was evaluated. Radar and runoff data from April 2015 to June 2017 were used. Results showed that (i) model performance was similar when using either native or derived radar data as inputs (0.66 < NSE < 0.75; 0.72 < KGE < 0.78). Thus, exhaustive pre-processing for obtaining radar rainfall estimates can be avoided for discharge forecasting. (ii) Including a soil moisture representation as a model input did not significantly improve model performance (i.e., NSE increased from 0.66 to 0.68). Finally, this native radar data-based model constitutes a promising alternative for discharge forecasting in remote mountain regions where ground monitoring is scarce or unavailable.
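
The two skill scores quoted above, NSE (Nash-Sutcliffe efficiency) and KGE (Kling-Gupta efficiency), can be sketched as follows. The NSE formula is standard. For KGE, the original 2009 formulation (correlation, variability ratio, bias ratio) is assumed, since the abstract does not say which variant was used. Function names are illustrative only.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the skill of
    always predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def kge(observed, simulated):
    """Kling-Gupta efficiency (assumed 2009 form):
    1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / n)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in simulated) / n)
    cov = sum((o - mean_obs) * (s - mean_sim)
              for o, s in zip(observed, simulated)) / n
    r = cov / (std_obs * std_sim)   # linear correlation
    alpha = std_sim / std_obs       # variability ratio
    beta = mean_sim / mean_obs      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Both scores equal 1 for a perfect simulation; KGE additionally separates errors in timing, variability, and bias, which is why the two are often reported together.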

A National-Scale 1-km Resolution PM2.5 Estimation Model over Japan Using MAIAC AOD and a Two-Stage Random Forest Model

Remote Sensing ◽

10.3390/rs13183657 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3657

Author(s):

Chau-Ren Jung ◽

Wei-Ting Chen ◽

Shoji F. Nakayama

Keyword(s):

Random Forest ◽

Cross Validation ◽

Model Performance ◽

Random Forest Model ◽

Scale Model ◽

National Scale ◽

Two Stage ◽

Forest Model ◽

Monsoon Area ◽

Fold Cross Validation

Satellite-based models for estimating concentrations of particulate matter with an aerodynamic diameter less than 2.5 μm (PM2.5) have seldom been developed for islands with complex topography in the monsoon area, where, unlike in continental regions, the transport of PM2.5 is influenced by both synoptic-scale winds and local-scale circulations. We validated Multi-Angle Implementation of Atmospheric Correction (MAIAC) aerosol optical depth (AOD) with ground observations in Japan and developed a 1-km-resolution national-scale model between 2011 and 2016 to estimate daily PM2.5 concentrations. A two-stage random forest model integrating MAIAC AOD with meteorological variables and land use data was applied to develop the model. The first-stage random forest model was used to impute the missing AOD values. The second-stage random forest model was then utilised to estimate ground PM2.5 concentrations. Ten-fold cross-validation was performed to evaluate the model performance. There was good consistency between MAIAC AOD and ground truth in Japan (correlation coefficient = 0.82 and 74.62% of data falling within the expected error). For model training, the model showed a training coefficient of determination (R²) of 0.98 and a root mean square error (RMSE) of 1.22 μg/m³. For the 10-fold cross-validation, the cross-validation R² and RMSE of the model were 0.86 and 3.02 μg/m³, respectively. A subsite validation was used to validate the model at the grids overlapping with the AERONET sites, and the model performance was excellent at these sites with a validation R² (RMSE) of 0.94 (1.78 μg/m³). Additionally, the model performance increased as AOD coverage increased. The top ten predictors for estimating ground PM2.5 concentrations were day of the year, temperature, AOD, relative humidity, 10-m-height zonal wind, 10-m-height meridional wind, boundary layer height, precipitation, surface pressure, and population density.
MAIAC AOD showed high retrieval accuracy in Japan. The performance of the satellite-based model was excellent, which showed that PM2.5 estimates derived from the model were reliable and accurate. These estimates can be used to assess both the short-term and long-term effects of PM2.5 on health outcomes in epidemiological studies.

Agricultural Irrigation Area Prediction Based on Improved Random Forest Model

10.21203/rs.3.rs-156767/v1 ◽

2021 ◽

Author(s):

Guangda Gao ◽

Maofa Wang ◽

Hongliang Huang ◽

Weiyu Tang

Keyword(s):

Random Forest ◽

Prediction Models ◽

Absolute Error ◽

Mean Value ◽

Optimal Number ◽

Random Forest Model ◽

Irrigation Area ◽

Forest Model ◽

Grid Search Method ◽

The World

The food problem is a major concern worldwide, and predicting irrigation area can help address food and agricultural problems. In this paper, global data on grain production and irrigation area are analyzed. An improved Random Forest Regression model is proposed and applied to the prediction of irrigation area. Based on the ordinary Random Forest and Limit Tree Regression algorithms, an improved random forest prediction model for irrigation area in China is proposed. First, the arithmetic mean value (AMM) of the mean square error (MSE) and mean absolute error (MAE) is used as the evaluation index for both the improved impurity function and the irrigation area prediction. Then, the grid search method is used to determine the optimal number of decision trees (70 trees and 30 trees, respectively) in the ordinary random forest and limit tree regression, and a new improved random forest model is established. Next, the model is compared with other prediction models, and 10-fold cross-validation supports the soundness of the model. Finally, the error analysis of the improved Random Forest model shows that the prediction error is small. The model is expected to be applied in the annual analysis of irrigation area in China.
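
The evaluation index and the grid search described above can be sketched in a few lines. The sketch assumes that the AMM is simply the arithmetic mean of MSE and MAE, as the abstract states, and that the grid search picks the candidate tree count with the lowest score. The names `amm_score` and `grid_search`, and the `score_fn` callback standing in for "train a model and evaluate it", are hypothetical.

```python
def amm_score(observed, predicted):
    """Arithmetic mean of the mean square error and mean absolute error."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return (mse + mae) / 2.0


def grid_search(candidates, score_fn):
    """Return the candidate (e.g. a number of trees) with the lowest score.

    score_fn is a placeholder: in practice it would fit a model with the
    given setting and return its validation AMM.
    """
    return min(candidates, key=score_fn)
```

Note that MSE and MAE carry different units (squared versus unsquared), so their mean is only meaningful as a relative ranking index rather than as a physical quantity.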

Spatial modeling of gully head erosion on the Loess Plateau using a certainty factor and random forest model

The Science of The Total Environment ◽

10.1016/j.scitotenv.2021.147040 ◽

2021 ◽

Vol 783 ◽

pp. 147040

Author(s):

Chengcheng Jiang ◽

Wen Fan ◽

Ningyu Yu ◽

Enlong Liu

Keyword(s):

Random Forest ◽

Loess Plateau ◽

Spatial Modeling ◽

Random Forest Model ◽

Certainty Factor ◽

The Loess Plateau ◽

Forest Model ◽

Gully Head

Clinical trial registries as Scientometric data: A novel solution for linking and deduplicating clinical trials from multiple registries

Scientometrics ◽

10.1007/s11192-021-04111-w ◽

2021 ◽

Author(s):

Christian Thiele ◽

Gerrit Hirschfeld ◽

Ruth von Brachel

Keyword(s):

Clinical Trials ◽

Random Forest ◽

Random Forest Model ◽

Scientometric Analysis ◽

Data Set ◽

The Public ◽

Forest Model ◽

Clinical Trial Registries ◽

Multiple Primary ◽

Clinical Trials Registry

Registries of clinical trials are a potential source for scientometric analysis of medical research and serve important functions for the research community and the public at large. Clinical trials that recruit patients in Germany are usually registered in the German Clinical Trials Register (DRKS) or in international registries such as ClinicalTrials.gov. Furthermore, the International Clinical Trials Registry Platform (ICTRP) aggregates trials from multiple primary registries. We queried the DRKS, ClinicalTrials.gov, and the ICTRP for trials with a recruiting location in Germany. Trials that were registered in multiple registries were linked using the primary and secondary identifiers and a Random Forest model based on various similarity metrics. We identified 35,912 trials that were conducted in Germany. The majority of the trials were registered in multiple databases. 32,106 trials were linked using primary IDs, 26 were linked using a Random Forest model, and 10,537 internal duplicates on ICTRP were identified using the Random Forest model after finding pairs with matching primary or secondary IDs. In cross-validation on a manually labelled data set, the Random Forest increased the F1-score from 96.4% to 97.1% compared to a linkage based solely on secondary IDs. 28% of all trials were registered in the German DRKS. 54% of the trials on ClinicalTrials.gov, 43% of the trials on the DRKS and 56% of the trials on the ICTRP were pre-registered. The ratio of pre-registered studies and the ratio of studies that are registered in the DRKS increased over time.

Discrimination of the geographic origins and varieties of wine grapes using high-throughput sequencing assisted by a random forest model

LWT ◽

10.1016/j.lwt.2021.111333 ◽

2021 ◽

pp. 111333

Author(s):

Feifei Gao ◽

Guihua Zeng ◽

Bin Wang ◽

Jing Xiao ◽

Liang Zhang ◽

...

Keyword(s):

Random Forest ◽

High Throughput ◽

High Throughput Sequencing ◽

Random Forest Model ◽

Wine Grapes ◽

Forest Model ◽

Geographic Origins

Multi-Scenario Prediction of Intra-Urban Land Use Change Using a Cellular Automata-Random Forest Model

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10080503 ◽

2021 ◽

Vol 10 (8) ◽

pp. 503

Author(s):

Hang Liu ◽

Riken Homma ◽

Qiang Liu ◽

Congying Fang

Keyword(s):

Land Use ◽

Random Forest ◽

Cellular Automata ◽

Land Use Change ◽

Urban Land ◽

Urban Land Use ◽

Random Forest Model ◽

Growth Trend ◽

Related Factors ◽

Forest Model

The simulation of future land use can provide decision support for urban planners and decision makers, which is important for sustainable urban development. Using a cellular automata-random forest model, we considered two scenarios to predict intra-urban land use changes in Kumamoto City from 2018 to 2030: an unconstrained development scenario, and a planning-constrained development scenario that considers disaster-related factors. The random forest was used to calculate the transition probabilities and the importance of driving factors, and cellular automata were used for future land use prediction. The results show that disaster-related factors greatly influence land vacancy, while urban planning factors are more important for medium high-rise residential, commercial, and public facilities. Under the unconstrained development scenario, total urban land grows steadily but tends towards a spatially disordered pattern, with the largest increase in low-rise residential areas. Under the planning-constrained development scenario that considers disaster-related factors, the urban land area will continue to grow, albeit slowly and with a compact growth trend. This study provides planners with information on the relevant trends in different scenarios of land use change in Kumamoto City. Furthermore, it provides a reference for Kumamoto City's future post-disaster recovery and reconstruction planning.

Estimates of daily ground-level NO2 concentrations in China based on Random Forest model integrated K-means

Advances in Applied Energy ◽

10.1016/j.adapen.2021.100017 ◽

2021 ◽

pp. 100017

Author(s):

Xinyu Dou ◽

Cuijuan Liao ◽

Hengqi Wang ◽

Ying Huang ◽

Ying Tu ◽

...

Keyword(s):

Random Forest ◽

Ground Level ◽

Random Forest Model ◽

Forest Model

Improving satellite-based estimation of surface ozone across China during 2008–2019 using iterative random forest model and high-resolution grid meteorological data

Sustainable Cities and Society ◽

10.1016/j.scs.2021.102807 ◽

2021 ◽

pp. 102807

Author(s):

Gongbo Chen ◽

Jiang Chen ◽

Guang-hui Dong ◽

Bo-yi Yang ◽

Yisi Liu ◽

...

Keyword(s):

High Resolution ◽

Random Forest ◽

Meteorological Data ◽

Surface Ozone ◽

Random Forest Model ◽

Forest Model
