Depth-to-Bedrock Map of China at a Spatial Resolution of 100 Meters

Mapping Intimacies ◽

10.5194/essd-2018-103 ◽

2018 ◽

Cited By ~ 2

Author(s):

Fapeng Yan ◽

Wei Shangguan ◽

Jing Zhang ◽

Bifeng Hu

Keyword(s):

Land Surface ◽

Vegetation Index ◽

Prediction Models ◽

Expert Knowledge ◽

Lower Boundary ◽

Spatial Prediction ◽

Training Data ◽

Ensemble Prediction ◽

Gradient Boosting ◽

Soil Database

Abstract. Depth to bedrock serves as the lower boundary of soil and influences or controls many of the Earth’s physical and chemical processes. It plays important roles in geology, hydrology, land surface processes, civil engineering, and other related fields. This paper describes the materials and methods used to produce a high-resolution (100 m) depth-to-bedrock map of China. Observations were interpreted from borehole log data (ca. 6,382 locations) sampled from the Chinese National Important Geological Borehole Database. To fill large sampling gaps, additional pseudo-observations based on expert knowledge were added. The training points were then overlaid on a stack of 133 covariates, including climatic images, DEM-derived parameters, land-cover and land-use maps, MODIS surface reflectance bands, vegetation index images, and the Harmonized World Soil Database. Spatial prediction models were developed using random forests and gradient boosting trees, and ensemble predictions were obtained from these two independently fitted models. Finally, uncertainty was estimated with a quantile regression forest model. Ten-fold cross-validation showed that the ensemble models explain 57% of the variation in depth to bedrock. Compared with depth-to-bedrock maps of China extracted from previous global predictions, our predictions showed higher accuracy. More observations, especially in data-sparse areas, should be added to the training data, and more high-precision covariates should be used to further improve the accuracy of the spatial predictions. The resulting maps are available on Figshare at https://doi.org/10.6084/m9.figshare.7011524.v1. They are also available for download at http://globalchange.bnu.edu.cn/ .
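The ensemble step and the "variation explained" score can be illustrated with a minimal sketch. The abstract does not say how the two models are combined, so the unweighted mean used here is an assumption, and the function names and depth values are hypothetical rather than taken from the paper.

```python
from statistics import mean


def ensemble_mean(pred_a, pred_b):
    """Average two models' predictions point by point (unweighted; an assumption)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    return [(a + b) / 2.0 for a, b in zip(pred_a, pred_b)]


def explained_variation(observed, predicted):
    """Return 1 - SSE/SST, the share of variation in `observed` explained by `predicted`."""
    obs_mean = mean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical depths (m) at five held-out boreholes; not data from the paper.
observed = [2.0, 5.0, 9.0, 14.0, 20.0]
rf_pred = [3.0, 6.0, 8.0, 12.0, 17.0]    # stand-in for random forest output
gbt_pred = [1.0, 6.0, 10.0, 14.0, 19.0]  # stand-in for gradient boosting output

combined = ensemble_mean(rf_pred, gbt_pred)
score = explained_variation(observed, combined)
```

In a cross-validation setting, `observed` and the two prediction lists would hold the held-out values pooled over all folds.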


Mapping (un)certainty of machine learning-based spatial prediction models based on predictor space distances

10.5194/egusphere-egu2020-8492 ◽

2020 ◽

Author(s):

Hanna Meyer ◽

Edzer Pebesma

Keyword(s):

Machine Learning ◽

Spatial Patterns ◽

Environmental Science ◽

Prediction Models ◽

Learning Algorithms ◽

Predictor Variable ◽

Spatial Prediction ◽

Machine Learning Algorithms ◽

Training Data ◽

Field Samples

Spatial mapping is an important task in environmental science to reveal spatial patterns and changes of the environment. In this context, predictive modelling using flexible machine learning algorithms has become very popular. However, looking at the diversity of modelled (global) maps of environmental variables, one might increasingly get the impression that machine learning is a magic tool to map everything. Recently, the reliability of such maps has been increasingly questioned, calling for a reliable quantification of uncertainties. Though spatial (cross-)validation allows giving a general error estimate for the predictions, models are usually applied to make predictions for a much larger area or might even be transferred to make predictions for an area on which they were not trained. But when predictions are made on heterogeneous landscapes, there will be areas that feature environmental properties that have not been observed in the training data and hence have not been learned by the algorithm. This is problematic, as most machine learning algorithms are weak in extrapolation and can only make reliable predictions for environments with conditions the model has knowledge about. Hence, predictions for environmental conditions that differ significantly from the training data have to be considered uncertain. To approach this problem, we suggest a measure of uncertainty that allows identifying locations where predictions should be regarded with care. The proposed uncertainty measure is based on distances to the training data in the multidimensional predictor variable space. However, distances are not equally relevant within the feature space: some variables are more important than others in the machine learning model and hence are mainly responsible for prediction patterns. Therefore, we weight the distances by the model-derived importance of the predictors. As a case study we use a simulated area-wide response variable for Europe, bio-climatic variables as predictors, and simulated field samples. Random Forest is applied as the algorithm to predict the simulated response. The model is then used to make predictions for the whole of Europe. We then calculate the corresponding uncertainty and compare it to the area-wide true prediction error. The results show that the uncertainty map reflects the patterns in the true error very well and considerably outperforms ensemble-based standard deviations of predictions as an indicator of uncertainty. The resulting map of uncertainty gives valuable insights into spatial patterns of prediction uncertainty, which is important when the predictions are used as a baseline for decision making or subsequent environmental modelling. Hence, we suggest that a map of distance-based uncertainty should be given in addition to prediction maps.
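The idea of an importance-weighted distance to the training data can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the predictors are already on comparable scales, uses Euclidean distance to the single nearest training point, and all names and values are hypothetical.

```python
import math


def weighted_distance(point, other, weights):
    """Euclidean distance after scaling each predictor by its importance weight."""
    return math.sqrt(sum((w * (p - q)) ** 2 for p, q, w in zip(point, other, weights)))


def distance_to_training(point, training, weights):
    """Distance from `point` to its nearest training sample in weighted predictor space."""
    return min(weighted_distance(point, t, weights) for t in training)


# Hypothetical training samples described by two predictors.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Hypothetical importances: the first predictor matters ten times more.
importance = (1.0, 0.1)

far_in_important = distance_to_training((5.0, 0.0), training, importance)
far_in_unimportant = distance_to_training((0.0, 5.0), training, importance)
```

Both new points lie four units beyond the training data, but the first departs along the important predictor and therefore receives the larger distance, flagging its prediction as less trustworthy.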


Prediction of soil classes in a complex landscape in Southern Brazil

Pesquisa Agropecuária Brasileira ◽

10.1590/s1678-3921.pab2019.v54.00420 ◽

2019 ◽

Vol 54 ◽

Author(s):

Jean Michel Moura-Bueno ◽

Ricardo Simão Diniz Dalmolin ◽

Taciara Zborowski Horst-Heinen ◽

Luciano Campos Cancian ◽

Ricardo Bergamo Schenato ◽

...

Keyword(s):

Random Forest ◽

Prediction Models ◽

Expert Knowledge ◽

Model Performance ◽

Spatial Prediction ◽

Support Vector ◽

Kappa Index ◽

Digital Elevation ◽

Complex Landscape ◽

Elevation Model

Abstract: The objective of this work was to evaluate the use of covariate selection by expert knowledge on the performance of soil class predictive models in a complex landscape, in order to identify the best predictive model for digital soil mapping in the Southern region of Brazil. A total of 164 points were sampled in the field using the conditioned Latin hypercube, considering the covariates elevation, slope, and aspect. From the digital elevation model, environmental covariates were extracted, composing three sets, made up of: 21 covariates, covariates after the exclusion of the multicollinear ones, and covariates chosen by expert knowledge. Prediction was performed with the following models: decision tree, random forest, multiple logistic regression, and support vector machine. The accuracy of the models was evaluated by the kappa index (K), general accuracy (GA), and class accuracy. The prediction models were sensitive to the disproportionate sampling of soil classes. The best predicted map achieved a GA of 71% and K of 0.59. The use of the covariate set chosen by expert knowledge improves model performance in predicting soil classes in a complex landscape, and random forest is the best model for the spatial prediction of soil classes.
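The two map-level scores named above, general accuracy and the kappa index, can both be derived from a confusion matrix. The sketch below uses the standard Cohen's kappa definition; the three-class matrix is hypothetical and is not taken from the study.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total ** 2)
    return (observed - expected) / (1.0 - expected)


# Hypothetical confusion matrix: rows are observed soil classes, columns predicted.
confusion = [
    [20, 5, 0],
    [3, 15, 2],
    [1, 4, 10],
]

ga = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
```

Kappa falls below general accuracy because part of the agreement would occur by chance, which is more likely when one class dominates the samples.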


Applying Deep Neural Networks and Ensemble Machine Learning Methods to Forecast Airborne Ambrosia Pollen

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph16111992 ◽

2019 ◽

Vol 16 (11) ◽

pp. 1992 ◽

Cited By ~ 6

Author(s):

Gebreab K. Zewdie ◽

David J. Lary ◽

Estelle Levetin ◽

Gemechu F. Garuma

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Land Surface ◽

Deep Neural Networks ◽

Airborne Pollen ◽

Training Data ◽

Gradient Boosting ◽

Learning Approaches ◽

Ambrosia Pollen ◽

Extreme Gradient Boosting

Allergies to airborne pollen are a significant issue affecting millions of Americans. Consequently, accurately predicting the daily concentration of airborne pollen is of significant public benefit in providing timely alerts. This study presents a method for the robust estimation of the concentration of airborne Ambrosia pollen using a suite of machine learning approaches, including deep learning and ensemble learners. Each of these approaches utilizes data from the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric weather and land surface reanalysis. The approaches used to develop the suite of empirical models are deep neural networks, extreme gradient boosting, random forests, and Bayesian ridge regression. The training data comprised twenty-four years of daily pollen concentration measurements, together with ECMWF weather and land surface reanalysis data, from 1987 to 2011. The last six years of the dataset, from 2012 to 2017, were used to independently test the performance of the models. The correlation coefficients between the estimated and actual pollen abundance for the independent validation datasets for the deep neural networks, random forest, extreme gradient boosting and Bayesian ridge were 0.82, 0.81, 0.81 and 0.75, respectively, showing that machine learning can be used to effectively forecast the concentrations of airborne pollen.
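The evaluation design described above, training on the earlier years and testing on the later ones, followed by a correlation between estimated and observed values, might look like the sketch below. The record layout, field names, and numbers are hypothetical; only the year-based split and the Pearson coefficient reflect the abstract.

```python
import math


def split_by_year(records, last_training_year):
    """Put records up to `last_training_year` in the training set, later ones in the test set."""
    train = [r for r in records if r["year"] <= last_training_year]
    test = [r for r in records if r["year"] > last_training_year]
    return train, test


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical daily records: observed pollen counts and a model's estimates.
records = [
    {"year": 2010, "observed": 12.0, "predicted": 10.0},
    {"year": 2011, "observed": 30.0, "predicted": 28.0},
    {"year": 2012, "observed": 5.0, "predicted": 7.0},
    {"year": 2014, "observed": 40.0, "predicted": 33.0},
    {"year": 2016, "observed": 22.0, "predicted": 25.0},
    {"year": 2017, "observed": 60.0, "predicted": 51.0},
]

train, test = split_by_year(records, last_training_year=2011)
r = pearson_r([rec["observed"] for rec in test], [rec["predicted"] for rec in test])
```

Splitting by year rather than at random keeps the test period strictly after the training period, which is closer to how a forecast would be used.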


Use of UAS Multispectral Imagery at Different Physiological Stages for Yield Prediction and Input Resource Optimization in Corn

Remote Sensing ◽

10.3390/rs12152392 ◽

2020 ◽

Vol 12 (15) ◽

pp. 2392

Author(s):

Razieh Barzin ◽

Rohit Pathak ◽

Hossein Lotfi ◽

Jac Varco ◽

Ganesh C. Bora

Keyword(s):

Grain Yield ◽

Vegetation Index ◽

Prediction Models ◽

Feature Selection Method ◽

Spectral Information ◽

Gradient Boosting ◽

Spatial And Temporal Variability ◽

Yield Prediction ◽

Phenological Stages ◽

Red Edge

Changes in spatial and temporal variability in yield estimation are detectable through plant biophysical characteristics observed at different phenological development stages of corn. A multispectral red-edge sensor mounted on an Unmanned Aerial System (UAS) can provide spatial and temporal information with high resolution. Spectral analysis of UAS-acquired spatiotemporal images can be used to develop a statistical model to predict yield based on different phenological stages. Identifying critical vegetation indices (VIs) and significant spectral information could lead to increased yield prediction accuracy. The objective of this study was to develop a yield prediction model at specific phenological stages using spectral data obtained from a corn field. The available spectral bands (red, blue, green, near infrared (NIR), and red-edge) were used to analyze 26 different VIs. The spectral information was collected from a cornfield at Mississippi State University using a MicaSense multispectral red-edge sensor mounted on a UAS. In this research, a new empirical method was introduced to reduce the effects of bare soil pixels in the acquired images. The experimental design was a randomized complete block that consisted of 16 blocks with 12 rows of corn planted in each block. Four nitrogen (N) treatments of 0, 90, 180, and 270 kg/ha were applied randomly. Random forest was utilized as a feature selection method to choose the best combination of variables for different stages. Multiple linear regression and gradient boosting decision trees were used to develop yield prediction models for each specific phenological stage by utilizing the most effective variables at each stage. At the V3 (3 leaves with visible leaf collar) and V4-5 (4-5 leaves with visible leaf collar) stages, the Optimized Soil Adjusted Vegetation Index (OSAVI) and Simplified Canopy Chlorophyll Content Index (SCCCI) were the single dominant variables in the yield prediction models, respectively. A combination of the Green Atmospherically Resistant Index (GARI), Normalized Difference Red-Edge (NDRE), and green Normalized Difference Vegetation Index (GNDVI) at V6-7, SCCCI and Soil-Adjusted Vegetation Index (SAVI) at V10,11, and SCCCI, Green Leaf Index (GLI), and Visible Atmospherically Resistant Index (VARIgreen) at tasseling stage (VT) were the best indices for predicting grain yield of corn. The prediction models at V10 and VT had the greatest accuracy, with coefficients of determination of 0.90 and 0.93, respectively. Moreover, the SCCCI, as a combined index, seemed to be the most suitable index for predicting yield at most of the phenological stages. As corn development progressed, the models predicted final grain yield more accurately.
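Several of the indices named above are simple band ratios. The sketch below uses their commonly cited definitions (NDVI, NDRE, GNDVI, SAVI with L = 0.5, and OSAVI with the 0.16 soil term); the exact forms used in the study are not given in the abstract, some implementations scale OSAVI by 1.16, and the reflectance values are hypothetical.

```python
def normalized_difference(band_a, band_b):
    """Generic normalized difference (a - b) / (a + b) of two reflectance values."""
    return (band_a - band_b) / (band_a + band_b)


def ndvi(nir, red):
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    return normalized_difference(nir, red_edge)


def gndvi(nir, green):
    return normalized_difference(nir, green)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index with the commonly used L = 0.5."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def osavi(nir, red, soil_factor=0.16):
    """Optimized SAVI in its unscaled form (an assumption about the variant)."""
    return (nir - red) / (nir + red + soil_factor)


# Hypothetical surface reflectances for one canopy pixel (unitless, 0 to 1).
pixel = {"nir": 0.5, "red": 0.1, "green": 0.2, "red_edge": 0.3}

indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "NDRE": ndre(pixel["nir"], pixel["red_edge"]),
    "GNDVI": gndvi(pixel["nir"], pixel["green"]),
    "SAVI": savi(pixel["nir"], pixel["red"]),
    "OSAVI": osavi(pixel["nir"], pixel["red"]),
}
```

The soil term in SAVI and OSAVI dampens the index when both bands are low, which is why these variants are favored at early growth stages when bare soil dominates the image.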


Development of deep learning algorithms for predicting blastocyst formation and quality by time-lapse monitoring

Communications Biology ◽

10.1038/s42003-021-01937-1 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Qiuyue Liao ◽

Qi Zhang ◽

Xue Feng ◽

Haibo Huang ◽

Haohao Xu ◽

...

Keyword(s):

Deep Learning ◽

Prediction Models ◽

Short Term Memory ◽

Time Lapse ◽

Ensemble Prediction ◽

Gradient Boosting ◽

Blastocyst Formation ◽

Developmental Potential ◽

Long Short Term Memory ◽

Lstm Network

Abstract: Approaches to reliably predict the developmental potential of embryos and select suitable embryos for blastocyst culture are needed. The development of time-lapse monitoring (TLM) and artificial intelligence (AI) may help solve this problem. Here, we report deep learning models that can accurately predict blastocyst formation and usable blastocysts using TLM videos of the embryo’s first three days. The DenseNet201 network, focal loss, long short-term memory (LSTM) network and gradient boosting classifier were mainly employed, and video preparation algorithms, spatial stream and temporal stream models were developed into ensemble prediction models called STEM and STEM+. STEM exhibited 78.2% accuracy and 0.82 AUC in predicting blastocyst formation, and STEM+ achieved 71.9% accuracy and 0.79 AUC in predicting usable blastocysts. We believe the models are beneficial for blastocyst formation prediction and embryo selection in clinical practice, and our modeling methods will provide valuable information for analyzing medical videos with continuous appearance variation.


Detection and modeling of soil salinity variations in arid lands using remote sensing data

Open Geosciences ◽

10.1515/geo-2020-0244 ◽

2021 ◽

Vol 13 (1) ◽

pp. 443-453

Author(s):

Abduldaem S. Alqasemi ◽

Majed Ibrahim ◽

Ayad M. Fadhil Al-Quraishi ◽

Hakim Saibi ◽

A’kif Al-Fugara ◽

...

Keyword(s):

Remote Sensing ◽

Soil Salinity ◽

Land Surface ◽

Vegetation Index ◽

Normalized Difference Vegetation Index ◽

Prediction Models ◽

Field Measurements ◽

Soil Salinization ◽

Landsat 8 ◽

Salinity Variations

Abstract: Soil salinization is a ubiquitous global problem. The literature supports the integration of remote sensing (RS) techniques and field measurements as effective methods for developing soil salinity prediction models. The objectives of this study were to (i) estimate the level of soil salinity in Abu Dhabi using spectral indices and field measurements and (ii) develop a model for detecting and mapping soil salinity variations in the study area using RS data. We integrated Landsat 8 data with the electrical conductivity measurements of soil samples taken from the study area. Statistical analysis of the integrated data showed that, among the examined indices, the normalized difference vegetation index and the bare soil index showed moderate correlations. The relation between these two indices can contribute to the development of successful soil salinity prediction models. Results show that 31% of the soil in the study area is moderately saline and 46% of the soil is highly saline. The results support that geoinformatic techniques using RS data and technologies constitute an effective tool for detecting soil salinity by modeling and mapping the spatial distribution of saline soils. Furthermore, we observed a low correlation between soil salinity and the nighttime land surface temperature.


Predicting WNV Circulation in Italy Using Earth Observation Data and Extreme Gradient Boosting Model

Remote Sensing ◽

10.3390/rs12183064 ◽

2020 ◽

Vol 12 (18) ◽

pp. 3064

Author(s):

Luca Candeloro ◽

Carla Ippoliti ◽

Federica Iapaolo ◽

Federica Monaco ◽

Daniela Morelli ◽

...

Keyword(s):

At Risk ◽

Environmental Conditions ◽

Land Surface ◽

Vegetation Index ◽

Earth Observation ◽

Gradient Boosting ◽

Observation Data ◽

West Nile ◽

Extreme Gradient Boosting ◽

Earth Observation Data

West Nile Disease (WND), caused by a vector-borne virus, is one of the most widespread zoonoses in Italy and Europe. Its transmission cycle is well understood, with birds acting as the primary hosts and mosquito vectors transmitting the virus to other birds, while humans and horses are occasional dead-end hosts. Identifying suitable environmental conditions across large areas containing multiple species of potential hosts and vectors can be difficult. The recent and massive availability of Earth Observation data and the continuous development of innovative Machine Learning methods can help to automatically identify patterns in big datasets and to identify areas at risk with high accuracy. In this paper, we investigated West Nile Virus (WNV) circulation in relation to Land Surface Temperature, Normalized Difference Vegetation Index and Surface Soil Moisture collected during the 160 days before the infection took place, with the aim of evaluating the predictive capacity of lagged remotely sensed variables in the identification of areas at risk for WNV circulation. WNV detections in mosquitoes, birds and horses in 2017, 2018 and 2019 were collected from the National Information System for Animal Disease Notification. An Extreme Gradient Boosting model was trained with data from 2017 and 2018 and tested on the 2019 epidemic, predicting the spatio-temporal WNV circulation two weeks in advance with an overall accuracy of 0.84. This work lays the basis for a future early warning system that could alert public authorities when climatic and environmental conditions become favourable to the onset and spread of WNV.
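The lagged predictors described above can be pictured as the stretch of a daily series that precedes each detection. The sketch below extracts such a 160-day window; the summary into 16-day block means is purely an illustrative assumption about how the window could be condensed into features, and the series and function names are hypothetical.

```python
def lagged_window(series, event_index, lag_days=160):
    """Return the `lag_days` daily values that precede `event_index` (event day excluded)."""
    start = event_index - lag_days
    if start < 0:
        raise ValueError("not enough history before the event")
    return series[start:event_index]


def window_means(window, block_days=16):
    """Condense a window into consecutive block means (block length is an assumption)."""
    blocks = [window[i:i + block_days] for i in range(0, len(window), block_days)]
    return [sum(block) / len(block) for block in blocks]


# Hypothetical daily land surface temperature series, one value per day.
daily_values = [float(day) for day in range(200)]

window = lagged_window(daily_values, event_index=180)
features = window_means(window)
```

Excluding the event day itself keeps the features strictly in the past, which is what allows a model trained on them to be used for early warning.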


A Framework to Predict Consumption Sustainability Levels of Individuals

Sustainability ◽

10.3390/su12041423 ◽

2020 ◽

Vol 12 (4) ◽

pp. 1423 ◽

Cited By ~ 2

Author(s):

Arielle Moro ◽

Adrian Holzer

Keyword(s):

Information Systems ◽

Energy Demand ◽

Prediction Models ◽

Multinomial Logistic Regression ◽

Theoretical Models ◽

Ensemble Prediction ◽

Gradient Boosting ◽

Factual Knowledge ◽

Household Energy ◽

Green Information Systems

Innovative Information Systems services have the potential to promote more sustainable behavior. For these so-called Green Information Systems (Green IS) to work well, they should be tailored to individual behavior and attitudes. Although various theoretical models already exist, there is currently no technological solution that automatically estimates individuals’ current sustainability levels related to their consumption behaviors in various consumption domains (e.g., mobility and housing). The paper addresses this gap and presents the design of GREENPREDICT, a framework that makes it possible to predict these levels based on multiple features, such as demographic, socio-economic, psychological, and factual knowledge about energy information. To do so, the paper presents and evaluates six different classifiers to predict acts of consumption on the Swiss Household Energy Demand Survey (SHEDS) dataset, which contains survey answers of 2000 representative individuals living in Switzerland. The results highlight that the ensemble prediction models (i.e., random forests and gradient boosting trees) and the multinomial logistic regression model are the most accurate for the mobility and housing prediction tasks.


DETERMINATION OF VEGETATION INDEX, LAND SURFACE TEMPERATURE AND PRECIPITATION AMOUNTS USING REMOTE SENSING DATA

JOURNAL OF AGRO PROCESSING ◽

10.26739/2181-9904-2020-5-1 ◽

2020 ◽

Vol 5 (2) ◽

pp. 4-10

Author(s):

Rashid Jaksibaev

Keyword(s):

Remote Sensing ◽

Surface Temperature ◽

Land Surface Temperature ◽

Land Surface ◽

Vegetation Index ◽

Remote Sensing Data ◽

Sensing Data ◽

Temperature And Precipitation


In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, namely multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best among these methods, with R² = 0.84 and MSE = 0.55 for the training set and R² = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the Y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
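A Y-randomization test checks that a model's fit collapses once the response values are shuffled, which indicates that the original fit was not a chance correlation. The sketch below illustrates the idea with a one-variable least-squares line as a stand-in for the SVM model; the data are synthetic and all names are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Coefficient of determination of the fitted line on the same data."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    sse = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - sse / sst


def y_randomization(x, y, n_rounds=20, seed=0):
    """Refit on shuffled responses and collect the resulting R^2 values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(r_squared(x, shuffled))
    return scores


# Synthetic data: a linear trend with a small alternating perturbation.
x = [float(i) for i in range(30)]
y = [2.0 * xi + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]

true_score = r_squared(x, y)
shuffled_scores = y_randomization(x, y)
```

If the shuffled fits scored about as well as the real one, the descriptors would be flexible enough to fit noise and the reported R² would carry little meaning.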
