Landslide Susceptibility Modeling: An Integrated Novel Method Based on Machine Learning Feature Transformation

Husam A. H. Al-Najjar; Biswajeet Pradhan; Bahareh Kalantar; Maher Ibrahim Sameen; M. Santosh; Abdullah Alamri

doi:10.3390/rs13163281

Remote Sensing ◽

10.3390/rs13163281 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3281

Author(s):

Husam A. H. Al-Najjar ◽

Biswajeet Pradhan ◽

Bahareh Kalantar ◽

Maher Ibrahim Sameen ◽

M. Santosh ◽

...

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Input Data ◽

Vegetation Index ◽

Slope Aspect ◽

Gradient Boosting ◽

Vegetation Density ◽

Power Functions ◽

Susceptibility Modeling ◽

Extreme Gradient Boosting

Landslide susceptibility modeling, an essential approach to mitigating natural disasters, has improved considerably following advances in machine learning (ML) techniques. However, most previous studies assumed, and treated, the distribution of the input data as normal (Gaussian); this assumption is not always valid, and ML performance depends heavily on the quality of the input data. Therefore, we examine the effectiveness of six feature transformations, namely minimax normalization (Std-X), logarithmic functions (Log-X), reciprocal function (Rec-X), power functions (Power-X), optimal features (Opt-X), and one-hot encoding (Ohe-X), over the 11 conditioning factors (i.e., altitude, slope, aspect, curvature, distance to road, distance to lineament, distance to stream, terrain roughness index (TRI), normalized difference vegetation index (NDVI), land use, and vegetation density). We selected an area frequently affected by landslides in the Cameron Highlands, Malaysia, as a case study to test this novel approach. These transformations were then assessed by three benchmark ML methods, namely extreme gradient boosting (XGB), logistic regression (LR), and artificial neural networks (ANN). The 10-fold cross-validation method was used for model evaluations. Our results suggest that using the Ohe-X transformation with the ANN model considerably improved performance from 52.244 to 89.398 (a gain of 37.154 points).
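
As a rough illustration of the kinds of feature transformations named in this abstract, the following is a minimal Python sketch. The abstract does not give the exact definitions used for Std-X, Log-X, Rec-X, Power-X, or Ohe-X, so the formulas, function names, the `eps` guard, the default exponent, and the sample values below are all assumptions; Opt-X (optimal features) is omitted because the abstract does not describe it.

```python
import math


def min_max(values):
    """Rescale values linearly to [0, 1] (assumed form of min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(values):
    """Apply log(1 + x); assumes non-negative inputs."""
    return [math.log1p(v) for v in values]


def reciprocal(values, eps=1e-9):
    """Apply 1 / (x + eps); eps is a hypothetical guard against division by zero."""
    return [1.0 / (v + eps) for v in values]


def power(values, exponent=0.5):
    """Raise each value to a fixed exponent (square root by default, an assumption)."""
    return [v ** exponent for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 indicator vectors, one column per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical sample data, not taken from the study.
slope_deg = [0.0, 10.0, 20.0, 40.0]
land_use = ["forest", "urban", "forest"]

print(min_max(slope_deg))
print(one_hot(land_use))
```

One-hot encoding is applied here to a categorical factor (land use); how the study applied it across all 11 conditioning factors is not stated in the abstract.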

Mapping of the Canopy Openings in Mixed Beech–Fir Forest at Sentinel-2 Subpixel Level Using UAV and Machine Learning Approach

Remote Sensing ◽

10.3390/rs12233925 ◽

2020 ◽

Vol 12 (23) ◽

pp. 3925

Author(s):

Ivan Pilaš ◽

Mateo Gašparović ◽

Alan Novkinić ◽

Damir Klobučar

Keyword(s):

Machine Learning ◽

Forest Canopy ◽

Vegetation Index ◽

Predictive Performance ◽

Spatial Extent ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Sentinel 2

The presented study demonstrates a bi-sensor approach suitable for rapid, precise, and up-to-date mapping of forest canopy gaps over a larger spatial extent. The approach makes use of Unmanned Aerial Vehicle (UAV) red, green and blue (RGB) images on smaller areas for highly precise forest canopy mask creation. Sentinel-2 was used as a scaling platform for transferring information from the UAV to a wider spatial extent. Various approaches to improving the predictive performance were examined: (I) the highest R² of a single satellite index was 0.57, (II) the highest R² using multiple features obtained from a single-date S-2 image was 0.624, and (III) the highest R² on the multitemporal set of S-2 images was 0.697. Satellite indices such as Atmospherically Resistant Vegetation Index (ARVI), Infrared Percentage Vegetation Index (IPVI), Normalized Difference Index (NDI45), Pigment-Specific Simple Ratio Index (PSSRa), Modified Chlorophyll Absorption Ratio Index (MCARI), Color Index (CI), Redness Index (RI), and Normalized Difference Turbidity Index (NDTI) were the dominant predictors in most of the Machine Learning (ML) algorithms. The more complex ML algorithms, such as Support Vector Machines (SVM), Random Forest (RF), Stochastic Gradient Boosting (GBM), Extreme Gradient Boosting (XGBoost), and Catboost, provided the best performance on the training set but exhibited weaker generalization capabilities. Therefore, a simpler and more robust Elastic Net (ENET) algorithm was chosen for the final map creation.

Rainfall-Induced Shallow Landslide Susceptibility Mapping at Two Adjacent Catchments Using Advanced Machine Learning Algorithms

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9100569 ◽

2020 ◽

Vol 9 (10) ◽

pp. 569

Author(s):

Ananta Man Singh Pradhan ◽

Yun-Tae Kim

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Shallow Landslide ◽

Machine Learning Algorithms ◽

Landslide Inventory ◽

Aerial Photographs ◽

Landslide Susceptibility Mapping ◽

Gradient Boosting ◽

Extreme Gradient Boosting ◽

Testing Accuracy

Landslides impact human activities and socio-economic development, especially in mountainous areas. This study focuses on the comparison of the prediction capability of advanced machine learning techniques for the rainfall-induced shallow landslide susceptibility of Deokjeokri catchment and Karisanri catchment in South Korea. The influencing factors for landslides, i.e., topographic, hydrologic, soil, forest, and geologic factors, are prepared from various sources based on availability, and a multicollinearity test is also performed to select relevant causative factors. The landslide inventory maps of both catchments are obtained from historical information, aerial photographs and field surveys. In this study, Deokjeokri catchment is considered as a training area and Karisanri catchment as a testing area. The landslide inventories contain 748 landslide points in training and 219 points in testing areas. Three landslide susceptibility maps using machine learning models, i.e., Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Deep Neural Network (DNN), are prepared and compared. The outcomes of the analyses are validated using the landslide inventory data. A receiver operating characteristic curve (ROC) method is used to verify the results of the models. The results of this study show that the training accuracy of RF is 0.757 and the testing accuracy is 0.74. Similarly, the training accuracy of XGBoost is 0.756 and testing accuracy is 0.703. The prediction of DNN revealed acceptable agreement between the susceptibility map and the existing landslides, with a training accuracy of 0.855 and testing accuracy of 0.802. The results showed that the DNN model achieved lower prediction error and higher accuracy results than other models for shallow landslide modeling in the study area.

Landslide Susceptibility Mapping at Two Adjacent Catchments Using Advanced Machine Learning Algorithms

10.20944/preprints202008.0089.v1 ◽

2020 ◽

Author(s):

Ananta Man Singh Pradhan ◽

Yun-Tae Kim

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Shallow Landslide ◽

Machine Learning Algorithms ◽

Landslide Inventory ◽

Aerial Photographs ◽

Landslide Susceptibility Mapping ◽

Gradient Boosting ◽

Extreme Gradient Boosting ◽

Testing Accuracy

Landslides impact human activities and socio-economic development, especially in mountainous areas. This study focuses on the comparison of the prediction capability of advanced machine learning techniques for rainfall-induced shallow landslide susceptibility of Deokjeokri catchment and Karisanri catchment in South Korea. The influencing factors for landslides, i.e., topographic, hydrologic, soil, forest, and geologic factors, are prepared from various sources based on availability, and a multicollinearity test is also performed to select relevant causative factors. The landslide inventory maps of both catchments are obtained from historical information, aerial photographs and field surveys. In this study, Deokjeokri catchment is considered as a training area and Karisanri catchment as a testing area. The landslide inventories contain 748 landslide points in training and 219 points in testing areas. Three landslide susceptibility maps using machine learning models, i.e., Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Deep Neural Network (DNN), are prepared and compared. The outcomes of the analyses are validated using the landslide inventory data. A receiver operating characteristic curve (ROC) method is used to verify the results of the models. The results of this study show that the training accuracy of RF is 0.757 and the testing accuracy is 0.74. Similarly, the training accuracy of XGBoost is 0.756 and the testing accuracy is 0.703. The prediction of DNN revealed acceptable agreement between the susceptibility map and the existing landslides, with training and testing accuracy of 0.855 and 0.802, respectively. The results showed that the DNN model achieved lower prediction error and higher accuracy results than other models for shallow landslide modeling in the study area.

Predicting Undesired Treatment Outcome in Mental Healthcare: Machine Learning Study (Preprint)

10.2196/preprints.17235 ◽

2019 ◽

Author(s):

Kasper Van Mens ◽

Joran Lokkerbol ◽

Richard Janssen ◽

Robert de Lange ◽

Bea Tiemens

Keyword(s):

Machine Learning ◽

Treatment Outcome ◽

Mental Health Treatment ◽

Mental Healthcare ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Trade Off ◽

Trade Offs ◽

Outcome Monitoring ◽

Extreme Gradient Boosting

BACKGROUND: It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE: In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS: Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS: The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (PPV) of 0.71 (0.61–0.77) with a sensitivity of 0.35 (0.29–0.41) and area under the curve of 0.78. A trade-off can be made between PPV and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the PPV increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the PPV decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS: Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
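
The trade-off between positive predictive value and sensitivity described in this abstract can be illustrated with a small Python sketch. The function name, the labels, and the predicted probabilities below are hypothetical and are not taken from the study; the sketch only shows how raising or lowering the classification cut-off moves the two metrics in opposite directions.

```python
def ppv_and_sensitivity(y_true, y_prob, cutoff=0.5):
    """Return (positive predictive value, sensitivity) at a probability cut-off.

    y_true holds 0/1 labels; y_prob holds predicted probabilities of class 1.
    """
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted_positive = prob >= cutoff
        if predicted_positive and truth == 1:
            tp += 1
        elif predicted_positive and truth == 0:
            fp += 1
        elif not predicted_positive and truth == 1:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.55, 0.38, 0.6, 0.4, 0.2, 0.1]

for cutoff in (0.35, 0.5, 0.65):
    ppv, sens = ppv_and_sensitivity(y_true, y_prob, cutoff)
    print(f"cutoff={cutoff:.2f}  ppv={ppv:.2f}  sensitivity={sens:.2f}")
```

On this toy data, a higher cut-off flags fewer cases with more certainty, while a lower cut-off flags more cases at the cost of more false positives.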

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

A Machine Learning Method for Predicting Vegetation Indices in China

Remote Sensing ◽

10.3390/rs13061147 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1147

Author(s):

Xiangqian Li ◽

Wenping Yuan ◽

Wenjie Dong

Keyword(s):

Machine Learning ◽

Growing Season ◽

Crop Growth ◽

Spatiotemporal Distribution ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Severe Drought ◽

Vegetation Growth ◽

Extreme Gradient Boosting ◽

Boosting Method

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO2 concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R2) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R2 > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.
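
For readers unfamiliar with the coefficient of determination reported in this abstract, a minimal Python sketch follows. It uses the common definition R² = 1 − SS_res / SS_tot; the abstract does not state which definition the authors used (some studies report the squared correlation instead), so this choice, the function name, and the sample NDVI values are assumptions.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly NDVI values, not taken from the study.
observed_ndvi = [0.2, 0.4, 0.6, 0.8]
predicted_ndvi = [0.25, 0.35, 0.65, 0.75]

print(round(r_squared(observed_ndvi, predicted_ndvi), 3))
```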

Corn Nitrogen Status Diagnosis with an Innovative Multi-Parameter Crop Circle Phenom Sensing System

Remote Sensing ◽

10.3390/rs13030401 ◽

2021 ◽

Vol 13 (3) ◽

pp. 401

Author(s):

Cadan Cummings ◽

Yuxin Miao ◽

Gabriel Dias Paiao ◽

Shujiang Kang ◽

Fabián G. Fernández

Keyword(s):

Machine Learning ◽

Chlorophyll Content ◽

Vegetation Index ◽

Soil Drainage ◽

Management Information ◽

Area Index ◽

Sensing System ◽

Extreme Gradient Boosting ◽

Split Plot ◽

N Status

Accurate and non-destructive in-season crop nitrogen (N) status diagnosis is important for the success of precision N management (PNM). Several active canopy sensors (ACS) with two or three spectral wavebands have been used for this purpose. The Crop Circle Phenom sensor is a new integrated multi-parameter proximal ACS system for in-field plant phenomics with the capability to measure reflectance, structural, and climatic attributes. The objective of this study was to evaluate this multi-parameter Crop Circle Phenom sensing system for in-season diagnosis of corn (Zea mays L.) N status across different soil drainage and tillage systems under variable N supply conditions. The four plant metrics used to approximate in-season N status consist of aboveground biomass (AGB), plant N concentration (PNC), plant N uptake (PNU), and N nutrition index (NNI). A field experiment was conducted in Wells, Minnesota during the 2018 and 2019 growing seasons with a split-split plot design replicated four times, with soil drainage (drained and undrained) as the main block, tillage (conventional, no-till, and strip-till) as the split plot, and pre-plant N (PPN) rate (0 to 225 kg ha−1 in 45 kg ha−1 increments) as the split-split plot. Crop Circle Phenom measurements alongside destructive whole plant samples were collected at the V8 ± 1 growth stage. Proximal sensor metrics were used to construct regression models to estimate N status indicators using simple regression (SR) and eXtreme Gradient Boosting (XGB) models. The sensor-derived indices tested included normalized difference vegetation index (NDVI), normalized difference red edge (NDRE), estimated canopy chlorophyll content (eCCC), estimated leaf area index (eLAI), ratio vegetation index (RVI), canopy chlorophyll content index (CCCI), fractional photosynthetically active radiation (fPAR), and canopy and air temperature difference (ΔTemp). Management practices such as drainage, tillage, and PPN rate were also included to determine the potential improvement in corn N status diagnosis. Three of the four replicated drained and undrained blocks were randomly selected as training data, and the remaining drained and undrained blocks were used as testing data. The results indicated that SR modeling using NDVI would be sufficient for estimating AGB compared to more complex machine learning methods. Conversely, PNC, PNU, and NNI all benefitted from XGB modeling based on multiple inputs. Among the different approaches to XGB modeling, combining management information and Crop Circle Phenom measurements increased model performance for predicting each of the four plant N metrics compared with using sensing data alone. The PPN rate was the most important management metric for all models compared to drainage and tillage information. Combining Crop Circle Phenom sensor parameters and management information is a promising strategy for in-season diagnosis of corn N status. More studies are needed to further evaluate this new integrated sensing system under diverse on-farm conditions and to test other machine learning models.
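
Two of the sensor-derived indices listed in this abstract, NDVI and NDRE, follow the standard normalized-difference form. The Python sketch below shows that form; the function names, the zero-denominator convention, and the reflectance values are illustrative assumptions and do not describe how the Crop Circle Phenom sensor computes its outputs.

```python
def normalized_difference(band_a, band_b):
    """Generic normalized difference (a - b) / (a + b)."""
    total = band_a + band_b
    if total == 0:
        # Convention chosen for this sketch: return 0.0 when both bands are zero.
        return 0.0
    return (band_a - band_b) / total


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    """Normalized difference red edge index from NIR and red-edge reflectance."""
    return normalized_difference(nir, red_edge)


# Hypothetical canopy reflectance values (unitless fractions).
print(ndvi(nir=0.5, red=0.1))
print(ndre(nir=0.5, red_edge=0.3))
```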

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Background: Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods: Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results: The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions: The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Landslide Susceptibility Mapping in the Commune of Oudka, Taounate Province, North Morocco: A Comparative Analysis of Logistic Regression, Multivariate Adaptive Regression Spline, and Artificial Neural Network Models

Environmental and Engineering Geoscience ◽

10.2113/eeg-2243 ◽

2020 ◽

Vol 26 (2) ◽

pp. 185-200

Author(s):

Said Benchelha ◽

Hasnaa Chennaoui Aoudjehane ◽

Mustapha Hakdaoui ◽

Rachid El Hamdouni ◽

Hamou Mansouri ◽

...

Keyword(s):

Logistic Regression ◽

Landslide Susceptibility ◽

Vegetation Index ◽

Multivariate Adaptive Regression Spline ◽

Landslide Susceptibility Mapping ◽

Slope Aspect ◽

Neural Network Models ◽

Regression Spline ◽

Adaptive Regression ◽

Artificial Neural

Landslide susceptibility indices were calculated and landslide susceptibility maps were generated for the Oudka, Morocco, study area using a geographic information system. The spatial database included current landslide location, topography, soil, hydrology, and lithology, and the eight factors related to landslides (elevation, slope, aspect, distance to streams, distance to roads, distance to faults, lithology, and Normalized Difference Vegetation Index [NDVI]) were calculated or extracted. Logistic regression (LR), multivariate adaptive regression spline (MARSpline), and artificial neural networks (ANN) were the methods used in this study to generate landslide susceptibility indices. Before the calculation, the study area was randomly divided into two parts, the first for the establishment of the model and the second for its validation. The results of the landslide susceptibility analysis were verified using success and prediction rates. The MARSpline model gave a higher success rate (area under the curve, AUC = 0.963) and prediction rate (AUC = 0.951) than the LR model (AUC = 0.918 and AUC = 0.901) and the ANN model (AUC = 0.886 and AUC = 0.877). These results indicate that the MARSpline model is the best model for determining landslide susceptibility in the study area.
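
The AUC values quoted in this abstract summarize how well a susceptibility index ranks landslide locations above non-landslide locations. The Python sketch below computes AUC through the pairwise-ranking (Mann-Whitney) formulation, which is one standard way to obtain it; the abstract does not say how the authors computed their curves, and the function name, labels, and scores are hypothetical. In this setting, a success rate would typically be the AUC on the model-building part of the data and a prediction rate the AUC on the validation part.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative, with ties counted as one half.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical landslide labels (1 = landslide) and susceptibility indices.
labels = [1, 1, 0, 0]
susceptibility = [0.9, 0.4, 0.5, 0.1]

print(auc(labels, susceptibility))
```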

Prediction of population behavior of Listeria monocytogenes in food using machine learning and a microbial growth and survival database

Scientific Reports ◽

10.1038/s41598-021-90164-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Satoko Hiura ◽

Shige Koseki ◽

Kento Koyama

Keyword(s):

Machine Learning ◽

Data Mining ◽

Listeria Monocytogenes ◽

Water Activity ◽

Bacterial Population ◽

Gradient Boosting ◽

Initial Cell ◽

Data Mining Approach ◽

Cell Counts ◽

Extreme Gradient Boosting

In predictive microbiology, statistical models are employed to predict bacterial population behavior in food using environmental factors such as temperature, pH, and water activity. As the amount and complexity of data increase, handling all data with high-dimensional variables becomes a difficult task. We propose a data mining approach to predict bacterial behavior using a database of microbial responses to food environments. Population growth and inactivation data for the pathogen Listeria monocytogenes under 1,007 environmental conditions, including five food categories (beef, culture medium, pork, seafood, and vegetables) and temperatures ranging from 0 to 25 °C, were obtained from the ComBase database (www.combase.cc). We used eXtreme gradient boosting tree, a machine learning algorithm, to predict bacterial population behavior from eight explanatory variables: ‘time’, ‘temperature’, ‘pH’, ‘water activity’, ‘initial cell counts’, ‘whether the viable count is initial cell number’, and two types of categories regarding food. The root mean square error of the observed and predicted values was approximately 1.0 log CFU regardless of food category, which suggests the possibility of predicting viable bacterial counts in various foods. The data mining approach examined here will enable the prediction of bacterial population behavior in food by identifying hidden patterns within a large amount of data.
