Spatial Modeling of Snow Avalanche Using Machine Learning Models and Geo-Environmental Factors: Comparison of Effectiveness in Two Mountain Regions

Omid Rahmati; Omid Ghorbanzadeh; Teimur Teimurian; Farnoush Mohammadi; John P. Tiefenbacher; Fatemeh Falah; Saied Pirasteh; Phuong-Thao Thi Ngo; Dieu Tien Bui

doi:10.3390/rs11242995

Remote Sensing ◽

10.3390/rs11242995 ◽

2019 ◽

Vol 11 (24) ◽

pp. 2995 ◽

Cited By ~ 10

Author(s):

Omid Rahmati ◽

Omid Ghorbanzadeh ◽

Teimur Teimurian ◽

Farnoush Mohammadi ◽

John P. Tiefenbacher ◽

...

Keyword(s):

Machine Learning ◽

Goodness Of Fit ◽

Snow Avalanche ◽

Slope Position ◽

Support Vector ◽

Hazard Mapping ◽

Ensemble Model ◽

Mountainous Regions ◽

Avalanche Hazard ◽

Statistical Measures

Although snow avalanches are among the most destructive natural disasters and result in loss of life and economic damage in mountainous regions, far too little attention has been paid to predicting snow avalanche hazard with advanced machine learning (ML) models. In this study, the applicability and efficiency of four ML models for snow avalanche hazard mapping were evaluated: support vector machine (SVM), random forest (RF), naïve Bayes (NB) and generalized additive model (GAM). Fourteen geomorphometric, topographic and hydrologic factors were selected as predictor variables in the modeling. This study was conducted in the Darvan and Zarrinehroud watersheds of Iran. The goodness-of-fit and predictive performance of the models were evaluated using two statistical measures: the area under the receiver operating characteristic curve (AUROC) and the true skill statistic (TSS). Finally, an ensemble model was developed based upon the results of the individual models. Results show that, among individual models, RF was best, performing well in both the Darvan (AUROC = 0.964, TSS = 0.862) and Zarrinehroud (AUROC = 0.956, TSS = 0.881) watersheds. The accuracy of the ensemble model was slightly better than all individual models for generating the snow avalanche hazard map, as validation analyses showed an AUROC of 0.966 and a TSS of 0.865 in the Darvan watershed, and an AUROC of 0.958 and a TSS of 0.877 in the Zarrinehroud watershed. The results indicate that slope length, lithology and relative slope position (RSP) are the most important factors controlling snow avalanche distribution. The methodology developed in this study can improve risk-based decision making, increase the credibility and reliability of snow avalanche hazard predictions and provide critical information for hazard managers.
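
The two validation measures named in this abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = avalanche cell, 0 = non-avalanche cell) and a model score per cell; the function names, the 0.5 cutoff and the toy data are hypothetical and are not taken from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count (positive, negative) pairs where the positive scores higher;
    # tied pairs count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tss(labels, scores, threshold=0.5):
    """True skill statistic: sensitivity + specificity - 1 at one cutoff."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


# Hypothetical toy data, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(round(auroc(labels, scores), 3), round(tss(labels, scores), 3))
```

AUROC summarizes ranking quality over all cutoffs, whereas TSS depends on the single cutoff chosen, which is why the two are usually reported together.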

Mass wasting susceptibility assessment of snow avalanches using machine learning models

Scientific Reports ◽

10.1038/s41598-020-75476-w ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Bahram Choubin ◽

Moslem Borji ◽

Farzaneh Sajedi Hosseini ◽

Amirhosein Mosavi ◽

Adrienn A. Dineva

Keyword(s):

Machine Learning ◽

Boosted Regression Trees ◽

Snow Avalanche ◽

Superior Performance ◽

Recursive Feature Elimination ◽

Support Vector ◽

Snow Avalanches ◽

Susceptibility Assessment ◽

Mass Wasting ◽

Mountainous Regions

Snow avalanches are among the most harmful natural hazards, causing major socioeconomic and environmental destruction in cold and mountainous regions. The devastating propagation and accumulation of snow avalanche debris and the mass wasting of surface rocks and vegetation particles threaten human life, transportation networks, built environments, ecosystems, and water resources. Susceptibility assessment of snow avalanche hazardous areas is of utmost importance for mitigation and development of land-use policies. This research evaluates the performance of four well-known machine learning methods, i.e., generalized additive model (GAM), multivariate adaptive regression spline (MARS), boosted regression trees (BRT), and support vector machine (SVM), in modeling the mass wasting hazard induced by snow avalanches. The key features are identified by the recursive feature elimination (RFE) method and used for the model calibration. The results indicated good performance of the modeling process (Accuracy > 0.88, Kappa > 0.76, Precision > 0.84, Recall > 0.86, and AUC > 0.89), with the SVM model outperforming the others. Sensitivity analysis demonstrated that the topographic position index (TPI) and distance to stream (DTS) were the most important variables, contributing most to the susceptibility maps.
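
As a rough illustration of the threshold-based measures listed above, the sketch below derives accuracy, precision, recall and Cohen's kappa from a binary confusion matrix. The function name and the counts are hypothetical; the paper's own confusion matrices are not given in this listing.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and Cohen's kappa from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1.0 - p_chance)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "kappa": kappa}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=45, fp=5, fn=5, tn=45)
print(metrics)
```

Kappa discounts the agreement that would occur by chance, so it is lower than accuracy whenever chance agreement is non-zero.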

Evaluating the Performance of Individual and Novel Ensemble of Machine Learning and Statistical Models for Landslide Susceptibility Assessment at Rudraprayag District of Garhwal Himalaya

Applied Sciences ◽

10.3390/app10113772 ◽

2020 ◽

Vol 10 (11) ◽

pp. 3772 ◽

Cited By ~ 7

Author(s):

Sunil Saha ◽

Anik Saha ◽

Tusar Kanti Hembram ◽

Biswajeet Pradhan ◽

Abdullah M. Alamri

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Goodness Of Fit ◽

Predictive Ability ◽

Absolute Error ◽

Garhwal Himalaya ◽

Rate Curve ◽

Support Vector ◽

Predictive Capacity ◽

Mountainous Regions

Landslides are known as the world’s most dangerous threat in mountainous regions and pose a critical obstacle to both economic and infrastructural progress. It is, therefore, relevant to examine the spatial pattern of this phenomenon. The current research applies a set of individual machine learning and probabilistic approaches, namely artificial neural network (ANN), support vector machine (SVM), random forest (RF) and logistic regression (LR), together with their ensembles (ANN-RF, ANN-SVM, SVM-RF, SVM-LR, LR-RF, LR-ANN, ANN-LR-RF, ANN-RF-SVM, ANN-SVM-LR, RF-SVM-LR, and ANN-RF-SVM-LR), to map landslide susceptibility in Rudraprayag district of Garhwal Himalaya, India. A landslide inventory map along with sixteen landslide conditioning factors (LCFs) was used. The data were randomly partitioned into 70%:30% sets to ascertain the goodness of fit and predictive ability of the models. The contribution of the LCFs was analyzed using the RF model, according to which altitude and drainage density were the factors responsible for landslides in the study area. The robustness of the models was assessed through three threshold dependent measures, i.e., receiver operating characteristic (ROC), precision and accuracy, and two threshold independent measures, i.e., mean-absolute-error (MAE) and root-mean-square-error (RMSE). Finally, using the compound factor (CF) method, the models were prioritized based on the results of the validation methods to choose the best model. Results show that ANN-RF-LR gave a realistic result, classifying only 17.74% of the study area as highly susceptible to landslides. The ANN-RF-LR ensemble demonstrated the highest goodness of fit and predictive capacity, with respective values of 87.83% (area under the success rate curve) and 93.98% (area under the prediction rate curve), and correspondingly the highest robustness. These attempts will play a significant role in ensemble modeling and in building reliable and comprehensive models. The proposed ANN-RF-LR ensemble model may be used in other geographic areas with similar geo-environmental conditions. It may also be used in other types of geo-hazard modeling.
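
The random 70%:30% partition mentioned above can be sketched as follows. This is a minimal illustration, assuming the inventory is a list of sample identifiers; the function name, the seed and the 100-item inventory are hypothetical.

```python
import random


def split_inventory(samples, train_fraction=0.7, seed=42):
    """Shuffle a copy of the samples and cut it into training and test parts."""
    rng = random.Random(seed)       # fixed seed so the split is repeatable
    shuffled = list(samples)        # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical inventory of 100 mapped locations, identified by index.
inventory = list(range(100))
train, test = split_inventory(inventory)
print(len(train), len(test))
```

The training part is then used to measure goodness of fit and the held-out part to measure predictive ability.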

Predicting lung adenocarcinoma disease progression using methylation-correlated blocks and ensemble machine learning classifiers

PeerJ ◽

10.7717/peerj.10884 ◽

2021 ◽

Vol 9 ◽

pp. e10884

Author(s):

Xin Yu ◽

Qian Yang ◽

Dong Wang ◽

Zhaoyang Li ◽

Nianhang Chen ◽

...

Keyword(s):

Machine Learning ◽

Lung Adenocarcinoma ◽

Cox Regression ◽

Characteristic Curve ◽

The Cancer Genome Atlas ◽

Support Vector ◽

Survival Prediction ◽

Ensemble Model ◽

Training Set ◽

Cpg Sites

Applying the knowledge that methyltransferases and demethylases can modify adjacent cytosine-phosphate-guanine (CpG) sites in the same DNA strand, we found that combining multiple CpGs into a single block may improve cancer diagnosis. However, survival prediction remains a challenge. In this study, we developed a pipeline named “stacked ensemble of machine learning models for methylation-correlated blocks” (EnMCB) that combined Cox regression, support vector regression (SVR), and elastic-net models to construct signatures based on DNA methylation-correlated blocks for lung adenocarcinoma (LUAD) survival prediction. We used methylation profiles from the Cancer Genome Atlas (TCGA) as the training set, and profiles from the Gene Expression Omnibus (GEO) as validation and testing sets. First, we partitioned the genome into blocks of tightly co-methylated CpG sites, which we termed methylation-correlated blocks (MCBs). After partitioning and feature selection, we observed different capacities for predicting patient survival across the models. We combined the multiple models into a single stacking ensemble model. The stacking ensemble model based on the top-ranked block had an area under the receiver operating characteristic curve of 0.622 in the TCGA training set, 0.773 in the validation set, and 0.698 in the testing set. When stratified by clinicopathological risk factors, the risk score predicted by the top-ranked MCB was an independent prognostic factor. Our results showed that our pipeline was a reliable tool that may facilitate MCB selection and survival prediction.

Modeling of Aboveground Biomass with Landsat 8 OLI and Machine Learning in Temperate Forests

Forests ◽

10.3390/f11010011 ◽

2019 ◽

Vol 11 (1) ◽

pp. 11

Author(s):

Pablito M. López-Serrano ◽

José Luis Cárdenas Domínguez ◽

José Javier Corral-Rivas ◽

Enrique Jiménez ◽

Carlos A. López-Sánchez ◽

...

Keyword(s):

Machine Learning ◽

Aboveground Biomass ◽

Goodness Of Fit ◽

Accurate Estimation ◽

Support Vector ◽

Landsat 8 ◽

Sensing Applications ◽

Learning Techniques ◽

Physical Variables ◽

Selection Of

An accurate estimation of forests’ aboveground biomass (AGB) is required because of its relevance to the carbon cycle, and because of its economic and ecological importance. The selection of appropriate variables from satellite information and physical variables is important for precise AGB prediction mapping. Because of the complex relationships involved in AGB prediction, non-parametric machine-learning techniques are potentially useful for AGB estimation, but their use and comparison in forest remote-sensing applications is still relatively limited. The objective of the present study was to evaluate the performance of two machine-learning techniques, support vector regression (SVR) and random forest (RF), in predicting the observed AGB (from 318 permanent sampling plots) from the Landsat 8 Operational Land Imager (OLI) sensor, spectral indexes, texture indexes and physical variables in the Sierra Madre Occidental in Mexico. The results showed that the best SVR model explained 80% of the total variance (root mean square error (RMSE) = 8.20 Mg ha−1). The variables that best predicted AGB, in order of importance, were the bands in the red, near-infrared and middle-infrared regions, and the average temperature. The results show that the SVR technique has good potential for the estimation of AGB and that the selection of the model hyperparameters has important implications for optimizing the goodness of fit.
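
The two fit statistics quoted above (share of variance explained and RMSE) can be computed as in the following sketch. It assumes paired lists of observed and predicted biomass values; the function names and the numbers are hypothetical and unrelated to the study's 318 plots.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical plot-level biomass values (Mg per hectare).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(round(rmse(observed, predicted), 3), round(r_squared(observed, predicted), 3))
```

An R² of 0.80 would correspond to the "80% of the total variance" wording used in the abstract.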

Landslide Susceptibility Mapping Using the Stacking Ensemble Machine Learning Method in Lushui, Southwest China

Applied Sciences ◽

10.3390/app10114016 ◽

2020 ◽

Vol 10 (11) ◽

pp. 4016 ◽

Cited By ~ 3

Author(s):

Xudong Hu ◽

Han Zhang ◽

Hongbo Mei ◽

Dunhui Xiao ◽

Yuanyuan Li ◽

...

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Southwest China ◽

Susceptibility Mapping ◽

Landslide Susceptibility Mapping ◽

Support Vector ◽

Machine Learning Method ◽

Learning Method ◽

Statistical Measures ◽

Ensemble Machine Learning

Landslide susceptibility mapping is considered to be a prerequisite for landslide prevention and mitigation. However, delineating the spatial occurrence pattern of landslides remains a challenge. This study investigates the potential application of the stacking ensemble learning technique for landslide susceptibility assessment. In particular, support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), and naive Bayes (NB) were selected as base learners for the stacking ensemble method. The resampling scheme and Pearson’s correlation analysis were jointly used to evaluate the importance level of these base learners. A total of 388 landslides and 12 conditioning factors in the Lushui area (Southwest China) were used as the dataset to develop the landslide models. The landslides were randomly separated into two parts, with 70% used for model training and 30% used for model validation. The models’ performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC) and statistical measures. The results showed that the stacking-based ensemble model achieved improved predictive accuracy compared to the single algorithms, while the SVM-ANN-NB-LR (SANL) model, the SVM-ANN-NB (SAN) model, and the ANN-NB-LR (ANL) model performed equally well, with AUC values of 0.931, 0.940, and 0.932, respectively, in the validation stage. The correlation coefficient between LR and SVM was the highest for all resampling rounds, with a value of 0.72 on average. This suggests that LR and SVM played an almost equal role when the SANL ensemble was applied for landslide susceptibility analysis. Therefore, it is feasible to use the SAN model or the ANL model for the study area. The findings from this study suggest that the stacking ensemble machine learning method is promising for landslide susceptibility mapping in the Lushui area and is capable of targeting areas prone to landslides.
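
The correlation analysis between base learners described above can be sketched with a plain Pearson coefficient computed on two learners' predicted susceptibility scores. This is only an illustration under assumed inputs; the score lists are hypothetical and the paper's resampling scheme is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical susceptibility scores from two base learners on the same cells.
lr_scores = [0.10, 0.40, 0.35, 0.80]
svm_scores = [0.20, 0.50, 0.30, 0.90]
print(round(pearson(lr_scores, svm_scores), 3))
```

Two base learners whose outputs are highly correlated carry largely overlapping information, which is one reading of why the abstract treats a three-learner ensemble as a feasible substitute for the four-learner one.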

Novel GIS Based Machine Learning Algorithms for Shallow Landslide Susceptibility Mapping

Sensors ◽

10.3390/s18113777 ◽

2018 ◽

Vol 18 (11) ◽

pp. 3777 ◽

Cited By ~ 61

Author(s):

Ataollah Shirzadi ◽

Karim Soliamani ◽

Mahmood Habibnejhad ◽

Ataollah Kavian ◽

Kamran Chapi ◽

...

Keyword(s):

Machine Learning ◽

Sample Size ◽

Prediction Accuracy ◽

Goodness Of Fit ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Landslide Susceptibility Mapping ◽

Sample Sizes ◽

Promising Alternative ◽

Statistical Measures

The main objective of this research was to introduce novel machine learning algorithms that combine the alternating decision tree (ADTree) with the multiboost (MB), bagging (BA), rotation forest (RF) and random subspace (RS) ensemble algorithms, under two scenarios of different sample sizes and raster resolutions, for spatial prediction of shallow landslides around Bijar City, Kurdistan Province, Iran. The modeling process was evaluated using several statistical measures and the area under the receiver operating characteristic curve (AUROC). Results show that the RS model obtained high goodness-of-fit and prediction accuracy for the sample sizes of 60%/40% and 70%/30% with a raster resolution of 10 m, while the MB model did so for the sample sizes of 80%/20% and 90%/10% with a raster resolution of 20 m. The RS-ADTree and MB-ADTree ensemble models outperformed the ADTree model in both scenarios. Overall, MB-ADTree had the highest prediction accuracy with a sample size of 80%/20% and a resolution of 20 m (area under the curve (AUC) = 0.942) and the lowest with a sample size of 60%/40% and a resolution of 10 m (AUC = 0.845). The findings confirm that the newly proposed models are very promising alternative tools to assist planners and decision makers in the task of managing landslide prone areas.

GIS-Based snow avalanche hazard mapping: Bayburt-Aşağı Dere catchment case

Journal of Environmental Biology ◽

10.22438/jeb/38/5(si)/gm-10 ◽

2017 ◽

Vol 38 (5(SI)) ◽

pp. 937-943 ◽

Cited By ~ 2

Author(s):

A. Aydin ◽

R. Eker ◽

Keyword(s):

Snow Avalanche ◽

Hazard Mapping ◽

Avalanche Hazard

Machine Learning Paradigm towards Content Based Image Retrieval on High Resolution Satellite Images

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1104.1292s219 ◽

2019 ◽

Vol 9 (2S2) ◽

pp. 999-1005

Keyword(s):

Machine Learning ◽

High Resolution ◽

Image Retrieval ◽

Satellite Images ◽

Satellite Image ◽

Content Based Image Retrieval ◽

Support Vector ◽

Ensemble Model ◽

Learning Paradigm ◽

High Resolution Satellite Images

Content-based image retrieval that uses machine learning for pattern recognition and classification is an innovative approach. For retrieving high-resolution satellite images, the support vector machine (SVM) is a useful machine learning method for learning, pattern recognition and classification, and ensemble methods give better machine learning results. In this paper, an SVM based on random subspace and boosting ensemble learning is proposed for very high resolution satellite image retrieval. The learned SVM ensemble model is used to identify the most informative images for active learning. A bias-weighting system is developed to direct the ensemble model to pay more attention to the positive examples than to the negative ones. The UCMerced land use satellite image dataset is used for the experiments, and accuracy and error rate are reported. The experimental results indicate that the proposed model improves retrieval accuracy and is significantly more effective than existing approaches. The proposed method is also reported to reduce the dimensionality gap. The comparisons are evaluated using precision and recall. Comparative analysis shows that the retrieval time for a given image is reduced and the precision is increased. The primary aim of this paper is to demonstrate the value of ensemble learning with support vector machines for efficient image retrieval.

Comparative analysis of machine learning approaches to analyze and predict the COVID-19 outbreak

PeerJ Computer Science ◽

10.7717/peerj-cs.746 ◽

2021 ◽

Vol 7 ◽

pp. e746

Author(s):

Muhammad Naeem ◽

Jian Yu ◽

Muhammad Aamir ◽

Sajjad Ahmad Khan ◽

Olayinka Adeleye ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Comparative Analysis ◽

Prediction Models ◽

Mean Absolute Percentage Error ◽

Percentage Error ◽

Support Vector ◽

Absolute Percentage Error ◽

Statistical Measures ◽

The Impact

Background: Forecasting the timing of a forthcoming pandemic can reduce the impact of disease by enabling precautionary steps such as public health messaging and raising awareness among doctors. With the continuous and rapid increase in the cumulative incidence of COVID-19, statistical and outbreak prediction models, including various machine learning (ML) models, are being used by the research community to track and predict the trend of the epidemic and to develop appropriate strategies to combat and manage its spread. Methods: In this paper, we present a comparative analysis of various ML approaches, including Support Vector Machine, Random Forest, K-Nearest Neighbor and Artificial Neural Network, in predicting the COVID-19 outbreak in the epidemiological domain. We first apply the autoregressive distributed lag (ARDL) method to identify and model the short- and long-run relationships in the time-series COVID-19 datasets. That is, we determine the lags between a response variable and its respective explanatory time-series variables as independent variables. Then, the resulting significant variables and their lags are used in the regression model selected by the ARDL for predicting and forecasting the trend of the epidemic. Results: Four statistical measures are used to assess model accuracy: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (SMAPE). The MAPE values of the best-selected models for confirmed, recovered and death cases are 0.003, 0.006 and 0.115, respectively, which fall under the category of highly accurate forecasts. In addition, we computed 15-day-ahead forecasts of daily deaths, recoveries and confirmed cases; the forecast cases fluctuated over time in all three series. The results also reveal the advantages of ML algorithms for supporting decisions on evolving short-term policies.
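
The percentage-error measures named in the Results can be sketched as below. MAPE is expressed here as a fraction rather than a percentage, and SMAPE follows one common definition (absolute error divided by the mean of the absolute actual and forecast values); the paper's exact SMAPE variant is not stated in this listing, so that choice is an assumption, as are the function names and the toy series.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.05 = 5%)."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return sum(ratios) / len(ratios)


def smape(actual, forecast):
    """Symmetric MAPE: error relative to the mean of |actual| and |forecast|."""
    ratios = [2.0 * abs(f - a) / (abs(a) + abs(f))
              for a, f in zip(actual, forecast)]
    return sum(ratios) / len(ratios)


# Hypothetical daily case counts and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(round(mape(actual, forecast), 4), round(smape(actual, forecast), 4))
```

MAPE is undefined when an actual value is zero, which is one practical reason SMAPE is often reported alongside it.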

Gully Head-Cut Distribution Modeling Using Machine Learning Methods—A Case Study of N.W. Iran

Water ◽

10.3390/w12010016 ◽

2019 ◽

Vol 12 (1) ◽

pp. 16 ◽

Cited By ~ 14

Author(s):

Alireza Arabameri ◽

Wei Chen ◽

Thomas Blaschke ◽

John P. Tiefenbacher ◽

Biswajeet Pradhan ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Goodness Of Fit ◽

Gully Erosion ◽

Ensemble Model ◽

Conditioning Factors ◽

Alternating Decision Tree ◽

Gully Head ◽

Semi Arid

To more effectively prevent and manage the scourge of gully erosion in arid and semi-arid regions, we present a novel ensemble intelligence approach, the bagging-based alternating decision-tree classifier (bagging-ADTree), and use it to model a landscape’s susceptibility to gully erosion based on 18 gully-erosion conditioning factors. The model’s goodness-of-fit and prediction performance are compared to those of three other machine learning algorithms (single alternating decision tree, rotational-forest-based alternating decision tree (RF-ADTree), and benchmark logistic regression). To achieve this, a gully-erosion inventory was created for the study area, the Chah Mousi watershed, Iran, by combining archival records containing reports of gully erosion, remotely sensed data from Google Earth, and geolocated sites of gully head-cuts gathered in a field survey. A total of 119 gully head-cuts were identified and mapped. To train the models, 83 head-cuts (70% of the total) and the corresponding values of the conditioning factors were input into each model. The results from the models were validated using the data pertaining to the remaining 36 gully locations (30%). Next, the frequency ratio was used to identify which conditioning-factor classes have the strongest correlation with gully erosion. Using random-forest modeling, the relative importance of each of the conditioning factors was determined. Based on the random-forest results, the top eight factors in this study area are distance-to-road, drainage density, distance-to-stream, LU/LC, annual precipitation, topographic wetness index, NDVI, and elevation. Finally, based on goodness-of-fit and the AUROC of the success rate curve (SRC) and prediction rate curve (PRC), the results indicate that the bagging-ADTree ensemble model had the best performance, with SRC (0.964) and PRC (0.978). RF-ADTree (SRC = 0.952 and PRC = 0.971), ADTree (SRC = 0.926 and PRC = 0.965), and LR (SRC = 0.867 and PRC = 0.870) were the subsequent best performers. The results also indicate that bagging and RF, as meta-classifiers, improved the performance of the ADTree model as a base classifier. The bagging-ADTree model’s results indicate that 24.28% of the study area is classified as having high and very high susceptibility to gully erosion. The new ensemble model not only accurately identified the areas that are susceptible to gully erosion based on past patterns of formation but also provides highly accurate predictions of future gully development. The novel ensemble method introduced in this research is recommended for evaluating the patterns of gullying in arid and semi-arid environments and can effectively identify the most salient conditioning factors that promote the development and expansion of gullies in erosion-susceptible environments.
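
The frequency ratio mentioned above is commonly defined, for each class of a conditioning factor, as the share of events falling in that class divided by the share of the study area the class occupies. The sketch below follows that common definition; the class names and counts are hypothetical and are not the study's data.

```python
def frequency_ratio(events_per_class, cells_per_class):
    """Frequency ratio per class: (event share) / (area share)."""
    total_events = sum(events_per_class.values())
    total_cells = sum(cells_per_class.values())
    return {
        cls: (events_per_class[cls] / total_events)
             / (cells_per_class[cls] / total_cells)
        for cls in cells_per_class
    }


# Hypothetical distance-to-road classes: head-cut counts and raster cell counts.
events = {"near": 60, "mid": 30, "far": 10}
cells = {"near": 200, "mid": 300, "far": 500}
ratios = frequency_ratio(events, cells)
for cls, value in ratios.items():
    print(cls, round(value, 2))
```

A ratio above 1 means the class holds more events than its area share would suggest, while a ratio below 1 indicates a weaker association with gully erosion.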
