Assessment of Landslide-Prone Areas and Their Zonation Using Logistic Regression, LogitBoost, and NaïveBayes Machine-Learning Algorithms

2018 ◽  
Vol 10 (10) ◽  
pp. 3697 ◽  
Author(s):  
Hamid Pourghasemi ◽  
Amiya Gayen ◽  
Sungjae Park ◽  
Chang-Wook Lee ◽  
Saro Lee

The occurrence of landslides in the hilly regions of South Korea is a matter of serious concern. This study produces landslide susceptibility maps for Jumunjin Country in South Korea. Three machine learning algorithms, namely Logistic Regression (LR), LogitBoost (LB), and NaïveBayes (NB), are used, and their final model outcomes are compared with each other. First, a landslide inventory map and the associated input data layers of the landslide conditioning factors were developed based on field verification, historical records, and high-resolution remote-sensing data in a geographic information system (GIS) environment. Seventeen landslide conditioning factors were prepared, including aspect, slope, altitude, maximum curvature, profile curvature, topographic wetness index (TWI), topographic positioning index (TPI), distance from fault, convexity, forest type, forest diameter, forest density, land use/land cover, lithology, soil, flow accumulation, and mid slope position. The results showed that the area under the curve (AUC) values of the LR, LB, and NB models were 84.2%, 70.7%, and 85.2%, respectively. The results revealed that the LR and LB models produced reasonable accuracy with respect to the NB model in landslide susceptibility assessment. The final susceptibility maps would be useful for preliminary land-use planning and hazard mitigation purposes.
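The AUC values used to compare the three models can be illustrated with a minimal sketch. It computes the area under the ROC curve from the rank (Mann-Whitney) formulation: the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell. The function name `auc_score` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a landslide cell, 0 for a non-landslide cell.
    scores: model susceptibility scores, higher means more susceptible.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc_score(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; a sorted-rank implementation would be preferable for full raster datasets.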

Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 3940 ◽  
Author(s):  
Sevgen ◽  
Kocaman ◽  
Nefeslioglu ◽  
Gokceoglu

Prediction of possible landslide areas is the first stage of landslide hazard mitigation efforts and is also crucial for suitable site selection. Several statistical and machine learning methodologies have been applied for the production of landslide susceptibility maps. However, the performance assessment of such methods has conventionally been carried out by utilizing existing landslide inventories. The purpose of this study is to investigate the performance of landslide susceptibility maps produced with three different machine learning algorithms, i.e., random forest, artificial neural network, and logistic regression, in a recently constructed and activated dam reservoir, and to assess the external quality of each map by using pre- and post-event photogrammetric datasets. The methodology introduced here was applied using digital surface models generated from aerial photogrammetric flight data acquired before and after the dam construction. Aerial photogrammetric images acquired in 2012 and 2018 (after the dam was filled) were used to produce digital terrain models and orthophotos. The 2012 dataset was used for producing the landslide susceptibility maps, and the results were evaluated by comparing the Euclidean distances between the two surface models. The results show that the random forest method outperforms the other two for predicting future landslides.
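The abstract does not say exactly how the distances between the two surface models were computed or compared with the susceptibility maps. The following is only a rough sketch under stated assumptions: the two surface models are co-registered grids of elevations in metres, the distance at each cell is the absolute elevation difference, and a cell counts as "changed" when that difference exceeds a chosen threshold. The function names, threshold, cutoff, and toy grids are all hypothetical.

```python
def changed_cells(dsm_before, dsm_after, threshold_m):
    """Return (row, col) of cells whose elevation changed by more than threshold_m.

    Assumes both grids have the same shape and are co-registered.
    """
    changed = []
    for r, (row_b, row_a) in enumerate(zip(dsm_before, dsm_after)):
        for c, (z_before, z_after) in enumerate(zip(row_b, row_a)):
            if abs(z_after - z_before) > threshold_m:
                changed.append((r, c))
    return changed


def hit_rate(changed, susceptibility, cutoff):
    """Fraction of changed cells that the map rated at or above the cutoff."""
    if not changed:
        return 0.0
    hits = sum(1 for r, c in changed if susceptibility[r][c] >= cutoff)
    return hits / len(changed)


# Hypothetical 2x2 grids, for illustration only.
before = [[100.0, 101.0], [102.0, 103.0]]
after = [[100.1, 98.0], [102.0, 99.5]]
susceptibility = [[0.2, 0.9], [0.1, 0.4]]

moved = changed_cells(before, after, threshold_m=1.0)
rate = hit_rate(moved, susceptibility, cutoff=0.5)
print(moved, rate)
```

A higher hit rate would indicate that the map built from pre-event data flagged more of the cells that later moved, which is the kind of external check the study describes.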


Author(s):  
Gezahegn Weldu Woldemariam ◽  
Degefie Tibebe ◽  
Tesfamariam Engida Mengesha ◽  
Tadele Bedo Gelete

2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background: Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods: Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results: All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.
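The Net Reclassification Improvement figures above can be read with the help of a small sketch. The abstract does not state which NRI variant was used; this example assumes the simple two-category form, where each model outputs a binary prediction (1 = hospitalized) and NRI is the net proportion of events moved up plus the net proportion of non-events moved down by the new model. The function name and toy data are hypothetical.

```python
def net_reclassification_improvement(y_true, pred_old, pred_new):
    """Two-category NRI of pred_new relative to pred_old.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    """
    up_event = down_event = n_event = 0
    up_non = down_non = n_non = 0
    for y, old, new in zip(y_true, pred_old, pred_new):
        if y == 1:
            n_event += 1
            up_event += new > old      # correctly moved to "hospitalized"
            down_event += new < old    # wrongly moved to "not hospitalized"
        else:
            n_non += 1
            up_non += new > old        # wrongly moved to "hospitalized"
            down_non += new < old      # correctly moved to "not hospitalized"
    if n_event == 0 or n_non == 0:
        raise ValueError("need both events and non-events")
    return (up_event - down_event) / n_event + (down_non - up_non) / n_non


# Hypothetical toy data, for illustration only.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 0, 0, 1, 0, 1, 1, 0]   # reference model
model_b = [1, 1, 0, 1, 0, 0, 1, 0]   # candidate model
nri = net_reclassification_improvement(outcome, model_a, model_b)
print(f"NRI = {nri:.2f}")
```

A positive value means the candidate model reclassifies more cases in the correct direction than in the wrong one; swapping the two models flips the sign.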


2019 ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background: It is difficult to accurately predict whether a patient on the verge of a potential psychiatric crisis will need to be hospitalized. Machine learning may help to improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate and compare the accuracy of ten machine learning algorithms, including the commonly used generalized linear model (GLM/logistic regression), to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact, and explore the most important predictor variables of hospitalization. Methods: Data from 2,084 patients with at least one reported psychiatric crisis care contact included in the longitudinal Amsterdam Study of Acute Psychiatry were used. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared. We also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis. The target variable for the prediction models was whether or not the patient was hospitalized in the 12 months following inclusion in the study. The 39 predictor variables were related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts. Results: We found Gradient Boosting to perform the best (AUC=0.774) and K-Nearest Neighbors to perform the worst (AUC=0.702). The performance of GLM/logistic regression (AUC=0.76) was above average among the tested algorithms. Gradient Boosting outperformed GLM/logistic regression and K-Nearest Neighbors, and GLM outperformed K-Nearest Neighbors in a Net Reclassification Improvement analysis, although the differences between Gradient Boosting and GLM/logistic regression were small. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was modest. Future studies may consider combining multiple algorithms in an ensemble model for optimal performance and to mitigate the risk of choosing suboptimally performing algorithms.


2021 ◽  
Vol 9 ◽  
Author(s):  
Huanhuan Zhao ◽  
Xiaoyu Zhang ◽  
Yang Xu ◽  
Lisheng Gao ◽  
Zuchang Ma ◽  
...  

Hypertension is a widespread chronic disease. Risk prediction of hypertension is an intervention that contributes to the early prevention and management of hypertension. The implementation of such an intervention requires an effective and easy-to-implement hypertension risk prediction model. This study evaluated and compared the performance of four machine learning algorithms in predicting the risk of hypertension based on easy-to-collect risk factors. A dataset of 29,700 samples collected through a physical examination was used for model training and testing. First, we identified easy-to-collect risk factors of hypertension through univariate logistic regression analysis. Then, based on the selected features, 10-fold cross-validation was used to optimize four models, random forest (RF), CatBoost, MLP neural network and logistic regression (LR), to find the best hyper-parameters on the training set. Finally, the performance of the models was evaluated by AUC, accuracy, sensitivity and specificity on the test set. The experimental results showed that the RF model outperformed the other three models, and achieved an AUC of 0.92, an accuracy of 0.82, a sensitivity of 0.83 and a specificity of 0.81. In addition, Body Mass Index (BMI), age, family history and waist circumference (WC) are the four primary risk factors of hypertension. These findings reveal that it is feasible to use machine learning algorithms, especially RF, to predict hypertension risk without clinical or genetic data. The technique can provide a non-invasive and economical way for the prevention and management of hypertension in a large population.
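The accuracy, sensitivity and specificity reported above all derive from a confusion matrix on the test set. The following minimal sketch shows the standard definitions, assuming binary labels where 1 means hypertensive; the function name and toy predictions are hypothetical and unrelated to the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels and predictions."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_pred):
        if y == 1 and p == 1:
            tp += 1   # true positive
        elif y == 1 and p == 0:
            fn += 1   # false negative
        elif y == 0 and p == 0:
            tn += 1   # true negative
        else:
            fp += 1   # false positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Hypothetical toy data, for illustration only.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(truth, preds)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters for screening, because a model can reach high accuracy on an imbalanced population while missing most true cases.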

