Landslide Susceptibility Prediction Considering Regional Soil Erosion Based on Machine-Learning Models

Faming Huang; Jiawu Chen; Zhen Du; Chi Yao; Jinsong Huang; Qinghui Jiang; Zhilu Chang; Shu Li

doi:10.3390/ijgi9060377


ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9060377 ◽

2020 ◽

Vol 9 (6) ◽

pp. 377 ◽

Cited By ~ 3

Author(s):

Faming Huang ◽

Jiawu Chen ◽

Zhen Du ◽

Chi Yao ◽

Jinsong Huang ◽

...

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Landslide Susceptibility ◽

Prediction Accuracy ◽

Prediction Models ◽

Research Area ◽

Predisposing Factors ◽

Support Vector ◽

Predisposing Factor ◽

Operating Feature

Soil erosion (SE) provides source material for slide masses and reflects the long-term destructive effect of rainfall erosion on landslides. Introducing SE as a geology- and hydrology-related predisposing factor may therefore yield more reliable landslide susceptibility predictions. Ningdu County, China, is taken as the study area. First, 446 landslides are obtained from government disaster survey reports. Second, the SE amount in Ningdu County is calculated and nine other conventional predisposing factors are obtained at both 30 m and 60 m grid resolutions to determine the effect of SE on landslide susceptibility prediction. Third, four types of machine-learning predictors, namely C5.0 decision tree (C5.0 DT), logistic regression (LR), multilayer perceptron (MLP) and support vector machine (SVM), are applied at both resolutions to construct landslide susceptibility prediction models that include the SE factor (the SE-C5.0 DT, SE-LR, SE-MLP and SE-SVM models); C5.0 DT, LR, MLP and SVM models without SE are used for comparison. Finally, the area under the receiver operating characteristic curve is used to verify the prediction accuracy of these models, and the relative importance of all 10 predisposing factors is ranked. The results indicate that: (1) the SE factor plays the most important role in landslide susceptibility prediction among all 10 predisposing factors at both 30 m and 60 m resolutions; (2) the SE-based models predict landslide susceptibility more accurately than the single models without the SE factor; (3) all models at 30 m resolution have higher prediction accuracy than those at 60 m resolution; and (4) the C5.0 DT and SVM models perform better than the MLP and LR models.
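
The validation metric named in the abstract above, the area under the ROC curve (AUC), can be computed directly from model scores through its rank interpretation: the probability that a randomly chosen landslide cell receives a higher score than a randomly chosen non-landslide cell. The following is a minimal sketch of that idea, not the authors' code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = landslide cell, 0 = non-landslide cell.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a sort-based rank formulation would be preferable for full raster grids.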


Predicting Learning Outcomes with MOOC Clickstreams

Education Sciences ◽

10.3390/educsci9020104 ◽

2019 ◽

Vol 9 (2) ◽

pp. 104 ◽

Cited By ~ 5

Author(s):

Chen-Hsiang Yu ◽

Jungpin Wu ◽

An-Chi Liu

Keyword(s):

Machine Learning ◽

Learning Outcomes ◽

Prediction Accuracy ◽

Nearest Neighbor ◽

Prediction Models ◽

Video Data ◽

Support Vector ◽

Completion Rates ◽

K Nearest Neighbor ◽

Learning Behaviors

Massive Open Online Courses (MOOCs) have gradually become a dominant trend in education. Since 2014, the Ministry of Education in Taiwan has been promoting MOOC programs, with successful results. The ability of students to work at their own pace, however, is associated with low MOOC completion rates and has recently become a focus. The development of a mechanism to effectively improve course completion rates continues to be of great interest to both teachers and researchers. This study established a series of learning behaviors using the video clickstream records of students, through a MOOC platform, to identify seven types of cognitive participation models of learners. We subsequently built practical machine learning models by using K-nearest neighbor (KNN), support vector machines (SVM), and artificial neural network (ANN) algorithms to predict students’ learning outcomes via their learning behaviors. The ANN machine learning method had the highest prediction accuracy. Based on the prediction results, we saw a correlation between video viewing behavior and learning outcomes. This could allow teachers to help students needing extra support successfully pass the course. To further improve our method, we classified the course videos based on their content. There were three video categories: theoretical, experimental, and analytic. Different prediction models were built for each of these three video types and their combinations. We performed the accuracy verification; our experimental results showed that we could use only theoretical and experimental video data, instead of all three types of data, to generate prediction models without significant differences in prediction accuracy. In addition to data reduction in model generation, this could help teachers evaluate the effectiveness of course videos.


The Use of a Machine Learning Method to Predict the Real-Time Link Travel Time of Open-Pit Trucks

Mathematical Problems in Engineering ◽

10.1155/2018/4368045 ◽

2018 ◽

Vol 2018 ◽

pp. 1-14 ◽

Cited By ~ 1

Author(s):

Xiaoyu Sun ◽

Hang Zhang ◽

Fengliang Tian ◽

Lei Yang

Keyword(s):

Machine Learning ◽

Travel Time ◽

Prediction Accuracy ◽

Prediction Models ◽

Open Pit ◽

Support Vector ◽

Travel Time Prediction ◽

Open Pit Mines ◽

Optimal Dispatch ◽

Link Travel Time

Accurate truck travel time prediction (TTP) is one of the critical factors in the dynamic optimal dispatch of open-pit mines. This study divides the roads of open-pit mines into two types: fixed and temporary link roads. The experiment uses data obtained from Fushun West Open-pit Mine (FWOM) to train three types of machine learning (ML) prediction models based on k-nearest neighbors (kNN), support vector machine (SVM), and random forest (RF) algorithms for each link road. The results show that the TTP models based on SVM and RF are better than that based on kNN. The prediction accuracy calculated in this study is approximately 15.79% higher than that calculated by traditional methods. Meteorological features added to the TTP model improved the prediction accuracy by 5.13%. Moreover, this study uses the link rather than the route as the minimum TTP unit, and the former shows an increase in prediction accuracy of 11.82%.


On the determinants and prediction of corporate financial distress in India

Managerial Finance ◽

10.1108/mf-06-2020-0332 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Sanjay Sehgal ◽

Ritesh Kumar Mishra ◽

Florent Deisting ◽

Rupali Vashisht

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Financial Distress ◽

Prediction Accuracy ◽

Prediction Models ◽

Accounting Information ◽

Support Vector ◽

Content Type ◽

Distress Prediction ◽

Practical Implications

Purpose: The main aim of the study is to identify some critical microeconomic determinants of financial distress and to design a parsimonious distress prediction model for an emerging economy like India. In doing so, the authors also attempt to compare the forecasting accuracy of alternative distress prediction techniques. Design/methodology/approach: In this study, the authors use two alternative accounting information-based definitions of financial distress to construct a measure of financial distress. The authors then use the binomial logit model and two other popular machine learning-based models, namely artificial neural network and support vector machine, to compare the distress prediction accuracy rate of these alternative techniques for the Indian corporate sector. Findings: The study’s empirical results suggest that five financial ratios, namely return on capital employed, cash flows to total liability, asset turnover ratio, fixed assets to total assets and debt to equity ratio, together with a measure of firm size (log total assets), play a highly significant role in distress prediction. The study’s findings suggest that machine learning-based models, namely support vector machine (SVM) and artificial neural network (ANN), are superior in terms of their prediction accuracy compared to the simple binomial logit model. Results also suggest that one-year-ahead forecasts are relatively better than the two-year-ahead forecasts. Practical implications: The findings of the study have some important practical implications for creditors, policymakers, regulators and other stakeholders. First, rather than monitoring and collecting information on a list of predictor variables, only the six most important accounting ratios may be monitored to track the transition of a healthy firm into financial distress. Second, our six-factor model can be used to devise a sound early warning system for corporate financial distress. Third, machine learning-based distress prediction models have superior prediction accuracy over the commonly used time series model in the available literature for distress prediction involving a binary dependent variable. Originality/value: This study is one of the first comprehensive attempts to investigate and design a parsimonious distress prediction model for the emerging Indian economy, which is currently facing high levels of corporate financial distress. Unlike previous studies, the authors use two different accounting information-based measures of financial distress in order to identify an effective way of measuring financial distress. Some of the determinants of financial distress identified in this study are different from those in the popular distress prediction models used in the literature. Our distress prediction model can be useful for distress prediction in other emerging markets.


Landslide Susceptibility Prediction Based on Remote Sensing Images and GIS: Comparisons of Supervised and Unsupervised Machine Learning Models

Remote Sensing ◽

10.3390/rs12030502 ◽

2020 ◽

Vol 12 (3) ◽

pp. 502 ◽

Cited By ~ 18

Author(s):

Zhilu Chang ◽

Zhen Du ◽

Fan Zhang ◽

Faming Huang ◽

Jiawu Chen ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Landslide Susceptibility ◽

Prediction Accuracy ◽

Aerial Images ◽

Supervised Machine Learning ◽

Support Vector ◽

Unsupervised Machine Learning ◽

Advantages And Disadvantages ◽

Interaction Detection

Landslide susceptibility prediction (LSP) has been widely and effectively implemented by machine learning (ML) models based on remote sensing (RS) images and Geographic Information System (GIS). However, comparisons of ML models for LSP from the perspectives of supervised machine learning (SML) and unsupervised machine learning (USML) have not been explored. Hence, this study compares the LSP performance of SML and USML models in order to explore their advantages and disadvantages and to obtain more accurate and reliable LSP results. Two representative SML models (support vector machine (SVM) and CHi-squared Automatic Interaction Detection (CHAID)) and two representative USML models (K-means and Kohonen models) are used to predict the landslide susceptibility indexes, and the prediction results are then discussed. Ningdu County, with 446 recorded landslides obtained through field investigations, is used as the case study. A total of 12 conditioning factors are obtained by processing Landsat TM 8 images and high-resolution aerial images, by topographical and hydrological spatial analysis of a digital elevation model in GIS software, and from government reports. The area under the receiver operating characteristic (ROC) curve (AUC) is applied to evaluate the prediction accuracy of the SML models, and the frequency ratio (FR) accuracy is then introduced to compare the prediction performance of the SML and USML models. Overall, the ROC results show that the AUC of the SVM model (0.892) is slightly greater than that of the CHAID model (0.872). The FR accuracy results show that the SVM model has the highest accuracy for LSP (77.80%), followed by the CHAID model (74.50%), the Kohonen model (72.8%) and the K-means model (69.7%), which indicates that the SML models can reach considerably better prediction capability than the USML models. It can be concluded that selecting recorded landslides as prior knowledge to train and test the LSP models is the key reason for the higher prediction accuracy of the SML models, while the lack of a priori knowledge and target guidance is an important reason for the low LSP accuracy of the USML models. Nevertheless, the USML models can also be used to implement LSP due to their advantages of efficient modeling processes, dimensionality reduction and strong scalability.


Improving Spatial Agreement in Machine Learning-Based Landslide Susceptibility Mapping

Remote Sensing ◽

10.3390/rs12203347 ◽

2020 ◽

Vol 12 (20) ◽

pp. 3347 ◽

Cited By ~ 2

Author(s):

Mohammed Sarfaraz Gani Adnan ◽

Md Salman Rahman ◽

Nahian Ahmed ◽

Bayes Ahmed ◽

Md. Fazleh Rabbi ◽

...

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Prediction Accuracy ◽

Correlation Coefficients ◽

Machine Learning Algorithms ◽

Landslide Susceptibility Mapping ◽

Natural Phenomenon ◽

Support Vector ◽

Susceptibility Maps ◽

Landslide Susceptibility Maps

Despite yielding considerable accuracy in landslide prediction, the outcomes of different landslide susceptibility models are prone to spatial disagreement and, therefore, uncertainty. Uncertainties in the results of various landslide susceptibility models create challenges in selecting the most suitable method to manage this complex natural phenomenon. This study aimed to propose an approach to reduce uncertainties in landslide prediction by diagnosing spatial agreement in machine learning-based landslide susceptibility maps. It first developed landslide susceptibility maps of Cox’s Bazar district of Bangladesh by applying four machine learning algorithms, K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM), with hyperparameter optimization and 12 landslide conditioning factors. All four models yielded very high prediction accuracy, with area under the curve (AUC) values ranging from 0.93 to 0.96. The assessment of spatial agreement of landslide predictions showed that the pixel-wise correlation coefficients of landslide probability between the various models ranged from 0.69 to 0.85, indicating uncertainty in the landslides predicted by the various models despite their considerable prediction accuracy. The uncertainty was addressed by establishing a Logistic Regression (LR) model, incorporating the binary landslide inventory data as the dependent variable and the results of the four landslide susceptibility models as independent variables. The outcomes indicated that the RF model had the highest influence in predicting the observed landslide locations, followed by the MLP, SVM, and KNN models. Finally, a combined landslide susceptibility map was developed by integrating the results of the four machine learning-based landslide predictions. The combined map resulted in better spatial agreement (correlation coefficients ranging between 0.88 and 0.92) and greater prediction accuracy (0.97) compared to the individual models. The modelling approach followed in this study would be useful in minimizing uncertainties of various methods and improving landslide predictions.
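
The spatial-agreement check described above rests on pixel-wise correlation between susceptibility maps. A minimal sketch of that comparison follows, assuming each map has been flattened to one probability per pixel and that the Pearson coefficient is meant (the abstract does not name the coefficient). The function name `pearson`, the model labels and all values are hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(vx * vy)


# Hypothetical flattened susceptibility maps (one probability per pixel).
maps = {
    "model_a": [0.10, 0.40, 0.80, 0.90],
    "model_b": [0.20, 0.30, 0.70, 0.95],
    "model_c": [0.90, 0.60, 0.30, 0.10],
}

# Correlation for every unordered pair of models.
names = sorted(maps)
agreement = {
    (a, b): pearson(maps[a], maps[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
for pair, r in agreement.items():
    print(pair, round(r, 3))
```

Low or negative coefficients flag model pairs whose maps disagree spatially even when each model scores well on its own.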


Comparison of machine learning methods for prediction of osteoradionecrosis incidence in patients with head and neck cancer

British Journal of Radiology ◽

10.1259/bjr.20200026 ◽

2021 ◽

Vol 94 (1120) ◽

pp. 20200026

Author(s):

Laia Humbert-Vidan ◽

Vinod Patel ◽

Ilkay Oksuz ◽

Andrew Peter King ◽

Teresa Guerrero Urbano

Keyword(s):

Machine Learning ◽

Head And Neck Cancer ◽

Head And Neck ◽

Neck Cancer ◽

Prediction Accuracy ◽

Prediction Models ◽

Support Vector ◽

Ann Model ◽

Adaptive Boosting ◽

Significant Difference

Objectives: Mandible osteoradionecrosis (ORN) is one of the most severe toxicities in patients with head and neck cancer (HNC) undergoing radiotherapy (RT). The existing literature focuses on the correlation of mandible ORN with clinical and dosimetric factors. This study proposes the use of machine learning (ML) methods as prediction models for mandible ORN incidence. Methods: A total of 96 patients (ORN incidence ratio of 1:1) treated between 2011 and 2015 were selected from the local HNC toxicity database. Demographic, clinical and dosimetric data (based on the mandible dose–volume histogram) were considered as model variables. Prediction accuracy (measured using a stratified fivefold nested cross-validation), sensitivity, specificity, precision and negative predictive value were used to evaluate the prediction performance of a multivariate logistic regression (LR) model, a support vector machine (SVM) model, a random forest (RF) model, an adaptive boosting (AdaBoost) model and an artificial neural network (ANN) model. The different models were compared based on their prediction accuracy and using McNemar’s test. Results: The ANN model (77% accuracy), closely followed by the SVM (76%), AdaBoost (75%) and LR (75%) models, showed the highest overall prediction accuracy. The RF model (71%) showed the lowest prediction accuracy. However, based on McNemar’s test applied to all model pair combinations, no statistically significant difference between the models was found. Conclusion: Based on our results, we encourage the use of ML-based prediction models for ORN incidence, as has already been done for other HNC toxicity end points. Advances in knowledge: This research opens a new path towards personalised RT for HNC using ML to predict mandible ORN incidence.
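
McNemar's test, used above to compare model pairs, looks only at the cases where two classifiers disagree in correctness. The sketch below shows the exact (binomial) two-sided variant on paired predictions. The abstract does not state which variant was used, so the exact form is an assumption, and the function name `mcnemar_exact` and the toy data are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts cases only model A got right
    and c counts cases only model B got right.
    """
    b = c = 0
    for t, a, m in zip(y_true, pred_a, pred_b):
        a_ok, m_ok = (a == t), (m == t)
        if a_ok and not m_ok:
            b += 1
        elif m_ok and not a_ok:
            c += 1
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    # Under H0 the discordant cases split as Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Hypothetical labels and predictions for two models on the same patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
pred_b = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
b, c, p = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p)
```

With few discordant pairs, as in this toy case, the test has little power, which is one reason small accuracy gaps between models may not reach significance.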


In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R² = 0.84 and MSE = 0.55 for the training set and R² = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
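
The two regression metrics reported above, the coefficient of determination and the mean squared error (MSE), can be computed as in the following minimal sketch. It assumes the usual definition of the coefficient of determination as one minus the ratio of residual to total sum of squares; the function names and the toy values are hypothetical and do not come from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values for four compounds.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(mse(observed, predicted), 3), round(r_squared(observed, predicted), 3))
```

Reporting both metrics on separate training and test sets, as the abstract does, helps reveal overfitting: a large gap between the two sets would be a warning sign.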


Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.


Machine Learning Approach for Predicting Lane-Change Maneuvers using the SHRP2 Naturalistic Driving Study Data

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211003581 ◽

2021 ◽

pp. 036119812110035

Author(s):

Anik Das ◽

Mohamed M. Ahmed

Keyword(s):

Machine Learning ◽

Prediction Accuracy ◽

Machine Learning Algorithms ◽

Support Vector ◽

Lane Change ◽

Adaptive Boosting ◽

Extreme Gradient Boosting ◽

Naturalistic Driving Study ◽

Naturalistic Driving ◽

Change Prediction

Accurate real-time lane-change prediction is essential to safely operate Autonomous Vehicles (AVs) on roadways, especially at the early stage of AV deployment, when AVs will interact with human-driven vehicles. This study proposed reliable lane-change prediction models considering features from vehicle kinematics, machine vision, driver, and roadway geometric characteristics using the trajectory-level SHRP2 Naturalistic Driving Study and Roadway Information Database. Several machine learning algorithms were trained, validated, tested, and comparatively analyzed based on six different sets of features, including Classification And Regression Trees (CART), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K Nearest Neighbor (KNN), and Naïve Bayes (NB). In each feature set, relevant features were extracted through a wrapper-based algorithm named Boruta. The results showed that, considering all features, the XGBoost model outperformed all other models, with the highest overall prediction accuracy (97%) and F1-score (95.5%). However, the highest overall prediction accuracy of 97.3% and F1-score of 95.9% were observed in the XGBoost model based on vehicle kinematics features. Moreover, it was found that XGBoost was the only model that achieved a reliable and balanced prediction performance across all six feature sets. Furthermore, a simplified XGBoost model was developed for each feature set considering the practical implementation of the model. The proposed prediction model could help in trajectory planning for AVs and could be used to develop more reliable advanced driver assistance systems (ADAS) in a cooperative connected and automated vehicle environment.


Development of Machine Learning Models for Prediction of Smoking Cessation Outcome

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18052584 ◽

2021 ◽

Vol 18 (5) ◽

pp. 2584

Author(s):

Cheng-Chien Lai ◽

Wei-Hsin Huang ◽

Betty Chia-Chen Chang ◽

Lee-Ching Hwang

Keyword(s):

Machine Learning ◽

Smoking Cessation ◽

Success Rate ◽

Prediction Models ◽

Smoking Status ◽

Medical Center ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Support Vector ◽

Smoking Cessation Outcome

Predictors of success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at one medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model, which reached a slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for prediction of smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also has the potential to support personalized and precision medicine for the treatment of smoking cessation.
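
Sensitivity, specificity and accuracy, as reported above, all derive from the four cells of a binary confusion matrix. The following minimal sketch shows that relationship; the function name `binary_metrics` and the toy labels are hypothetical, and the numbers are not related to the study's data.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return None rather than divide by zero when a class is absent.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical outcomes: 1 = quit successfully, 0 = did not quit.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because accuracy mixes both classes, a model can show moderate accuracy while its sensitivity and specificity differ noticeably, which is why the three are usually reported together.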
