Landslide Susceptibility Prediction Based on Remote Sensing Images and GIS: Comparisons of Supervised and Unsupervised Machine Learning Models

Zhilu Chang; Zhen Du; Fan Zhang; Faming Huang; Jiawu Chen; Wenbin Li; Zizheng Guo

doi:10.3390/rs12030502

Remote Sensing ◽

10.3390/rs12030502 ◽

2020 ◽

Vol 12 (3) ◽

pp. 502 ◽

Cited By ~ 18

Author(s):

Zhilu Chang ◽

Zhen Du ◽

Fan Zhang ◽

Faming Huang ◽

Jiawu Chen ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Landslide Susceptibility ◽

Prediction Accuracy ◽

Aerial Images ◽

Supervised Machine Learning ◽

Support Vector ◽

Unsupervised Machine Learning ◽

Advantages And Disadvantages ◽

Interaction Detection

Landslide susceptibility prediction (LSP) has been widely and effectively implemented with machine learning (ML) models based on remote sensing (RS) images and Geographic Information Systems (GIS). However, the application of ML models to LSP has not been compared from the perspectives of supervised machine learning (SML) and unsupervised machine learning (USML). Hence, this study compares the LSP performance of SML and USML models in order to explore their advantages and disadvantages and to obtain more accurate and reliable LSP results. Two representative SML models (support vector machine (SVM) and CHi-squared Automatic Interaction Detection (CHAID)) and two representative USML models (K-means and Kohonen models) are used to predict the landslide susceptibility indexes, and the prediction results are then discussed. Ningdu County, with 446 recorded landslides obtained through field investigations, is used as a case study. A total of 12 conditioning factors are obtained through processing of Landsat TM 8 images and high-resolution aerial images, topographical and hydrological spatial analysis of a Digital Elevation Model in GIS software, and government reports. The area under the receiver operating characteristic (ROC) curve (AUC) is used to evaluate the prediction accuracy of the SML models, and the frequency ratio (FR) accuracy is then introduced to compare the prediction performance of the SML and USML models. Overall, the ROC results show that the AUC of the SVM model is 0.892, slightly greater than that of the CHAID model (0.872). The FR accuracy results show that the SVM model has the highest accuracy for LSP (77.80%), followed by the CHAID model (74.50%), the Kohonen model (72.8%) and the K-means model (69.7%), which indicates that the SML models achieve considerably better prediction capability than the USML models. It can be concluded that selecting recorded landslides as prior knowledge to train and test the LSP models is the key reason for the higher prediction accuracy of the SML models, while the lack of a priori knowledge and target guidance is an important reason for the low LSP accuracy of the USML models. Nevertheless, the USML models can also be used to implement LSP due to their advantages of efficient modeling processes, dimensionality reduction and strong scalability.
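
The AUC used above to compare the SVM and CHAID models can be illustrated with a minimal sketch. It relies on the standard rank interpretation of AUC (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one). The function name `roc_auc` and the six scores and labels are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical susceptibility indexes for six grid cells (1 = recorded landslide).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)  # 8 of the 9 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of samples, so for full raster maps a sort-based implementation from an established library would be the more practical choice.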

Improvement of Short-Term BIPV Power Predictions Using Feature Engineering and a Recurrent Neural Network

Energies ◽

10.3390/en12173247 ◽

2019 ◽

Vol 12 (17) ◽

pp. 3247 ◽

Cited By ~ 1

Author(s):

Dongkyu Lee ◽

Jinhwa Jeong ◽

Sung Hoon Yoon ◽

Young Tae Chae

Keyword(s):

Neural Network ◽

Machine Learning ◽

Recurrent Neural Network ◽

Power Output ◽

Prediction Accuracy ◽

Support Vector ◽

Feature Engineering ◽

Short Term ◽

Interaction Detection ◽

Photovoltaic Power

The time resolution and prediction accuracy of the power generated by building-integrated photovoltaics are important for managing electricity demand and formulating a strategy to trade power with the grid. This study presents a novel approach to improve short-term hourly photovoltaic power output predictions using feature engineering and machine learning. Feature selection measured the importance score of input features by using a model-based variable importance. It verified that the normative sky index in the weather forecasted data had the least importance as a predictor for hourly prediction of photovoltaic power output. Six different machine-learning algorithms were assessed to select an appropriate model for the hourly power output prediction with onsite weather forecast data. The recurrent neural network outperformed five other models, including artificial neural networks, support vector machines, classification and regression trees, chi-square automatic interaction detection, and random forests, in terms of its ability to predict photovoltaic power output at an hourly and daily resolution for 64 tested days. Feature engineering was then used to apply dropout observation to the normative sky index from the training and prediction process, which improved the hourly prediction performance. In particular, the prediction accuracy for overcast days improved by 20% compared to the original weather dataset used without dropout observation. The results show that feature engineering effectively improves the short-term predictions of photovoltaic power output in buildings with a simple weather forecasting service.

In-situ identification of material batches using machine learning for machining operations

Journal of Intelligent Manufacturing ◽

10.1007/s10845-020-01718-3 ◽

2020 ◽

Author(s):

Benjamin Lutz ◽

Dominik Kisskalt ◽

Andreas Mayr ◽

Daniel Regulin ◽

Matteo Pantano ◽

...

Keyword(s):

Machine Learning ◽

Prediction Accuracy ◽

Supervised Machine Learning ◽

Support Vector ◽

Machine Learning Model ◽

Multiple Classification ◽

Subtractive Manufacturing ◽

Smart Service ◽

Machining Operations

In subtractive manufacturing, differences in machinability among batches of the same material can be observed. Ignoring these deviations can potentially reduce product quality and increase manufacturing costs. To consider the influence of the material batch in process optimization models, the batch needs to be efficiently identified. Thus, a smart service is proposed for in-situ material batch identification. This service is driven by a supervised machine learning model, which analyzes the signals of the machine’s control, especially torque data, for batch classification. The proposed approach is validated by cutting experiments with five different batches of the same specified material at various cutting conditions. Using this data, multiple classification models are trained and optimized. It is shown that the investigated batches can be correctly identified with close to 90% prediction accuracy using machine learning. Out of all the investigated algorithms, the best results are achieved using a Support Vector Machine with 89.0% prediction accuracy for individual batches and 98.9% while combining batches of similar machinability.

Study on the Estimation of Forest Volume Based on Multi-Source Data

Sensors ◽

10.3390/s21237796 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7796

Author(s):

Tao Hu ◽

Yuman Sun ◽

Weiwei Jia ◽

Dandan Li ◽

Maosheng Zou ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Remote Sensing ◽

Artificial Neural Network ◽

Random Forest ◽

Hybrid Model ◽

Prediction Accuracy ◽

Volume Estimation ◽

Support Vector ◽

Estimation Models

We performed a comparative analysis of the prediction accuracy of machine learning methods and ordinary Kriging (OK) hybrid methods for forest volume models based on multi-source remote sensing data combined with ground survey data. Taking Larix olgensis, Pinus koraiensis, and Pinus sylvestris plantations in Mengjiagang forest farms as the research object, and based on the Chinese Academy of Forestry LiDAR, charge-coupled device, and hyperspectral (CAF-LiTCHy) integrated system, we extracted the visible vegetation index, texture features, terrain factors, and point cloud feature variables. Random forest (RF), support vector regression (SVR), and an artificial neural network (ANN) were used to estimate forest volume. In the small-scale space, the estimation of sample plot volume is influenced by the surrounding environment as well as the neighboring observed data. Based on the residuals of these three machine learning models, OK interpolation was applied to construct new hybrid forest volume estimation models called random forest Kriging (RFK), support vector machines for regression Kriging (SVRK), and artificial neural network Kriging (ANNK). The six estimation models of forest volume were tested using the leave-one-out (Loo) cross-validation method. The prediction accuracies of these six models are good, with R²_Loo values above 0.6, and the prediction accuracies of the hybrid models are all improved to different extents. Among the six models, the RFK hybrid model had the best prediction effect, with an R²_Loo reaching 0.915. Therefore, the machine learning method based on multi-source remote sensing factors is useful for forest volume estimation; in particular, the hybrid model constructed by combining machine learning and the OK method greatly improved the accuracy of forest volume estimation, which provides a fast and effective method for the remote sensing inversion estimation of forest volume and facilitates the management of forest resources.
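
The leave-one-out validation mentioned above can be sketched as follows. This is a minimal illustration that swaps in a one-variable least-squares line for the RF, SVR, ANN and Kriging models of the study, and it assumes that the leave-one-out R² is computed as 1 - SS_res / SS_tot from the held-out predictions (the abstract does not state the exact formula). The names `fit_line` and `loo_r2` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loo_r2(xs, ys):
    """R^2 computed from leave-one-out predictions (assumed definition)."""
    preds = []
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictor values and plot volumes, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_exact = [3.0, 5.0, 7.0, 9.0, 11.0]
ys_noisy = [3.1, 4.8, 7.2, 8.9, 11.1]
r2_exact = loo_r2(xs, ys_exact)
r2_noisy = loo_r2(xs, ys_noisy)
```

Because every prediction comes from a model that never saw the held-out sample, this score is typically lower than an R² computed on the training data.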

Uncertainties of Collapse Susceptibility Prediction Based on Remote Sensing and GIS: Effects of Different Machine Learning Models

Frontiers in Earth Science ◽

10.3389/feart.2021.731058 ◽

2021 ◽

Vol 9 ◽

Author(s):

Wenbin Li ◽

Yu Shi ◽

Faming Huang ◽

Haoyuan Hong ◽

Guquan Song

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Prediction Accuracy ◽

Mean Value ◽

Remote Sensing And Gis ◽

Support Vector ◽

Learning Models ◽

Chi Square ◽

Mean Values ◽

Machine Learning Models

For collapse susceptibility prediction (CSP), little attention has been paid to the uncertainty characteristics of different machine learning models. In this study, six typical machine learning methods, namely, logistic regression (LR), radial basis function neural network (RBF), multilayer perceptron (MLP), support vector machine (SVM), chi-square automatic interaction detection decision tree (CHAID), and random forest (RF) models, are constructed to perform CSP. An’yuan County in China, with a total of 108 collapses and 11 related environmental factors acquired through remote sensing and GIS technologies, is selected as a case study. The spatial dataset is first constructed, and then these machine learning models are used to implement CSP. Finally, the uncertainty characteristics of the CSP results are explored according to the accuracies, mean values, and standard deviations of the collapse susceptibility indexes (CSIs) and the Kendall synergy coefficient test (i.e., Kendall's coefficient of concordance). In addition, Huichang County, China, is used as a second case study to avoid uncertainty arising from the choice of study area. Results show that 1) overall, all six kinds of machine learning models reasonably and accurately predict the collapse susceptibility in An’yuan County; 2) the RF model has the highest prediction accuracy, followed by the CHAID, SVM, MLP, RBF, and LR models; and 3) the CSP results of these models are significantly different, with the mean value (0.2718) and average rank (2.72) of RF being smaller than those of the other five models, followed by the CHAID (0.3210 and 3.29), SVM (0.3268 and 3.48), MLP (0.3354 and 3.64), RBF (0.3449 and 3.81), and LR (0.3496 and 4.06), and with a Kendall synergy coefficient value of 0.062. In conclusion, it is necessary to adopt a series of different machine learning models to predict collapse susceptibility for cross-validation and comparison. Furthermore, the RF model has the highest prediction accuracy and the lowest uncertainty of the CSP results of the machine learning models.
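
The Kendall coefficient reported above can be sketched with the textbook formula for Kendall's W without ties, W = 12 S / (m² (n³ - n)), where m judges each rank the same n items and S is the sum of squared deviations of the rank totals from their mean. Treating each grid cell as a judge that ranks the six models is an assumption made for illustration; the function name `kendalls_w` and the rank rows are hypothetical.

```python
def kendalls_w(rank_rows):
    """Kendall's W for m judges (rows) ranking the same n items (no ties)."""
    m = len(rank_rows)
    n = len(rank_rows[0])
    # Total rank received by each item across all judges.
    totals = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ranks: three judges ranking three items.
w_agree = kendalls_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]])   # full agreement
w_oppose = kendalls_w([[1, 2, 3], [3, 2, 1]])             # opposite rankings
```

W ranges from 0 (no agreement among the judges) to 1 (identical rankings), so a value near zero means that the items are ranked inconsistently from one judge to the next.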

Landslide Susceptibility Prediction Considering Regional Soil Erosion Based on Machine-Learning Models

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9060377 ◽

2020 ◽

Vol 9 (6) ◽

pp. 377 ◽

Cited By ~ 3

Author(s):

Faming Huang ◽

Jiawu Chen ◽

Zhen Du ◽

Chi Yao ◽

Jinsong Huang ◽

...

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Landslide Susceptibility ◽

Prediction Accuracy ◽

Prediction Models ◽

Research Area ◽

Predisposing Factors ◽

Support Vector ◽

Predisposing Factor ◽

Operating Feature

Soil erosion (SE) provides slide mass sources for landslide formation and reflects the long-term erosive effect of rainfall on landslides. Therefore, it is possible to obtain more reliable landslide susceptibility prediction results by introducing SE as a geology- and hydrology-related predisposing factor. Ningdu County, China, is taken as the research area. Firstly, 446 landslides are obtained through government disaster survey reports. Secondly, the SE amount in Ningdu County is calculated and nine other conventional predisposing factors are obtained under both 30 m and 60 m grid resolutions to determine the effects of SE on landslide susceptibility prediction. Thirdly, four types of machine-learning predictors with 30 m and 60 m grid resolutions, namely C5.0 decision tree (C5.0 DT), logistic regression (LR), multilayer perceptron (MLP) and support vector machine (SVM), are applied to construct the landslide susceptibility prediction models considering the SE factor as SE-C5.0 DT, SE-LR, SE-MLP and SE-SVM models; C5.0 DT, LR, MLP and SVM models with no SE are also used for comparison. Finally, the area under the receiver operating characteristic curve is used to verify the prediction accuracy of these models, and the relative importance of all 10 predisposing factors is ranked. The results indicate that: (1) the SE factor plays the most important role in landslide susceptibility prediction among all 10 predisposing factors under both 30 m and 60 m resolutions; (2) the SE-based models predict landslide susceptibility more accurately than the single models with no SE factor; (3) all the models with 30 m resolution have higher landslide susceptibility prediction accuracy than those with 60 m resolution; and (4) the C5.0 DT and SVM models show higher landslide susceptibility prediction performance than the MLP and LR models.

Improving Spatial Agreement in Machine Learning-Based Landslide Susceptibility Mapping

Remote Sensing ◽

10.3390/rs12203347 ◽

2020 ◽

Vol 12 (20) ◽

pp. 3347 ◽

Cited By ~ 2

Author(s):

Mohammed Sarfaraz Gani Adnan ◽

Md Salman Rahman ◽

Nahian Ahmed ◽

Bayes Ahmed ◽

Md. Fazleh Rabbi ◽

...

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Prediction Accuracy ◽

Correlation Coefficients ◽

Machine Learning Algorithms ◽

Landslide Susceptibility Mapping ◽

Natural Phenomenon ◽

Support Vector ◽

Susceptibility Maps ◽

Landslide Susceptibility Maps

Despite yielding considerable degrees of accuracy in landslide predictions, the outcomes of different landslide susceptibility models are prone to spatial disagreement and, therefore, to uncertainties. Uncertainties in the results of various landslide susceptibility models create challenges in selecting the most suitable method to manage this complex natural phenomenon. This study aimed to propose an approach to reduce uncertainties in landslide prediction by diagnosing spatial agreement in machine learning-based landslide susceptibility maps. It first developed landslide susceptibility maps of Cox’s Bazar district of Bangladesh, applying four machine learning algorithms: K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM), featuring hyperparameter optimization of 12 landslide conditioning factors. All four models yielded very high prediction accuracy, with area under the curve (AUC) values ranging from 0.93 to 0.96. The assessment of spatial agreement of landslide predictions showed that the pixel-wise correlation coefficients of landslide probability between the various models ranged from 0.69 to 0.85, indicating uncertainty in the landslides predicted by the various models, despite their considerable prediction accuracy. The uncertainty was addressed by establishing a Logistic Regression (LR) model, incorporating the binary landslide inventory data as the dependent variable and the results of the four landslide susceptibility models as independent variables. The outcomes indicated that the RF model had the highest influence in predicting the observed landslide locations, followed by the MLP, SVM, and KNN models. Finally, a combined landslide susceptibility map was developed by integrating the results of the four machine learning-based landslide predictions. The combined map resulted in better spatial agreement (correlation coefficients ranging between 0.88 and 0.92) and greater prediction accuracy (0.97) compared to the individual models. The modelling approach followed in this study would be useful in minimizing uncertainties of various methods and improving landslide predictions.
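
The pixel-wise agreement measure described above can be illustrated with a small sketch. It assumes the correlation coefficient is Pearson's r computed over the flattened probability maps of two models, which the abstract does not state explicitly. The function name `pearson` and the four-pixel maps are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical landslide probabilities predicted by two models for the same four pixels.
map_a = [0.10, 0.40, 0.80, 0.90]
map_b = [0.20, 0.30, 0.70, 0.95]
agreement = pearson(map_a, map_b)
```

Two maps can both score a high AUC and still correlate only moderately pixel by pixel, which is the kind of disagreement the study sets out to reduce.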

Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

10.26434/chemrxiv.5513581.v1 ◽

2017 ◽

Author(s):

Sabrina Jaeger ◽

Simone Fulle ◽

Samo Turk

Keyword(s):

Machine Learning ◽

Language Processing ◽

Supervised Machine Learning ◽

Learning Approach ◽

Learning Approaches ◽

Unsupervised Machine Learning ◽

Feature Representations ◽

Machine Learning Approach ◽

The Individual ◽

Vector Representations

Inspired by natural language processing techniques, we here introduce Mol2vec, which is an unsupervised machine learning approach to learn vector representations of molecular substructures. Similar to the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
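
The encoding step described above (a compound vector as the sum of its substructure vectors) can be sketched in a few lines. The embedding table, the identifiers and the function name `compound_vector` are hypothetical, and the sketch does not use the Mol2vec package itself. Skipping identifiers that have no embedding is a simplification chosen for this example.

```python
def compound_vector(substructure_ids, embeddings, dim):
    """Sum the embedding vectors of a compound's substructures.

    Identifiers missing from the table are skipped (a simplification).
    """
    total = [0.0] * dim
    for sid in substructure_ids:
        vec = embeddings.get(sid)
        if vec is None:
            continue
        for k in range(dim):
            total[k] += vec[k]
    return total


# Hypothetical two-dimensional embeddings keyed by substructure identifier.
embeddings = {"a": [1.0, 2.0], "b": [0.5, -1.0]}
vec = compound_vector(["a", "b", "a", "unseen"], embeddings, dim=2)
```

The resulting fixed-length vector can then serve as the feature input of any supervised model, regardless of how many substructures the compound contains.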

Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision The Journal of Business Perspective ◽

10.1177/0972262918821221 ◽

2019 ◽

Vol 23 (1) ◽

pp. 12-21 ◽

Cited By ~ 2

Author(s):

Shikha N. Khera ◽

Divya

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Confusion Matrix ◽

Predictive Modelling ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

It Industry ◽

Knowledge Based ◽

Employee Attrition

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to the companies. The aim of this research is to develop a model to predict employee attrition and provide organizations with opportunities to address any issue and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including the employees' employment status (response variable) at the time of collection. Results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. The results also show that the model performs better in predicting who will leave the firm than in predicting who will not.
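
How overall accuracy and the per-class hit rates are read off a binary confusion matrix can be shown with a minimal sketch. The counts below are invented for illustration and are not the study's data. "Leaver" is treated as the positive class, and the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy and per-class recall from binary confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall_leavers": tp / (tp + fn),   # share of actual leavers predicted correctly
        "recall_stayers": tn / (tn + fp),   # share of actual stayers predicted correctly
    }


# Illustrative counts only (not taken from the paper).
metrics = confusion_metrics(tp=36, fn=4, fp=16, tn=44)
```

A model can reach a respectable overall accuracy while one class is recognized much better than the other, which is why the per-class rates are worth reporting alongside it.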

Monitoring the Foliar Nutrients Status of Mango Using Spectroscopy-Based Spectral Indices and PLSR-Combined Machine Learning Models

Remote Sensing ◽

10.3390/rs13040641 ◽

2021 ◽

Vol 13 (4) ◽

pp. 641

Author(s):

Gopal Ramdas Mahajan ◽

Bappa Das ◽

Dayesh Murgaokar ◽

Ittai Herrmann ◽

Katja Berger ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Partial Least Square ◽

Least Square ◽

Partial Least Square Regression ◽

Support Vector ◽

Spectral Indices ◽

Learning Models ◽

Leaf Nutrients ◽

Machine Learning Models

Conventional methods of plant nutrient estimation for nutrient management need a huge number of leaf or tissue samples and extensive chemical analysis, which is time-consuming and expensive. Remote sensing is a viable tool to estimate the plant’s nutritional status to determine the appropriate amounts of fertilizer inputs. The aim of the study was to use remote sensing to characterize the foliar nutrient status of mango through the development of spectral indices, multivariate analysis, chemometrics, and machine learning modeling of the spectral data. A spectral database of the leaf samples within the 350–1050 nm wavelength range and the measured leaf nutrients were analyzed to develop spectral indices and multivariate models. The normalized difference and ratio spectral indices and the multivariate models (partial least square regression (PLSR), principal component regression, and support vector regression (SVR)) were ineffective in predicting any of the leaf nutrients. An approach using PLSR-combined machine learning models was found to be the best for predicting most of the nutrients. Based on the independent validation performance and summed ranks, the best performing models were cubist (R² ≥ 0.91, ratio of performance to deviation (RPD) ≥ 3.3, and ratio of performance to interquartile distance (RPIQ) ≥ 3.71) for nitrogen, phosphorus, potassium, and zinc; SVR (R² ≥ 0.88, RPD ≥ 2.73, RPIQ ≥ 3.31) for calcium, iron, copper, and boron; and elastic net (R² ≥ 0.95, RPD ≥ 4.47, RPIQ ≥ 6.11) for magnesium and sulfur. The results of the study revealed the potential of using hyperspectral remote sensing data for non-destructive estimation of mango leaf macro- and micro-nutrients. The developed approach is suggested to be employed within operational retrieval workflows for precision management of mango orchard nutrients.
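
The RPD and RPIQ scores quoted above can be illustrated with a short sketch using their common definitions: RPD is the standard deviation of the observed values divided by the RMSE, and RPIQ is the interquartile range of the observed values divided by the RMSE. Whether the study used the sample standard deviation and which quartile convention it followed is not stated, so the sample standard deviation and the `inclusive` quantile method used here are assumptions. The data are hypothetical.

```python
import math
import statistics


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def rpd(obs, pred):
    """Ratio of performance to deviation: sample SD of observations / RMSE."""
    return statistics.stdev(obs) / rmse(obs, pred)


def rpiq(obs, pred):
    """Ratio of performance to interquartile distance: (Q3 - Q1) / RMSE."""
    q1, _, q3 = statistics.quantiles(obs, n=4, method="inclusive")
    return (q3 - q1) / rmse(obs, pred)


# Hypothetical observed and predicted nutrient concentrations.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]
rpd_value = rpd(observed, predicted)
rpiq_value = rpiq(observed, predicted)
```

Both ratios grow as the prediction error shrinks relative to the spread of the data. RPIQ is less sensitive to skewed distributions because it relies on quartiles rather than the standard deviation.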

Machine Learning Approach for Predicting Lane-Change Maneuvers using the SHRP2 Naturalistic Driving Study Data

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211003581 ◽

2021 ◽

pp. 036119812110035

Author(s):

Anik Das ◽

Mohamed M. Ahmed

Keyword(s):

Machine Learning ◽

Prediction Accuracy ◽

Machine Learning Algorithms ◽

Support Vector ◽

Lane Change ◽

Adaptive Boosting ◽

Extreme Gradient Boosting ◽

Naturalistic Driving Study ◽

Naturalistic Driving ◽

Change Prediction

Accurate real-time lane-change prediction is essential to safely operate Autonomous Vehicles (AVs) on the roadways, especially at the early stage of AV deployment, when AVs and human-driven vehicles will interact. This study proposed reliable lane-change prediction models considering features from vehicle kinematics, machine vision, driver, and roadway geometric characteristics using the trajectory-level SHRP2 Naturalistic Driving Study and Roadway Information Database. Several machine learning algorithms were trained, validated, tested, and comparatively analyzed, including Classification And Regression Trees (CART), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K Nearest Neighbor (KNN), and Naïve Bayes (NB), based on six different sets of features. In each feature set, relevant features were extracted through a wrapper-based algorithm named Boruta. The results showed that, considering all features, the XGBoost model outperformed all other models, with the highest overall prediction accuracy (97%) and F1-score (95.5%). However, the highest overall prediction accuracy of 97.3% and F1-score of 95.9% were observed in the XGBoost model based on vehicle kinematics features. Moreover, it was found that XGBoost was the only model that achieved a reliable and balanced prediction performance across all six feature sets. Furthermore, a simplified XGBoost model was developed for each feature set considering the practical implementation of the model. The proposed prediction model could help in trajectory planning for AVs and could be used to develop more reliable advanced driver assistance systems (ADAS) in a cooperative connected and automated vehicle environment.
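
The F1-score reported above alongside accuracy is the harmonic mean of precision and recall. A minimal sketch follows, with invented counts (not the study's results) and "lane change" as the positive class. The function name `f1_score` is hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from binary counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 correct detections, 2 false alarms, 2 misses.
f1 = f1_score(tp=8, fp=2, fn=2)
```

Unlike accuracy, F1 ignores true negatives, so it is not inflated when one class (for example, lane keeping) dominates the data.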
