Land Subsidence Susceptibility Mapping in South Korea Using Machine Learning Algorithms

Sensors ◽  
2018 ◽  
Vol 18 (8) ◽  
pp. 2464 ◽  
Author(s):  
Dieu Tien Bui ◽  
Himan Shahabi ◽  
Ataollah Shirzadi ◽  
Kamran Chapi ◽  
Biswajeet Pradhan ◽  
...  

In this study, land subsidence susceptibility was assessed for a study area in South Korea using four machine learning models: Bayesian Logistic Regression (BLR), Support Vector Machine (SVM), Logistic Model Tree (LMT), and Alternating Decision Tree (ADTree). Eight conditioning factors were identified as the most important drivers of land subsidence in the Jeong-am area and applied in the modelling: slope angle, distance to drift, drift density, geology, distance to lineament, lineament density, land use, and rock-mass rating (RMR). About 24 previous land subsidence occurrences were surveyed and divided into a training dataset (70% of the data) and a validation dataset (30% of the data) for the modelling process. Each model generated a land subsidence susceptibility map (LSSM). The maps were verified using several appropriate tools, including statistical indices, the area under the receiver operating characteristic curve (AUROC), and success rate (SR) and prediction rate (PR) curves. The results indicated that the BLR model produced the LSSM with the highest accuracy and reliability among the applied models, although the other models also yielded reasonable results.
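The evaluation protocol described above (a 70/30 split of inventoried occurrences, several classifiers, AUROC comparison) can be sketched as below. This is a minimal illustration with synthetic stand-in data, not the Jeong-am inventory; plain logistic regression stands in for the BLR model.

```python
# 70/30 split and AUROC comparison of two susceptibility models.
# The eight feature columns stand in for the conditioning factors
# (slope angle, drift density, RMR, ...); the data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                      # eight conditioning factors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True, random_state=1),
}
auc = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # AUROC on the held-out 30% validation split
    auc[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(auc)
```

In practice each raster cell of the study area would be scored this way to produce the susceptibility map, with AUROC computed against the held-out occurrences.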

2020 ◽  
Author(s):  
Wanjun Zhao ◽  
Yong Zhang ◽  
Xinming Li ◽  
Yonghong Mao ◽  
Changwei Wu ◽  
...  

Abstract
Background: By extracting spectrum features from urinary proteomics with an advanced mass spectrometer and machine learning algorithms, more accurate results can be achieved for disease classification. We attempted to establish a novel diagnostic model for kidney diseases by combining machine learning via the extreme gradient boosting (XGBoost) algorithm with complete mass spectrum information from urinary proteomics.
Methods: We enrolled 134 patients (with IgA nephropathy, membranous nephropathy, or diabetic kidney disease) and 68 healthy participants as controls, and applied a total of 610,102 mass spectra from their urinary proteomics, produced using high-resolution mass spectrometry, for training and validation of the diagnostic model. We divided the mass spectrum data into a training dataset (80%) and a validation dataset (20%). The training dataset was used to create diagnostic models with XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). Diagnostic accuracy was evaluated using a confusion matrix. We also constructed receiver operating characteristic (ROC), Lorenz, and gain curves to evaluate the models.
Results: Compared with RF, the SVM, and ANNs, the modified XGBoost model, called the Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the XGBoost model was 96.03% (CI = 95.17%-96.77%; kappa = 0.943; McNemar's test, P = 0.00027). The area under the curve of the XGBoost model was 0.952 (CI = 0.9307-0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed the strong robustness of the developed model.
Conclusions: This study presents the first XGBoost diagnostic model, the KDClassifier, combining complete mass spectrum information from urinary proteomics to distinguish different kidney diseases. The KDClassifier achieves high accuracy and robustness, providing a potential tool for the classification of all types of kidney diseases.
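The evaluation reported above, a multi-class gradient-boosted classifier scored with a confusion matrix and Cohen's kappa, can be sketched as follows. scikit-learn's GradientBoostingClassifier stands in for XGBoost, the four classes mimic control / IgA nephropathy / membranous nephropathy / diabetic kidney disease, and the "spectra" are random stand-ins.

```python
# Gradient-boosted multi-class diagnosis sketch with confusion matrix
# and Cohen's kappa, on synthetic stand-in "spectrum" features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, cohen_kappa_score

rng = np.random.default_rng(0)
n_per_class, n_features = 60, 20
X_parts, y = [], []
for c in range(4):                                 # four diagnostic classes
    centre = rng.normal(scale=2.0, size=n_features)
    X_parts.append(centre + rng.normal(size=(n_per_class, n_features)))
    y += [c] * n_per_class
X, y = np.vstack(X_parts), np.array(y)

# 80/20 split as in the study
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)
clf = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
pred = clf.predict(X_te)

cm = confusion_matrix(y_te, pred)                  # rows: true, cols: predicted
kappa = cohen_kappa_score(y_te, pred)              # chance-corrected agreement
print(cm)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw accuracy for chance agreement, which is why the abstract reports it alongside the 96.03% accuracy.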


2021 ◽  
Author(s):  
Myeong Gyu Kim ◽  
Jae Hyun Kim ◽  
Kyungim Kim

BACKGROUND Garlic-related misinformation is prevalent whenever a virus outbreak occurs. With the outbreak of coronavirus disease 2019 (COVID-19), garlic-related misinformation is again spreading through social media sites, including Twitter. Machine learning-based approaches can be used to detect misinformation among vast numbers of tweets. OBJECTIVE This study aimed to develop machine learning algorithms for detecting misinformation about garlic and COVID-19 on Twitter. METHODS This study used 5,929 original tweets mentioning garlic and COVID-19. Tweets were manually labeled as misinformation, accurate information, or other. We tested the following algorithms: k-nearest neighbors; random forest; support vector machine (SVM) with linear, radial, and polynomial kernels; and neural network. Features for machine learning included user-based features (verified account, user type, number of followers, and follower rate) and text-based features (uniform resource locator, negation, sentiment score, latent Dirichlet allocation topic probability, number of retweets, and number of favorites). The model with the highest accuracy on the training dataset (70% of the overall dataset) was evaluated on a test dataset (30% of the overall dataset). Predictive performance was measured using overall accuracy, sensitivity, specificity, and balanced accuracy. RESULTS The SVM with polynomial kernel showed the highest accuracy, 0.670. The model also showed a balanced accuracy of 0.757, a sensitivity of 0.819, and a specificity of 0.696 for misinformation. Important features in the misinformation and accurate-information classes included topic 4 (common myths), topic 13 (garlic-specific myths), number of followers, topic 11 (misinformation on social media), and follower rate. Topic 3 (cooking recipes) was the most important feature in the other class. CONCLUSIONS Our SVM model showed good performance in detecting misinformation. The results of our study will help detect misinformation related to garlic and COVID-19. The approach could also be applied to prevent misinformation related to dietary supplements in the event of a future outbreak of a disease other than COVID-19.
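The metric set used above (sensitivity, specificity, balanced accuracy for a polynomial-kernel SVM) can be computed as below. The features are random stand-ins for the user- and text-based features in the abstract, so the numbers will not match the paper's.

```python
# Polynomial-kernel SVM scored with sensitivity, specificity and
# balanced accuracy, on synthetic stand-in tweet features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))                     # e.g. follower rate, topic probabilities
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.7, size=300) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)
clf = SVC(kernel="poly", degree=3).fit(X_tr, y_tr)
pred = clf.predict(X_te)

tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
sensitivity = tp / (tp + fn)                       # recall on the positive class
specificity = tn / (tn + fp)                       # recall on the negative class
bal_acc = balanced_accuracy_score(y_te, pred)      # mean of the two recalls
print(sensitivity, specificity, bal_acc)
```

Balanced accuracy is simply the mean of sensitivity and specificity, which is why it is the preferred headline number when the misinformation class is rare.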


2019 ◽  
Vol 11 (8) ◽  
pp. 978 ◽  
Author(s):  
Xiaoyi Shao ◽  
Siyuan Ma ◽  
Chong Xu ◽  
Pengfei Zhang ◽  
Boyu Wen ◽  
...  

The Mw 6.6 earthquake of 5 September 2018 (UTC) near Tomakomai, Japan triggered about 10,000 landslides at high density, causing widespread concern. We attempted to establish a detailed inventory of these slope failures and to use appropriate methods to assess landslide susceptibility across the entire affected area. To this end, we applied logistic regression (LR) and the support vector machine (SVM). Based on high-resolution (3 m) optical satellite images (Planet imagery) acquired before and after the earthquake, we delineated 9,295 individual landslides triggered by the event, occupying an area of 30.96 km2. Ten controlling factors were selected for the susceptibility analysis: elevation, slope angle, aspect, curvature, distance to faults, distance to the epicenter, peak ground acceleration (PGA), distance to rivers, distance to roads, and lithology. Using LR and SVM, two landslide susceptibility maps were produced for the study area. The results show that for the LR model, the success rate between the susceptibility map and the training dataset is 84.7%, and the prediction rate from comparing the test dataset with the susceptibility map is 83.9%. For the SVM model, the success rate against the training samples is 90.9%, and the prediction rate from the test dataset is 87.1%. In comparison, the SVM performed slightly better than the LR model.
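The "success rate" and "prediction rate" reported above can be read as the model's agreement with the training inventory and the held-out test inventory, respectively. A minimal sketch with synthetic stand-in data:

```python
# Success rate (vs. training data) and prediction rate (vs. test data)
# for LR and SVM susceptibility models, on synthetic stand-in factors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 10))                     # ten controlling factors
y = (X[:, 0] + X[:, 2] + rng.normal(scale=0.6, size=400) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=5)

rates = {}
for name, model in {"LR": LogisticRegression(max_iter=1000),
                    "SVM": SVC(kernel="rbf")}.items():
    model.fit(X_tr, y_tr)
    rates[name] = {
        "success": model.score(X_tr, y_tr),        # agreement with training set
        "prediction": model.score(X_te, y_te),     # agreement with test set
    }
print(rates)
```

A success rate well above the prediction rate would indicate overfitting; in the abstract the two are close for both models, which supports their reliability.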


2021 ◽  
Vol 11 (4) ◽  
pp. 286-290
Author(s):  
Md. Golam Kibria ◽  
Mehmet Sevkli

The increasing number of credit card defaulters has forced companies to think carefully before approving credit applications. Credit card companies usually use their own judgment to determine whether a credit card should be issued to a customer satisfying certain criteria. Some machine learning algorithms have also been used to support the decision. The main objective of this paper is to build a deep learning model based on UCI (University of California, Irvine) datasets that can support the credit card approval decision. Secondly, the performance of the built model is compared with two traditional machine learning algorithms: logistic regression (LR) and support vector machine (SVM). Our results show that the overall performance of our deep learning model is slightly better than that of the other two models.
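The three-way comparison described above can be sketched as follows. A small multi-layer perceptron stands in for the paper's deep learning model, and the features are synthetic stand-ins for the UCI credit data, which are not bundled here.

```python
# MLP ("deep learning" stand-in) vs. LR vs. SVM on synthetic
# credit-approval-like data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 15))                     # e.g. income, debt, history ...
y = (X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.8, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)
scaler = StandardScaler().fit(X_tr)                # scale features for MLP/SVM
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

accuracy = {}
for name, model in {
    "MLP": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=7),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
}.items():
    accuracy[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(accuracy)
```

With synthetic data the ranking need not reproduce the paper's finding that the deep model edges out LR and SVM.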


2021 ◽  
Author(s):  
Muhammad Aslam Baig ◽  
Donghong XIONG ◽  
Mahfuzur Rahman ◽  
Md. Monirul Islam ◽  
Ahmad Elbeltagi ◽  
...  

Abstract With climate change, hydro-climatic hazards such as floods in the Himalayan region are expected to worsen and are thus likely to affect humans and socio-economic growth. The Koshi River basin (KRB) in particular is frequently impacted by flooding throughout the year. However, studies estimating and predicting floods in this basin are still lacking. This study aims at developing a flood probability map using machine learning algorithms (MLAs): Gaussian process regression (GPR) and the support vector machine (SVM) with multiple kernel functions, including the Pearson VII function kernel (PUK), polynomial, normalized polynomial, and radial basis function (RBF) kernels. Historical flood locations together with available topography, hydrogeology, and environmental datasets were used to build the flood model. Two datasets were carefully chosen to measure the feasibility and robustness of the MLAs: a training dataset (flood locations between 2010 and 2019) and a testing dataset (flood locations of 2020), with thirteen flood-influencing factors. The MLAs were evaluated using a validation dataset and statistical indices such as the coefficient of determination (r2: 0.546~0.995), mean absolute error (MAE: 0.009~0.373), root mean square error (RMSE: 0.051~0.466), relative absolute error (RAE: 1.81~88.55%), and root-relative squared error (RRSE: 10.19~91.00%). Results showed that the SVM with the Pearson VII kernel (PUK) yielded better predictions than the other algorithms. The resultant SVM-PUK map revealed that 27.99% of the study area has a low, 39.91% a medium, 31.00% a high, and 1.10% a very high probability of flooding. The final flood probability map could add great value to flood risk mitigation and planning processes in the KRB.
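The error measures quoted above (r2, MAE, RMSE, RAE, RRSE) can be computed as below for an SVM regressor. RAE and RRSE are not built into scikit-learn, so they are implemented directly; the predictors are synthetic stand-ins for the thirteen flood-influencing factors.

```python
# Regression metric suite (r2, MAE, RMSE, RAE, RRSE) for an SVR model
# on synthetic stand-in flood factors.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 13))                     # thirteen influencing factors
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=9)
pred = SVR(kernel="rbf").fit(X_tr, y_tr).predict(X_te)

r2 = r2_score(y_te, pred)
mae = mean_absolute_error(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
# RAE and RRSE measure error relative to the naive predictor that
# always outputs the mean of the observed targets:
rae = np.abs(y_te - pred).sum() / np.abs(y_te - y_te.mean()).sum()
rrse = np.sqrt(((y_te - pred) ** 2).sum() / ((y_te - y_te.mean()) ** 2).sum())
print(r2, mae, rmse, rae, rrse)
```

Note that RRSE and r2 are two views of the same ratio: RRSE² = 1 − r², so an RRSE of 10% corresponds to r² = 0.99.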


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Shuai Zhao ◽  
Zhou Zhao

This study aims to apply and compare the rationality of landslide susceptibility maps produced using support vector machine (SVM) and particle swarm optimization coupled with support vector machine (PSO-SVM) models in Lueyang County, China, to strengthen the connection with the natural terrain, and to analyze the use of grid units and slope units. A total of 186 landslide locations were identified from earlier reports and field surveys. The landslide inventory was randomly divided into two parts: 70% for the training dataset and 30% for the validation dataset. Based on multisource data and the geological environment, 16 landslide conditioning factors were selected, covering control factors and triggering factors (altitude, slope angle, slope aspect, plan curvature, profile curvature, SPI, TPI, TRI, lithology, distance to faults, TWI, distance to rivers, NDVI, distance to roads, land use, and rainfall). The relationship between each conditioning factor and landslides was deduced using a certainty factor model. Subsequently, combined with grid units and slope units, landslide susceptibility models were built using the SVM and PSO-SVM methods. The predictive capability of the landslide susceptibility maps produced by the different models and units was verified through receiver operating characteristic (ROC) curves. The results showed that the PSO-SVM model based on slope units performed best, with area under the curve (AUC) values of 0.945 and 0.9245 for the training and validation datasets, respectively. Hence, a machine learning algorithm coupled with slope units can be considered a reliable and effective technique for landslide susceptibility mapping.
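The PSO-SVM coupling described above can be sketched with a minimal particle swarm optimizing the SVM's (C, gamma) against cross-validated accuracy. The terrain data are synthetic stand-ins, and the swarm parameters (inertia 0.7, acceleration 1.5) are common textbook defaults, not values from the paper.

```python
# Minimal PSO over (log10 C, log10 gamma) for an SVM classifier;
# fitness is 3-fold cross-validated accuracy on synthetic stand-in data.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 16))                     # 16 conditioning factors
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

def fitness(log_c, log_gamma):
    clf = SVC(C=10.0 ** log_c, gamma=10.0 ** log_gamma)
    return cross_val_score(clf, X, y, cv=3).mean()

n_particles, n_iters = 8, 5
pos = rng.uniform(-2, 2, size=(n_particles, 2))    # particle positions
vel = np.zeros_like(pos)
pbest = pos.copy()                                 # per-particle best positions
pbest_fit = np.array([fitness(*p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()           # swarm-wide best

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    # velocity update: inertia + cognitive + social terms
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -3, 3)
    fit = np.array([fitness(*p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("best (log10 C, log10 gamma):", gbest, "accuracy:", pbest_fit.max())
```

The paper's advantage of PSO-SVM over plain SVM comes from exactly this kind of automated kernel-parameter search replacing hand-picked defaults.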


Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1806
Author(s):  
Silvio Semanjski ◽  
Ivana Semanjski ◽  
Wim De Wilde ◽  
Sidharta Gautama

Global Navigation Satellite System (GNSS) meaconing and spoofing are considered key threats to Safety-of-Life (SoL) applications that mostly rely on open service (OS) signals without signal- or data-level protection. While a number of pre- and post-correlation techniques have been proposed so far, possible utilization of supervised machine learning algorithms to detect GNSS meaconing and spoofing is currently being examined. One such algorithm, Support Vector Machine classification (C-SVM), is proposed for use at the GNSS receiver level, because at that stage of signal processing a number of measurements and observables exist. It is possible to establish the correlation pattern among those GNSS measurements and observables and monitor it with the C-SVM classification, the results of which we present in this paper. By adding real-world spoofing and meaconing datasets to the laboratory-generated spoofing datasets at the training stage of the C-SVM, we complement the experiments and results obtained in Part I of this paper, where training was conducted solely with laboratory-generated spoofing datasets. In the two experiments presented here, the C-SVM algorithm was cross-fed with the real-world meaconing and spoofing datasets, such that the meaconing addition to the training was validated with the spoofing dataset, and vice versa. The comparative analysis of all four experiments shows promising results in two respects: (i) enriching the training dataset appears relevant for detecting real-world GNSS signal manipulation attempts, and (ii) the C-SVM-based approach appears promising for GNSS signal manipulation detection, as well as in the context of potential federated learning applications.
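The cross-feeding scheme above, training on one attack dataset and validating on the other, can be sketched as below. Both "datasets" are synthetic stand-ins for receiver-level GNSS observables, not the paper's recordings.

```python
# Cross-feeding sketch: train a C-SVM on dataset A ("meaconing") and
# score it on dataset B ("spoofing"), then vice versa.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(11)

def make_set(shift):
    # Six stand-in observables (e.g. C/N0, Doppler, clock drift ...)
    X = rng.normal(size=(150, 6))
    y = (X[:, 0] + shift * X[:, 1]
         + rng.normal(scale=0.4, size=150) > 0).astype(int)
    return X, y

X_a, y_a = make_set(0.4)                           # "meaconing" stand-in set
X_b, y_b = make_set(0.6)                           # "spoofing" stand-in set

acc_ab = SVC(kernel="rbf").fit(X_a, y_a).score(X_b, y_b)  # train A, test B
acc_ba = SVC(kernel="rbf").fit(X_b, y_b).score(X_a, y_a)  # train B, test A
print(acc_ab, acc_ba)
```

If both cross-fed accuracies stay high, the learned correlation pattern transfers between attack types, which is the property the paper's four experiments probe.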


2021 ◽  
Vol 12 (3) ◽  
pp. 31-38
Author(s):  
Michelle Tais Garcia Furuya ◽  
Danielle Elis Garcia Furuya

The e-mail service is one of the main tools used today and is an example of how technology facilitates the exchange of information. On the other hand, one of the biggest obstacles faced by e-mail services is spam, the name given to unsolicited messages received by a user. Machine learning applications have been gaining prominence in recent years as an alternative for efficient identification of spam. In this area, different algorithms can be evaluated to identify which one performs best. The aim of this study is to assess the ability of machine learning algorithms to correctly classify e-mails and to identify which algorithm achieves the greatest accuracy. The database used was taken from the Kaggle platform, and the data were processed by the Orange software with four algorithms: Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Naive Bayes (NB). The data were divided into 80% for training and 20% for testing. The results show that Random Forest was the best-performing algorithm, with 99% accuracy.
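The four-algorithm comparison above (done in Orange in the study) can equally be sketched in code with an 80/20 split. The features are random stand-ins for the e-mail features, so the ranking here need not match the study's result.

```python
# RF vs. KNN vs. SVM vs. Naive Bayes on an 80/20 split of synthetic
# stand-in e-mail features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(12)
X = rng.normal(size=(400, 20))
y = (X[:, 0] + 0.7 * X[:, 5] + rng.normal(scale=0.5, size=400) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=13)

scores = {name: model.fit(X_tr, y_tr).score(X_te, y_te) for name, model in {
    "RF": RandomForestClassifier(random_state=13),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(kernel="rbf"),
    "NB": GaussianNB(),
}.items()}
print("best:", max(scores, key=scores.get), scores)
```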


2020 ◽  
Vol 4 (1) ◽  
pp. 96
Author(s):  
Haidar Abdulrahman Abbas ◽  
Kayhan Zrar Ghafoor

In this paper, fingerprint-referencing methods based on wireless fidelity (Wi-Fi) received signal strength (RSS) are used for indoor positioning. More precisely, Naïve Bayes, decision tree (DT), and support vector machine (SVM) classifiers, using one-vs-one multi-class and error-correcting output codes schemes, are applied to enable accurate indoor positioning. Normalization is then used to reduce positioning error by reducing the fluctuation and diverse distribution of the RSS values. Different devices are used in the experiment, and the test device's data are not included in the training dataset. Nonetheless, the model learned by the SVM algorithm is not affected by excluding the test device's data from the training set. DT is less efficient than the other machine learning algorithms because it operates through Boolean functions and yields lower prediction accuracy on this dataset. The Naïve Bayes technique, based on Bayes' theorem, performs better than DT and close to the SVM. The results confirm that positioning accuracy of 1-1.5 m can be achieved in indoor environments by the proposed approach, which is an excellent result compared with traditional protocols.
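The normalization step described above, scaling raw RSS values before fingerprint classification to damp device-dependent fluctuation, can be sketched as below. The access-point count, reference points, and dBm ranges are all illustrative stand-ins.

```python
# RSS fingerprint classification with min-max normalization of the
# signal strengths; fingerprints and reference points are synthetic.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(14)
n_aps, n_points = 8, 6                             # access points, reference locations
centres = rng.uniform(-90, -40, size=(n_points, n_aps))   # mean RSS (dBm) per location
X = np.vstack([c + rng.normal(scale=3.0, size=(40, n_aps)) for c in centres])
y = np.repeat(np.arange(n_points), 40)             # location label per sample

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=15, stratify=y)
scaler = MinMaxScaler().fit(X_tr)                  # normalization damps RSS spread
clf = SVC(kernel="rbf").fit(scaler.transform(X_tr), y_tr)
acc = clf.score(scaler.transform(X_te), y_te)
print(f"location classification accuracy: {acc:.3f}")
```

Fitting the scaler on the training devices only, then applying it to the held-out device's readings, mirrors the paper's setup where the test device is excluded from training.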


2021 ◽  
Vol 11 (16) ◽  
pp. 7208
Author(s):  
Felipe de Luca Lopes de Amorim ◽  
Johannes Rick ◽  
Gerrit Lohmann ◽  
Karen Helen Wiltshire

Pelagic chlorophyll-a concentrations are key for evaluating the environmental status and productivity of marine systems, and data can be provided by in situ measurements, remote sensing, and modelling. However, modelling chlorophyll-a is not trivial due to its nonlinear dynamics and complexity. In this study, chlorophyll-a concentrations for the Helgoland Roads time series were modelled using a number of measured water and environmental parameters. We chose three machine learning algorithms common in the literature: the support vector machine regressor, the neural network multi-layer perceptron regressor, and the random forest regressor. Results showed that the support vector machine regressor slightly outperformed the other models. Evaluation with a test dataset and verification with an independent validation dataset of chlorophyll-a concentrations showed good generalization capacity, with root mean squared errors of less than 1 µg L−1. Feature selection and engineering proved important and improved the models significantly, raising the adjusted R2 by a minimum of 48%. We also tested SARIMA for comparison and found that its univariate nature does not allow for better results than the machine learning models; additionally, the processing time needed for SARIMA was much higher (prohibitive).
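The adjusted R² used above to quantify the feature-engineering gain can be computed as below for an SVM regressor. The predictors are synthetic stand-ins for the measured water and environmental parameters.

```python
# Adjusted R2 for an SVR chlorophyll-style regression; the adjustment
# penalises the number of predictors p relative to the sample size.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(16)
n, p = 300, 8                                      # samples, predictors
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.3, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=17)
pred = SVR(kernel="rbf").fit(X_tr, y_tr).predict(X_te)

r2 = r2_score(y_te, pred)
n_te = len(y_te)
adj_r2 = 1 - (1 - r2) * (n_te - 1) / (n_te - p - 1)  # adjusted for p predictors
print(r2, adj_r2)
```

Because the adjustment penalises predictor count, a feature-selection step that drops uninformative parameters can raise adjusted R² even when plain R² barely moves.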

