Land Subsidence Susceptibility Mapping in South Korea Using Machine Learning Algorithms

Sensors ◽  
2018 ◽  
Vol 18 (8) ◽  
pp. 2464 ◽  
Author(s):  
Dieu Tien Bui ◽  
Himan Shahabi ◽  
Ataollah Shirzadi ◽  
Kamran Chapi ◽  
Biswajeet Pradhan ◽  
...  

In this study, land subsidence susceptibility was assessed for a study area in South Korea using four machine learning models: Bayesian Logistic Regression (BLR), Support Vector Machine (SVM), Logistic Model Tree (LMT), and Alternating Decision Tree (ADTree). Eight conditioning factors were identified as the most important drivers of land subsidence in the Jeong-am area and applied in the modelling: slope angle, distance to drift, drift density, geology, distance to lineament, lineament density, land use, and rock-mass rating (RMR). About 24 previous land subsidence occurrences were surveyed and divided into a training dataset (70% of the data) and a validation dataset (30% of the data) for the modelling process. Each model generated a land subsidence susceptibility map (LSSM). The maps were verified using several appropriate tools, including statistical indices, the area under the receiver operating characteristic curve (AUROC), and success rate (SR) and prediction rate (PR) curves. The results indicated that the BLR model produced the LSSM with the highest accuracy and reliability among the applied models, although the other models also yielded reasonable results.
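The evaluation protocol described above (a 70/30 split of inventoried occurrences, several classifiers, AUROC comparison) can be sketched as below. This is a minimal illustration with synthetic stand-in data, not the Jeong-am inventory; plain logistic regression stands in for the BLR model.

```python
# 70/30 split and AUROC comparison of two susceptibility models.
# The eight feature columns stand in for the conditioning factors
# (slope angle, drift density, RMR, ...); the data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                      # eight conditioning factors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True, random_state=1),
}
auc = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # AUROC on the held-out 30% validation split
    auc[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(auc)
```

In practice each raster cell of the study area would be scored this way to produce the susceptibility map, with AUROC computed against the held-out occurrences.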

2020 ◽  
Author(s):  
Wanjun Zhao ◽  
Yong Zhang ◽  
Xinming Li ◽  
Yonghong Mao ◽  
Changwei Wu ◽  
...  

Abstract
Background: By extracting spectrum features from urinary proteomics with an advanced mass spectrometer and machine learning algorithms, more accurate results can be achieved for disease classification. We attempted to establish a novel diagnostic model for kidney diseases by combining machine learning via the extreme gradient boosting (XGBoost) algorithm with complete mass spectrum information from urinary proteomics.
Methods: We enrolled 134 patients (with IgA nephropathy, membranous nephropathy, or diabetic kidney disease) and 68 healthy participants as controls, and applied a total of 610,102 mass spectra from their urinary proteomics, produced using high-resolution mass spectrometry, for training and validation of the diagnostic model. We divided the mass spectrum data into a training dataset (80%) and a validation dataset (20%). The training dataset was used to create diagnostic models with XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). Diagnostic accuracy was evaluated using a confusion matrix. We also constructed receiver operating characteristic (ROC), Lorenz, and gain curves to evaluate the models.
Results: Compared with RF, the SVM, and ANNs, the modified XGBoost model, called the Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the XGBoost model was 96.03% (CI = 95.17%-96.77%; kappa = 0.943; McNemar's test, P = 0.00027). The area under the curve of the XGBoost model was 0.952 (CI = 0.9307-0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed the strong robustness of the developed model.
Conclusions: This study presents the first XGBoost diagnostic model, the KDClassifier, combining complete mass spectrum information from urinary proteomics to distinguish different kidney diseases. The KDClassifier achieves high accuracy and robustness, providing a potential tool for the classification of all types of kidney diseases.
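The evaluation reported above, a multi-class gradient-boosted classifier scored with a confusion matrix and Cohen's kappa, can be sketched as follows. scikit-learn's GradientBoostingClassifier stands in for XGBoost, the four classes mimic control / IgA nephropathy / membranous nephropathy / diabetic kidney disease, and the "spectra" are random stand-ins.

```python
# Gradient-boosted multi-class diagnosis sketch with confusion matrix
# and Cohen's kappa, on synthetic stand-in "spectrum" features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, cohen_kappa_score

rng = np.random.default_rng(0)
n_per_class, n_features = 60, 20
X_parts, y = [], []
for c in range(4):                                 # four diagnostic classes
    centre = rng.normal(scale=2.0, size=n_features)
    X_parts.append(centre + rng.normal(size=(n_per_class, n_features)))
    y += [c] * n_per_class
X, y = np.vstack(X_parts), np.array(y)

# 80/20 split as in the study
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)
clf = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
pred = clf.predict(X_te)

cm = confusion_matrix(y_te, pred)                  # rows: true, cols: predicted
kappa = cohen_kappa_score(y_te, pred)              # chance-corrected agreement
print(cm)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw accuracy for chance agreement, which is why the abstract reports it alongside the 96.03% accuracy.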


2021 ◽  
Author(s):  
Myeong Gyu Kim ◽  
Jae Hyun Kim ◽  
Kyungim Kim

BACKGROUND Garlic-related misinformation is prevalent whenever a virus outbreak occurs. With the outbreak of coronavirus disease 2019 (COVID-19), garlic-related misinformation is again spreading through social media sites, including Twitter. Machine learning-based approaches can be used to detect misinformation among vast numbers of tweets. OBJECTIVE This study aimed to develop machine learning algorithms for detecting misinformation about garlic and COVID-19 on Twitter. METHODS This study used 5,929 original tweets mentioning garlic and COVID-19. Tweets were manually labeled as misinformation, accurate information, or other. We tested the following algorithms: k-nearest neighbors; random forest; support vector machine (SVM) with linear, radial, and polynomial kernels; and neural network. Features for machine learning included user-based features (verified account, user type, number of followers, and follower rate) and text-based features (uniform resource locator, negation, sentiment score, latent Dirichlet allocation topic probability, number of retweets, and number of favorites). The model with the highest accuracy on the training dataset (70% of the overall dataset) was evaluated on a test dataset (30% of the overall dataset). Predictive performance was measured using overall accuracy, sensitivity, specificity, and balanced accuracy. RESULTS The SVM with polynomial kernel showed the highest accuracy, 0.670. The model also showed a balanced accuracy of 0.757, a sensitivity of 0.819, and a specificity of 0.696 for misinformation. Important features in the misinformation and accurate-information classes included topic 4 (common myths), topic 13 (garlic-specific myths), number of followers, topic 11 (misinformation on social media), and follower rate. Topic 3 (cooking recipes) was the most important feature in the other class. CONCLUSIONS Our SVM model showed good performance in detecting misinformation. The results of our study will help detect misinformation related to garlic and COVID-19. The approach could also be applied to prevent misinformation related to dietary supplements in the event of a future outbreak of a disease other than COVID-19.
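The metric set used above (sensitivity, specificity, balanced accuracy for a polynomial-kernel SVM) can be computed as below. The features are random stand-ins for the user- and text-based features in the abstract, so the numbers will not match the paper's.

```python
# Polynomial-kernel SVM scored with sensitivity, specificity and
# balanced accuracy, on synthetic stand-in tweet features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))                     # e.g. follower rate, topic probabilities
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.7, size=300) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)
clf = SVC(kernel="poly", degree=3).fit(X_tr, y_tr)
pred = clf.predict(X_te)

tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
sensitivity = tp / (tp + fn)                       # recall on the positive class
specificity = tn / (tn + fp)                       # recall on the negative class
bal_acc = balanced_accuracy_score(y_te, pred)      # mean of the two recalls
print(sensitivity, specificity, bal_acc)
```

Balanced accuracy is simply the mean of sensitivity and specificity, which is why it is the preferred headline number when the misinformation class is rare.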


2019 ◽  
Vol 11 (8) ◽  
pp. 978 ◽  
Author(s):  
Xiaoyi Shao ◽  
Siyuan Ma ◽  
Chong Xu ◽  
Pengfei Zhang ◽  
Boyu Wen ◽  
...  

The Mw 6.6 earthquake of 5 September 2018 (UTC) near Tomakomai, Japan triggered about 10,000 landslides at high density, causing widespread concern. We attempted to establish a detailed inventory of these slope failures and to use appropriate methods to assess landslide susceptibility across the entire affected area. To this end, we applied logistic regression (LR) and the support vector machine (SVM). Based on high-resolution (3 m) optical satellite images (Planet imagery) acquired before and after the earthquake, we delineated 9,295 individual landslides triggered by the event, occupying an area of 30.96 km2. Ten controlling factors were selected for the susceptibility analysis: elevation, slope angle, aspect, curvature, distance to faults, distance to the epicenter, peak ground acceleration (PGA), distance to rivers, distance to roads, and lithology. Using LR and SVM, two landslide susceptibility maps were produced for the study area. The results show that for the LR model, the success rate between the susceptibility map and the training dataset is 84.7%, and the prediction rate from comparing the test dataset with the susceptibility map is 83.9%. For the SVM model, the success rate against the training samples is 90.9%, and the prediction rate from the test dataset is 87.1%. In comparison, the SVM performed slightly better than the LR model.
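The "success rate" and "prediction rate" reported above can be read as the model's agreement with the training inventory and the held-out test inventory, respectively. A minimal sketch with synthetic stand-in data:

```python
# Success rate (vs. training data) and prediction rate (vs. test data)
# for LR and SVM susceptibility models, on synthetic stand-in factors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 10))                     # ten controlling factors
y = (X[:, 0] + X[:, 2] + rng.normal(scale=0.6, size=400) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=5)

rates = {}
for name, model in {"LR": LogisticRegression(max_iter=1000),
                    "SVM": SVC(kernel="rbf")}.items():
    model.fit(X_tr, y_tr)
    rates[name] = {
        "success": model.score(X_tr, y_tr),        # agreement with training set
        "prediction": model.score(X_te, y_te),     # agreement with test set
    }
print(rates)
```

A success rate well above the prediction rate would indicate overfitting; in the abstract the two are close for both models, which supports their reliability.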


2021 ◽  
Vol 11 (4) ◽  
pp. 286-290
Author(s):  
Md. Golam Kibria ◽  
Mehmet Sevkli

The increasing number of credit card defaulters has forced companies to think carefully before approving credit applications. Credit card companies usually use their own judgment to determine whether a credit card should be issued to a customer satisfying certain criteria. Some machine learning algorithms have also been used to support the decision. The main objective of this paper is to build a deep learning model based on UCI (University of California, Irvine) datasets that can support the credit card approval decision. Secondly, the performance of the built model is compared with two traditional machine learning algorithms: logistic regression (LR) and support vector machine (SVM). Our results show that the overall performance of our deep learning model is slightly better than that of the other two models.
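The three-way comparison described above can be sketched as follows. A small multi-layer perceptron stands in for the paper's deep learning model, and the features are synthetic stand-ins for the UCI credit data, which are not bundled here.

```python
# MLP ("deep learning" stand-in) vs. LR vs. SVM on synthetic
# credit-approval-like data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 15))                     # e.g. income, debt, history ...
y = (X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.8, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)
scaler = StandardScaler().fit(X_tr)                # scale features for MLP/SVM
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

accuracy = {}
for name, model in {
    "MLP": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=7),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
}.items():
    accuracy[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(accuracy)
```

With synthetic data the ranking need not reproduce the paper's finding that the deep model edges out LR and SVM.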


2021 ◽  
Author(s):  
Muhammad Aslam Baig ◽  
Donghong XIONG ◽  
Mahfuzur Rahman ◽  
Md. Monirul Islam ◽  
Ahmad Elbeltagi ◽  
...  

Abstract With climate change, hydro-climatic hazards such as floods in the Himalayan region are expected to worsen and are thus likely to affect humans and socio-economic growth. The Koshi River basin (KRB) in particular is frequently impacted by flooding throughout the year. However, studies estimating and predicting floods in this basin are still lacking. This study aims at developing a flood probability map using machine learning algorithms (MLAs): Gaussian process regression (GPR) and the support vector machine (SVM) with multiple kernel functions, including the Pearson VII function kernel (PUK), polynomial, normalized polynomial, and radial basis function (RBF) kernels. Historical flood locations together with available topography, hydrogeology, and environmental datasets were used to build the flood model. Two datasets were carefully chosen to measure the feasibility and robustness of the MLAs: a training dataset (flood locations between 2010 and 2019) and a testing dataset (flood locations of 2020), with thirteen flood-influencing factors. The MLAs were evaluated using a validation dataset and statistical indices such as the coefficient of determination (r2: 0.546~0.995), mean absolute error (MAE: 0.009~0.373), root mean square error (RMSE: 0.051~0.466), relative absolute error (RAE: 1.81~88.55%), and root-relative squared error (RRSE: 10.19~91.00%). Results showed that the SVM with the Pearson VII kernel (PUK) yielded better predictions than the other algorithms. The resultant SVM-PUK map revealed that 27.99% of the study area has a low, 39.91% a medium, 31.00% a high, and 1.10% a very high probability of flooding. The final flood probability map could add great value to flood risk mitigation and planning processes in the KRB.
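The error measures quoted above (r2, MAE, RMSE, RAE, RRSE) can be computed as below for an SVM regressor. RAE and RRSE are not built into scikit-learn, so they are implemented directly; the predictors are synthetic stand-ins for the thirteen flood-influencing factors.

```python
# Regression metric suite (r2, MAE, RMSE, RAE, RRSE) for an SVR model
# on synthetic stand-in flood factors.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 13))                     # thirteen influencing factors
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=9)
pred = SVR(kernel="rbf").fit(X_tr, y_tr).predict(X_te)

r2 = r2_score(y_te, pred)
mae = mean_absolute_error(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
# RAE and RRSE measure error relative to the naive predictor that
# always outputs the mean of the observed targets:
rae = np.abs(y_te - pred).sum() / np.abs(y_te - y_te.mean()).sum()
rrse = np.sqrt(((y_te - pred) ** 2).sum() / ((y_te - y_te.mean()) ** 2).sum())
print(r2, mae, rmse, rae, rrse)
```

Note that RRSE and r2 are two views of the same ratio: RRSE² = 1 − r², so an RRSE of 10% corresponds to r² = 0.99.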


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Shuai Zhao ◽  
Zhou Zhao

This study aims to apply and compare the rationality of landslide susceptibility maps produced using support vector machine (SVM) and particle swarm optimization coupled with support vector machine (PSO-SVM) models in Lueyang County, China, to strengthen the connection with the natural terrain, and to analyze the use of grid units and slope units. A total of 186 landslide locations were identified from earlier reports and field surveys. The landslide inventory was randomly divided into two parts: 70% for the training dataset and 30% for the validation dataset. Based on multisource data and the geological environment, 16 landslide conditioning factors were selected, covering control factors and triggering factors (altitude, slope angle, slope aspect, plan curvature, profile curvature, SPI, TPI, TRI, lithology, distance to faults, TWI, distance to rivers, NDVI, distance to roads, land use, and rainfall). The relationship between each conditioning factor and landslides was deduced using a certainty factor model. Subsequently, combined with grid units and slope units, landslide susceptibility models were built using the SVM and PSO-SVM methods. The predictive capability of the landslide susceptibility maps produced by the different models and units was verified through receiver operating characteristic (ROC) curves. The results showed that the PSO-SVM model based on slope units performed best, with area under the curve (AUC) values of 0.945 and 0.9245 for the training and validation datasets, respectively. Hence, a machine learning algorithm coupled with slope units can be considered a reliable and effective technique for landslide susceptibility mapping.
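The PSO-SVM coupling described above can be sketched with a minimal particle swarm optimizing the SVM's (C, gamma) against cross-validated accuracy. The terrain data are synthetic stand-ins, and the swarm parameters (inertia 0.7, acceleration 1.5) are common textbook defaults, not values from the paper.

```python
# Minimal PSO over (log10 C, log10 gamma) for an SVM classifier;
# fitness is 3-fold cross-validated accuracy on synthetic stand-in data.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 16))                     # 16 conditioning factors
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

def fitness(log_c, log_gamma):
    clf = SVC(C=10.0 ** log_c, gamma=10.0 ** log_gamma)
    return cross_val_score(clf, X, y, cv=3).mean()

n_particles, n_iters = 8, 5
pos = rng.uniform(-2, 2, size=(n_particles, 2))    # particle positions
vel = np.zeros_like(pos)
pbest = pos.copy()                                 # per-particle best positions
pbest_fit = np.array([fitness(*p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()           # swarm-wide best

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    # velocity update: inertia + cognitive + social terms
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -3, 3)
    fit = np.array([fitness(*p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("best (log10 C, log10 gamma):", gbest, "accuracy:", pbest_fit.max())
```

The paper's advantage of PSO-SVM over plain SVM comes from exactly this kind of automated kernel-parameter search replacing hand-picked defaults.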


Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1806
Author(s):  
Silvio Semanjski ◽  
Ivana Semanjski ◽  
Wim De Wilde ◽  
Sidharta Gautama

Global Navigation Satellite System (GNSS) meaconing and spoofing are considered key threats to Safety-of-Life (SoL) applications that mostly rely on open service (OS) signals without signal- or data-level protection. While a number of pre- and post-correlation techniques have been proposed so far, possible utilization of supervised machine learning algorithms to detect GNSS meaconing and spoofing is currently being examined. One such algorithm, Support Vector Machine classification (C-SVM), is proposed for use at the GNSS receiver level, because at that stage of signal processing a number of measurements and observables exist. It is possible to establish the correlation pattern among those GNSS measurements and observables and monitor it with the C-SVM classification, the results of which we present in this paper. By adding real-world spoofing and meaconing datasets to the laboratory-generated spoofing datasets at the training stage of the C-SVM, we complement the experiments and results obtained in Part I of this paper, where training was conducted solely with laboratory-generated spoofing datasets. In the two experiments presented here, the C-SVM algorithm was cross-fed with the real-world meaconing and spoofing datasets, such that the meaconing addition to the training was validated with the spoofing dataset, and vice versa. The comparative analysis of all four experiments shows promising results in two respects: (i) enriching the training dataset appears relevant for detecting real-world GNSS signal manipulation attempts, and (ii) the C-SVM-based approach appears promising for GNSS signal manipulation detection, as well as in the context of potential federated learning applications.
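The cross-feeding scheme above, training on one attack dataset and validating on the other, can be sketched as below. Both "datasets" are synthetic stand-ins for receiver-level GNSS observables, not the paper's recordings.

```python
# Cross-feeding sketch: train a C-SVM on dataset A ("meaconing") and
# score it on dataset B ("spoofing"), then vice versa.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(11)

def make_set(shift):
    # Six stand-in observables (e.g. C/N0, Doppler, clock drift ...)
    X = rng.normal(size=(150, 6))
    y = (X[:, 0] + shift * X[:, 1]
         + rng.normal(scale=0.4, size=150) > 0).astype(int)
    return X, y

X_a, y_a = make_set(0.4)                           # "meaconing" stand-in set
X_b, y_b = make_set(0.6)                           # "spoofing" stand-in set

acc_ab = SVC(kernel="rbf").fit(X_a, y_a).score(X_b, y_b)  # train A, test B
acc_ba = SVC(kernel="rbf").fit(X_b, y_b).score(X_a, y_a)  # train B, test A
print(acc_ab, acc_ba)
```

If both cross-fed accuracies stay high, the learned correlation pattern transfers between attack types, which is the property the paper's four experiments probe.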


2021 ◽  
Vol 12 (3) ◽  
pp. 31-38
Author(s):  
Michelle Tais Garcia Furuya ◽  
Danielle Elis Garcia Furuya

The e-mail service is one of the main tools used today and is an example of how technology facilitates the exchange of information. On the other hand, one of the biggest obstacles faced by e-mail services is spam, the name given to unsolicited messages received by a user. Machine learning applications have been gaining prominence in recent years as an alternative for efficient identification of spam. In this area, different algorithms can be evaluated to identify which one performs best. The aim of this study is to assess the ability of machine learning algorithms to correctly classify e-mails and to identify which algorithm achieves the greatest accuracy. The database used was taken from the Kaggle platform, and the data were processed by the Orange software with four algorithms: Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Naive Bayes (NB). The data were divided into 80% for training and 20% for testing. The results show that Random Forest was the best-performing algorithm, with 99% accuracy.
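The four-algorithm comparison above (done in Orange in the study) can equally be sketched in code with an 80/20 split. The features are random stand-ins for the e-mail features, so the ranking here need not match the study's result.

```python
# RF vs. KNN vs. SVM vs. Naive Bayes on an 80/20 split of synthetic
# stand-in e-mail features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(12)
X = rng.normal(size=(400, 20))
y = (X[:, 0] + 0.7 * X[:, 5] + rng.normal(scale=0.5, size=400) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=13)

scores = {name: model.fit(X_tr, y_tr).score(X_te, y_te) for name, model in {
    "RF": RandomForestClassifier(random_state=13),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(kernel="rbf"),
    "NB": GaussianNB(),
}.items()}
print("best:", max(scores, key=scores.get), scores)
```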


2020 ◽  
Vol 4 (1) ◽  
pp. 96
Author(s):  
Haidar Abdulrahman Abbas ◽  
Kayhan Zrar Ghafoor

In this paper, fingerprint-referencing methods based on wireless fidelity (Wi-Fi) received signal strength (RSS) are used for indoor positioning. More precisely, Naïve Bayes, decision tree (DT), and support vector machine (SVM) classifiers, using one-vs-one multi-class and error-correcting output codes schemes, are applied to enable accurate indoor positioning. Normalization is then used to reduce positioning error by reducing the fluctuation and diverse distribution of the RSS values. Different devices are used in the experiment, and the test device's data are not included in the training dataset. Nonetheless, the model learned by the SVM algorithm is not affected by excluding the test device's data from the training set. DT is less efficient than the other machine learning algorithms because it operates through Boolean functions and yields lower prediction accuracy on this dataset. The Naïve Bayes technique, based on Bayes' theorem, performs better than DT and close to the SVM. The results confirm that positioning accuracy of 1-1.5 m can be achieved in indoor environments by the proposed approach, which is an excellent result compared with traditional protocols.
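The normalization step described above, scaling raw RSS values before fingerprint classification to damp device-dependent fluctuation, can be sketched as below. The access-point count, reference points, and dBm ranges are all illustrative stand-ins.

```python
# RSS fingerprint classification with min-max normalization of the
# signal strengths; fingerprints and reference points are synthetic.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(14)
n_aps, n_points = 8, 6                             # access points, reference locations
centres = rng.uniform(-90, -40, size=(n_points, n_aps))   # mean RSS (dBm) per location
X = np.vstack([c + rng.normal(scale=3.0, size=(40, n_aps)) for c in centres])
y = np.repeat(np.arange(n_points), 40)             # location label per sample

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=15, stratify=y)
scaler = MinMaxScaler().fit(X_tr)                  # normalization damps RSS spread
clf = SVC(kernel="rbf").fit(scaler.transform(X_tr), y_tr)
acc = clf.score(scaler.transform(X_te), y_te)
print(f"location classification accuracy: {acc:.3f}")
```

Fitting the scaler on the training devices only, then applying it to the held-out device's readings, mirrors the paper's setup where the test device is excluded from training.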


2021 ◽  
Vol 11 (16) ◽  
pp. 7208
Author(s):  
Felipe de Luca Lopes de Amorim ◽  
Johannes Rick ◽  
Gerrit Lohmann ◽  
Karen Helen Wiltshire

Pelagic chlorophyll-a concentrations are key for evaluating the environmental status and productivity of marine systems, and data can be provided by in situ measurements, remote sensing, and modelling. However, modelling chlorophyll-a is not trivial due to its nonlinear dynamics and complexity. In this study, chlorophyll-a concentrations for the Helgoland Roads time series were modelled using a number of measured water and environmental parameters. We chose three machine learning algorithms common in the literature: the support vector machine regressor, the neural network multi-layer perceptron regressor, and the random forest regressor. Results showed that the support vector machine regressor slightly outperformed the other models. Evaluation with a test dataset and verification with an independent validation dataset of chlorophyll-a concentrations showed good generalization capacity, with root mean squared errors of less than 1 µg L−1. Feature selection and engineering proved important and improved the models significantly, raising the adjusted R2 by a minimum of 48%. We also tested SARIMA for comparison and found that its univariate nature does not allow for better results than the machine learning models; additionally, the processing time needed for SARIMA was much higher (prohibitive).
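The adjusted R² used above to quantify the feature-engineering gain can be computed as below for an SVM regressor. The predictors are synthetic stand-ins for the measured water and environmental parameters.

```python
# Adjusted R2 for an SVR chlorophyll-style regression; the adjustment
# penalises the number of predictors p relative to the sample size.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(16)
n, p = 300, 8                                      # samples, predictors
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.3, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=17)
pred = SVR(kernel="rbf").fit(X_tr, y_tr).predict(X_te)

r2 = r2_score(y_te, pred)
n_te = len(y_te)
adj_r2 = 1 - (1 - r2) * (n_te - 1) / (n_te - p - 1)  # adjusted for p predictors
print(r2, adj_r2)
```

Because the adjustment penalises predictor count, a feature-selection step that drops uninformative parameters can raise adjusted R² even when plain R² barely moves.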

