Development of Machine Learning Strategy for Predicting the Risk Range of Ship’s Berthing Velocity

Hyeong-Tak Lee; Jeong-Seok Lee; Woo-Ju Son; Ik-Soon Cho

doi:10.3390/jmse8050376

Gully Erosion Susceptibility Mapping in Highly Complex Terrain Using Machine Learning Models

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10100680 ◽

2021 ◽

Vol 10 (10) ◽

pp. 680

Author(s):

Annan Yang ◽

Chunmei Wang ◽

Guowei Pang ◽

Yongqing Long ◽

Lei Wang ◽

...

Keyword(s):

Machine Learning ◽

Complex Terrain ◽

Large Scale ◽

Area Under The Curve ◽

Gully Erosion ◽

Susceptibility Mapping ◽

Weight Of Evidence ◽

Gradient Boosting ◽

Machine Learning Classification ◽

Extreme Gradient Boosting

Gully erosion is the most severe type of water erosion and is a major land degradation process. Gully erosion susceptibility mapping (GESM)’s efficiency and interpretability remains a challenge, especially in complex terrain areas. In this study, a WoE-MLC model was used to solve the above problem, which combines machine learning classification algorithms and the statistical weight of evidence (WoE) model in the Loess Plateau. The three machine learning (ML) algorithms utilized in this research were random forest (RF), gradient boosted decision trees (GBDT), and extreme gradient boosting (XGBoost). The results showed that: (1) GESM were well predicted by combining both machine learning regression models and WoE-MLC models, with the area under the curve (AUC) values both greater than 0.92, and the latter was more computationally efficient and interpretable; (2) The XGBoost algorithm was more efficient in GESM than the other two algorithms, with the strongest generalization ability and best performance in avoiding overfitting (averaged AUC = 0.947), followed by the RF algorithm (averaged AUC = 0.944), and GBDT algorithm (averaged AUC = 0.938); and (3) slope gradient, land use, and altitude were the main factors for GESM. This study may provide a possible method for gully erosion susceptibility mapping at large scale.

Download Full-text

A Comparative Analysis of Machine Learning Models for Prediction of Insurance Uptake in Kenya

10.20944/preprints202010.0186.v1 ◽

2020 ◽

Author(s):

Nelson Yego ◽

Juma Kasozi ◽

Joseph Nkrunziza

Keyword(s):

Machine Learning ◽

Random Forest ◽

Characteristic Curve ◽

Confusion Matrix ◽

Gradient Boosting ◽

Support Vector ◽

Sampled Data ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

The role of insurance in financial inclusion as well as in economic growth is immense. However, low uptake seems to impede the growth of the sector hence the need for a model that robustly predicts uptake of insurance among potential clients. In this research, we compared the performances of eight (8) machine learning models in predicting the uptake of insurance. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines and Extreme Gradient boosting. The data used in the classification was from the 2016 Kenya FinAccess Household Survey. Comparison of performance was done for both upsampled and downsampled data due to data imbalance. For upsampled data, Random Forest classifier showed highest accuracy and precision compared to other classifiers but for down sampled data, gradient boosting was optimal. It is noteworthy that for both upsampled and downsampled data, tree-based classifiers were more robust than others in insurance uptake prediction. However, in spite of hyper-parameter optimization, the area under receiver operating characteristic curve remained highest for Random Forest as compared to other tree-based models. Also, the confusion matrix for Random Forest showed least false positives, and highest true positives hence could be construed as the most robust model for predicting the insurance uptake. Finally, the most important feature in predicting uptake was having a bank product hence bancassurance could be said to be a plausible channel of distribution of insurance products.

Download Full-text

Performance Evaluation of Different Machine Learning Classification Algorithms for Disease Diagnosis

International Journal of E-Health and Medical Communications ◽

10.4018/ijehmc.20211101.oa5 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-28

Author(s):

Munder Abdulatef Al-Hashem ◽

Ali Mohammad Alqudah ◽

Qasem Qananwah

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Confusion Matrix ◽

Learning Algorithms ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Machine Learning Classification

Knowledge extraction within a healthcare field is a very challenging task since we are having many problems such as noise and imbalanced datasets. They are obtained from clinical studies where uncertainty and variability are popular. Lately, a wide number of machine learning algorithms are considered and evaluated to check their validity of being used in the medical field. Usually, the classification algorithms are compared against medical experts who are specialized in certain disease diagnoses and provide an effective methodological evaluation of classifiers by applying performance metrics. The performance metrics contain four criteria: accuracy, sensitivity, and specificity forming the confusion matrix of each used algorithm. We have utilized eight different well-known machine learning algorithms to evaluate their performances in six different medical datasets. Based on the experimental results we conclude that the XGBoost and K-Nearest Neighbor classifiers were the best overall among the used datasets and signs can be used for diagnosing various diseases.

Download Full-text

MRI Radiomic Features to Predict IDH1 Mutation Status in Gliomas: A Machine Learning Approach using Gradient Tree Boosting

International Journal of Molecular Sciences ◽

10.3390/ijms21218004 ◽

2020 ◽

Vol 21 (21) ◽

pp. 8004

Author(s):

Yu Sakai ◽

Chen Yang ◽

Shingo Kihira ◽

Nadejda Tsankova ◽

Fahad Khan ◽

...

Keyword(s):

Machine Learning ◽

Characteristic Curve ◽

Area Under The Curve ◽

Prognostic Indicator ◽

Idh1 Mutation ◽

Gradient Boosting ◽

Isocitrate Dehydrogenase 1 ◽

Test Set ◽

Mutation Status ◽

Extreme Gradient Boosting

In patients with gliomas, isocitrate dehydrogenase 1 (IDH1) mutation status has been studied as a prognostic indicator. Recent advances in machine learning (ML) have demonstrated promise in utilizing radiomic features to study disease processes in the brain. We investigate whether ML analysis of multiparametric radiomic features from preoperative Magnetic Resonance Imaging (MRI) can predict IDH1 mutation status in patients with glioma. This retrospective study included patients with glioma with known IDH1 status and preoperative MRI. Radiomic features were extracted from Fluid-Attenuated Inversion Recovery (FLAIR) and Diffusion-Weighted-Imaging (DWI). The dataset was split into training, validation, and testing sets by stratified sampling. Synthetic Minority Oversampling Technique (SMOTE) was applied to the training sets. eXtreme Gradient Boosting (XGBoost) classifiers were trained, and the hyperparameters were tuned. Receiver operating characteristic curve (ROC), accuracy, and f1-scores were collected. A total of 100 patients (age: 55 ± 15, M/F 60/40); with IDH1 mutant (n = 22) and IDH1 wildtype (n = 78) were included. The best performance was seen with a DWI-trained XGBoost model, which achieved ROC with Area Under the Curve (AUC) of 0.97, accuracy of 0.90, and f1-score of 0.75 on the test set. The FLAIR-trained XGBoost model achieved ROC with AUC of 0.95, accuracy of 0.90, f1-score of 0.75 on the test set. A model that was trained on combined FLAIR-DWI radiomic features did not provide incremental accuracy. The results show that a XGBoost classifier using multiparametric radiomic features derived from preoperative MRI can predict IDH1 mutation status with > 90% accuracy.

Download Full-text

A Study of Machine-Learning Classifiers for Hypertension Based on Radial Pulse Wave

BioMed Research International ◽

10.1155/2018/2964816 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Zhi-yu Luo ◽

Ji Cui ◽

Xiao-juan Hu ◽

Li-ping Tu ◽

Hai-dan Liu ◽

...

Keyword(s):

Machine Learning ◽

Chinese Medicine ◽

Pulse Wave ◽

Dynamic Change ◽

Area Under The Curve ◽

Research Direction ◽

Disease Diagnosis ◽

Gradient Boosting ◽

Machine Learning Classification ◽

Digital Pulse

Objective. In this study, machine learning was utilized to classify and predict pulse wave of hypertensive group and healthy group and assess the risk of hypertension by observing the dynamic change of the pulse wave and provide an objective reference for clinical application of pulse diagnosis in traditional Chinese medicine (TCM). Method. The basic information from 450 hypertensive cases and 479 healthy cases was collected by self-developed H20 questionnaires and pulse wave information was acquired by self-developed pulse diagnostic instrument (PDA-1). H20 questionnaires and pulse wave information were used as input variables to obtain different machine learning classification models of hypertension. This method was aimed at analyzing the influence of pulse wave on the accuracy and stability of machine learning model, as well as the feature contribution of hypertension model after removing noise by K-means. Result. Compared with the classification results before removing noise, the accuracy and the area under the curve (AUC) had been improved. The accuracy rates of AdaBoost, Gradient Boosting, and Random Forest (RF) were 86.41%, 86.41%, and 85.33%, respectively. AUC were 0.86, 0.86, and 0.85, respectively. The maximum accuracy of SVM increased from 79.57% to 83.15%, and the AUC stability increased from 0.79 to 0.83. In addition, the features of importance on traditional statistics and machine learning were consistent. After removing noise, the features with large changes were h1/t1, w1/t, t, w2, h2, t1, and t5 in AdaBoost and Gradient Boosting (top10). The common variables for machine learning and traditional statistics were h1/t1, h5, t, Ad, BMI, and t2. Conclusion. Pulse wave-based diagnostic method of hypertension has significant value in reference. In view of the feasibility of digital-pulse-wave diagnosis and dynamically evaluating hypertension, it provides the research direction and foundation for Chinese medicine in the dynamic evaluation of modern disease diagnosis and curative effect.

Download Full-text

Sarcasm detection of tweets without #sarcasm: data science approach

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v23.i2.pp993-1001 ◽

2021 ◽

Vol 23 (2) ◽

pp. 993

Author(s):

Rupali Amit Bagate ◽

R. Suguna

Keyword(s):

Machine Learning ◽

Language Processing ◽

Data Science ◽

Short Term Memory ◽

Confusion Matrix ◽

Research Work ◽

Gradient Boosting ◽

Specific Context ◽

Machine Learning Classification ◽

Light Gradient

Identifying sarcasm present in the text could be a challenging work. In sarcasm, a negative word can flip the polarity of a positive sentence. Sentences can be classified as sarcastic or non-sarcastic. It is easier to identify sarcasm using facial expression or tonal weight rather detecting from plain text. Thus, sarcasm detection using natural language processing is major challenge without giving away any specific context or clue such as #sarcasm present in a tweet. Therefore, research tries to solve this classification problem using various optimized models. Proposed model, analyzes whether a given tweet, is sarcastic or not without the presnece of hashtag sarcasm or any kind of specific context present in text. To achieve better results, we used different machine learning classification methodology along with deep learning embedding techniques. Our optimized model uses a stacking technique which combines the result of logistic regression and long short-term memory (LSTM) recurrent neural net feed to light gradient boosting technique which generates better result as compare to existing machine learning and neural network algorithm. The key difference of our research work is sarcasm detection done without #sarcasm which has not been much explored earlier by any researcher. The metrics used for evolutionis F1-score and confusion matrix.

Download Full-text

Performance Evaluation of Different Machine Learning Classification Algorithms for Diseases Diagnosis

International Journal of E-Health and Medical Communications ◽

10.4018/ijehmc.20211101oa09 ◽

2021 ◽

Vol 12 (6) ◽

pp. 0-0

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Confusion Matrix ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Machine Learning Classification ◽

Nearest Neighbor Classifiers

Knowledge extraction within a healthcare field is a very challenging task since we are having many problems such as noise and imbalanced datasets. They are obtained from clinical studies where uncertainty and variability are popular. Lately, a wide number of machine learning algorithms are considered and evaluated to check their validity of being used in the medical field. Usually, the classification algorithms are compared against medical experts who are specialized in certain disease diagnoses and provide an effective methodological evaluation of classifiers by applying performance metrics. The performance metrics contain four criteria: accuracy, sensitivity, and specificity forming the confusion matrix of each used algorithm. We have utilized eight different well-known machine learning algorithms to evaluate their performances in six different medical datasets. Based on the experimental results we conclude that the XGBoost and K-Nearest Neighbor classifiers were the best overall among the used datasets and signs can be used for diagnosing various diseases.

Download Full-text

An Investigation of Machine Learning Algorithms on COVID-19 Dataset

10.21203/rs.3.rs-70985/v1 ◽

2020 ◽

Author(s):

Prasannavenkatesan Theerthagiri ◽

I.Jeena Jacob ◽

A.Usha Ruby ◽

Y.Vamsidhar

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Characteristic Curve ◽

Confusion Matrix ◽

True Positive Rate ◽

Receiver Operator Characteristic Curve ◽

Machine Learning Algorithms ◽

Kappa Score ◽

Machine Learning Classification ◽

Positive Rate

Abstract This paper studies the different machine learning classification algorithms to predict the COVID-19 recovered and deceased cases. The k-fold cross-validation resampling technique is used to validate the prediction model. The prediction scores of each algorithm are evaluated with performance metrics such as prediction accuracy, precision, recall, mean square error, confusion matrix, and kappa score. For the given dataset, the k-nearest neighbour (KNN) classification algorithm produces 80.4 % of predication accuracy and 1.5 to 3.3 % of improved accuracy over other algorithms. The KNN algorithm predicts 92 % (true positive rate) of the deceased cases correctly with 0.077 % of misclassification. Further, the KNN algorithm produces the lowest error rate as 0.19 on the prediction of accurate COVID-19 cases than the other algorithm. Also, it produces the receiver operator characteristic curve with the output value of 82 %.

Download Full-text

A Novel Machine Learning Strategy for the Prediction of Antihypertensive Peptides Derived from Food with High Efficiency

Foods ◽

10.3390/foods10030550 ◽

2021 ◽

Vol 10 (3) ◽

pp. 550

Author(s):

Liyang Wang ◽

Dantong Niu ◽

Xiaoya Wang ◽

Jabir Khan ◽

Qun Shen ◽

...

Keyword(s):

Machine Learning ◽

High Throughput ◽

High Efficiency ◽

Learning Strategy ◽

Characteristic Curve ◽

Bovine Milk ◽

Protein Docking ◽

Gradient Boosting ◽

Extreme Gradient Boosting ◽

Antihypertensive Peptides

Strategies to screen antihypertensive peptides with high throughput and rapid speed will doubtlessly contribute to the treatment of hypertension. Food-derived antihypertensive peptides can reduce blood pressure without side effects. In the present study, a novel model based on the eXtreme Gradient Boosting (XGBoost) algorithm was developed and compared with the dominating machine learning models. To further reflect on the reliability of the method in a real situation, the optimized XGBoost model was utilized to predict the antihypertensive degree of the k-mer peptides cutting from six key proteins in bovine milk, and the peptide–protein docking technology was introduced to verify the findings. The results showed that the XGBoost model achieved outstanding performance, with an accuracy of 86.50% and area under the receiver operating characteristic curve of 94.11%, which were better than the other models. Using the XGBoost model, the prediction of antihypertensive peptides derived from milk protein was consistent with the peptide–protein docking results, and was more efficient. Our results indicate that using the XGBoost algorithm as a novel auxiliary tool is feasible to screen for antihypertensive peptides derived from food, with high throughput and high efficiency.

Download Full-text

Severity Prediction of COVID-19 Patients Using Machine Learning Classification Algorithms: A Case Study of Small City in Pakistan with Minimal Health Facility

2020 IEEE 6th International Conference on Computer and Communications (ICCC) ◽

10.1109/iccc51575.2020.9344984 ◽

2020 ◽

Author(s):

Hina Gull ◽

Gomathi Krishna ◽

May Issa Aldossary ◽

Sardar Zafar Iqbal

Keyword(s):

Machine Learning ◽

Health Facility ◽

Small City ◽

Classification Algorithms ◽

Machine Learning Classification ◽

Severity Prediction

Download Full-text