Development of Machine Learning Strategy for Predicting the Risk Range of Ship’s Berthing Velocity

2020 ◽  
Vol 8 (5) ◽  
pp. 376
Author(s):  
Hyeong-Tak Lee ◽  
Jeong-Seok Lee ◽  
Woo-Ju Son ◽  
Ik-Soon Cho

Ships are prone to accidents when they approach at a berthing velocity greater than that allowed by the risk range determined for a port. Therefore, this study develops a machine learning strategy to predict the risk range of an unsafe berthing velocity when a ship approaches a port. For the analysis, the input parameters were based on the factors affecting the berthing velocity, and the output parameter, i.e., the berthing velocity, was measured at a tanker terminal in the Republic of Korea. Nine machine learning classification algorithms were used to build models, and the top four models were selected through evaluation methods based on the confusion matrix. The analysis identified the extra trees, random forest, bagging, and gradient boosting classifiers as good models. Testing with the receiver operating characteristic curve confirmed that the area under the curve was highest for the most dangerous range of berthing velocity; thus, the risk range was appropriately classified. As such, the derived models can classify and predict the risk range of unsafe berthing velocity before a ship approaches a port, which makes it possible to berth the ship safely.
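
The abstract above selects models using evaluation methods based on the confusion matrix. As a minimal sketch of what such an evaluation involves (not the authors' code), the following Python builds a multi-class confusion matrix and reads per-class precision and recall from it. The risk-range labels ("safe", "caution", "danger") and the toy predictions are hypothetical and are not data from the study.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested dict cm[true_label][predicted_label] of counts."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}


def per_class_metrics(cm, labels):
    """Compute precision and recall for each class from the confusion matrix."""
    metrics = {}
    for c in labels:
        tp = cm[c][c]
        fn = sum(cm[c][p] for p in labels) - tp  # true class c, predicted otherwise
        fp = sum(cm[t][c] for t in labels) - tp  # predicted c, true class otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[c] = {"precision": precision, "recall": recall}
    return metrics


# Hypothetical risk-range labels and predictions (illustration only).
labels = ["safe", "caution", "danger"]
y_true = ["safe", "safe", "caution", "danger", "danger", "caution"]
y_pred = ["safe", "caution", "caution", "danger", "safe", "caution"]

cm = confusion_matrix(y_true, y_pred, labels)
metrics = per_class_metrics(cm, labels)
```

In a safety setting such as this one, recall on the most dangerous class is usually the figure to watch, because a dangerous approach predicted as safe is the costly error.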

2021 ◽  
Vol 10 (10) ◽  
pp. 680
Author(s):  
Annan Yang ◽  
Chunmei Wang ◽  
Guowei Pang ◽  
Yongqing Long ◽  
Lei Wang ◽  
...  

Gully erosion is the most severe type of water erosion and is a major land degradation process. The efficiency and interpretability of gully erosion susceptibility mapping (GESM) remain a challenge, especially in complex terrain areas. In this study, a WoE-MLC model, which combines machine learning classification algorithms with the statistical weight of evidence (WoE) model, was used to address this problem in the Loess Plateau. The three machine learning (ML) algorithms utilized in this research were random forest (RF), gradient boosted decision trees (GBDT), and extreme gradient boosting (XGBoost). The results showed that: (1) GESM was well predicted by both the machine learning regression models and the WoE-MLC models, with area under the curve (AUC) values both greater than 0.92, and the latter was more computationally efficient and interpretable; (2) the XGBoost algorithm was more efficient in GESM than the other two algorithms, with the strongest generalization ability and best performance in avoiding overfitting (averaged AUC = 0.947), followed by the RF algorithm (averaged AUC = 0.944) and the GBDT algorithm (averaged AUC = 0.938); and (3) slope gradient, land use, and altitude were the main factors for GESM. This study may provide a possible method for gully erosion susceptibility mapping at large scale.


Author(s):  
Nelson Yego ◽  
Juma Kasozi ◽  
Joseph Nkrunziza

The role of insurance in financial inclusion as well as in economic growth is immense. However, low uptake seems to impede the growth of the sector, hence the need for a model that robustly predicts uptake of insurance among potential clients. In this research, we compared the performance of eight machine learning models in predicting the uptake of insurance. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines, and Extreme Gradient Boosting. The data used in the classification came from the 2016 Kenya FinAccess Household Survey. Because the data were imbalanced, performance was compared on both upsampled and downsampled data. For upsampled data, the Random Forest classifier showed the highest accuracy and precision of all the classifiers, but for downsampled data, gradient boosting was optimal. It is noteworthy that for both upsampled and downsampled data, tree-based classifiers were more robust than the others in insurance uptake prediction. However, in spite of hyper-parameter optimization, the area under the receiver operating characteristic curve remained highest for Random Forest compared with the other tree-based models. Also, the confusion matrix for Random Forest showed the fewest false positives and the most true positives; hence it could be construed as the most robust model for predicting insurance uptake. Finally, the most important feature in predicting uptake was having a bank product; hence bancassurance could be said to be a plausible channel of distribution for insurance products.
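
The abstract above compares classifiers on upsampled and downsampled data. As a rough illustration of those two balancing strategies, the sketch below resamples a toy dataset at random. The function name `resample`, its parameters, and the toy rows are assumptions made for this example; the abstract does not say which resampling procedure the authors used.

```python
import random


def resample(rows, label_of, mode, seed=0):
    """Balance classes by random resampling.

    mode="up":   draw minority rows with replacement until every class
                 matches the largest class.
    mode="down": draw rows without replacement until every class
                 matches the smallest class.
    """
    if mode not in ("up", "down"):
        raise ValueError("mode must be 'up' or 'down'")
    rng = random.Random(seed)

    # Group rows by class label.
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)

    sizes = [len(g) for g in groups.values()]
    target = max(sizes) if mode == "up" else min(sizes)

    balanced = []
    for group in groups.values():
        if len(group) < target:
            # Upsample: keep all rows and add random duplicates.
            extra = [rng.choice(group) for _ in range(target - len(group))]
            balanced.extend(group + extra)
        else:
            # Downsample (or keep a class already at the target size).
            balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy data: (row id, label) with 2 positives and 8 negatives.
rows = [(i, 1) for i in range(2)] + [(i, 0) for i in range(2, 10)]
up = resample(rows, lambda r: r[1], "up")
down = resample(rows, lambda r: r[1], "down")
```

Resampling is normally applied to the training split only, so that duplicated or discarded rows do not distort the evaluation on held-out data.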


Author(s):  
Munder Abdulatef Al-Hashem ◽  
Ali Mohammad Alqudah ◽  
Qasem Qananwah

Knowledge extraction in the healthcare field is a very challenging task because of problems such as noise and imbalanced datasets. These datasets are obtained from clinical studies in which uncertainty and variability are common. Lately, a large number of machine learning algorithms have been considered and evaluated to check their validity for use in the medical field. Usually, the classification algorithms are compared against medical experts who specialize in diagnosing certain diseases, and performance metrics are applied to provide an effective methodological evaluation of the classifiers. The performance metrics contain four criteria, including accuracy, sensitivity, and specificity, derived from the confusion matrix of each algorithm. We used eight well-known machine learning algorithms and evaluated their performance on six medical datasets. Based on the experimental results, we conclude that the XGBoost and K-Nearest Neighbor classifiers were the best overall across the datasets used and can be used for diagnosing various diseases.


2020 ◽  
Vol 21 (21) ◽  
pp. 8004
Author(s):  
Yu Sakai ◽  
Chen Yang ◽  
Shingo Kihira ◽  
Nadejda Tsankova ◽  
Fahad Khan ◽  
...  

In patients with gliomas, isocitrate dehydrogenase 1 (IDH1) mutation status has been studied as a prognostic indicator. Recent advances in machine learning (ML) have demonstrated promise in utilizing radiomic features to study disease processes in the brain. We investigate whether ML analysis of multiparametric radiomic features from preoperative Magnetic Resonance Imaging (MRI) can predict IDH1 mutation status in patients with glioma. This retrospective study included patients with glioma with known IDH1 status and preoperative MRI. Radiomic features were extracted from Fluid-Attenuated Inversion Recovery (FLAIR) and Diffusion-Weighted Imaging (DWI). The dataset was split into training, validation, and testing sets by stratified sampling. Synthetic Minority Oversampling Technique (SMOTE) was applied to the training sets. eXtreme Gradient Boosting (XGBoost) classifiers were trained, and the hyperparameters were tuned. Receiver operating characteristic (ROC) curves, accuracy, and F1-scores were collected. A total of 100 patients (age: 55 ± 15, M/F 60/40) with IDH1 mutant (n = 22) and IDH1 wildtype (n = 78) were included. The best performance was seen with a DWI-trained XGBoost model, which achieved ROC with Area Under the Curve (AUC) of 0.97, accuracy of 0.90, and F1-score of 0.75 on the test set. The FLAIR-trained XGBoost model achieved ROC with AUC of 0.95, accuracy of 0.90, and F1-score of 0.75 on the test set. A model that was trained on combined FLAIR-DWI radiomic features did not provide incremental accuracy. The results show that an XGBoost classifier using multiparametric radiomic features derived from preoperative MRI can predict IDH1 mutation status with > 90% accuracy.
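
The abstract above reports AUC and F1-score for a binary classifier. As a minimal sketch of how these two metrics are defined (not the authors' pipeline), the following Python computes AUC as the fraction of positive-negative pairs that the scores rank correctly, and F1 from thresholded predictions. The labels, scores, and the 0.5 threshold are made up for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Made-up labels (1 = positive class) and classifier scores.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
```

With imbalanced classes, as in the 22 mutant versus 78 wildtype split reported above, accuracy can stay high while F1 is noticeably lower, which is one reason to report several metrics together.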


2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Zhi-yu Luo ◽  
Ji Cui ◽  
Xiao-juan Hu ◽  
Li-ping Tu ◽  
Hai-dan Liu ◽  
...  

Objective. In this study, machine learning was used to classify and predict the pulse waves of a hypertensive group and a healthy group, to assess the risk of hypertension by observing the dynamic change of the pulse wave, and to provide an objective reference for the clinical application of pulse diagnosis in traditional Chinese medicine (TCM). Method. Basic information from 450 hypertensive cases and 479 healthy cases was collected with self-developed H20 questionnaires, and pulse wave information was acquired with a self-developed pulse diagnostic instrument (PDA-1). The H20 questionnaires and pulse wave information were used as input variables to obtain different machine learning classification models of hypertension. The method aimed to analyze the influence of the pulse wave on the accuracy and stability of the machine learning models, as well as the feature contribution of the hypertension model after removing noise by K-means. Result. Compared with the classification results before removing noise, the accuracy and the area under the curve (AUC) improved. The accuracy rates of AdaBoost, Gradient Boosting, and Random Forest (RF) were 86.41%, 86.41%, and 85.33%, respectively, and the AUCs were 0.86, 0.86, and 0.85, respectively. The maximum accuracy of SVM increased from 79.57% to 83.15%, and the AUC stability increased from 0.79 to 0.83. In addition, the important features identified by traditional statistics and by machine learning were consistent. After removing noise, the features with large changes were h1/t1, w1/t, t, w2, h2, t1, and t5 in AdaBoost and Gradient Boosting (top 10). The variables common to machine learning and traditional statistics were h1/t1, h5, t, Ad, BMI, and t2. Conclusion. The pulse wave-based diagnostic method for hypertension has significant reference value. Given the feasibility of digital pulse wave diagnosis and of dynamically evaluating hypertension, it provides a research direction and foundation for Chinese medicine in the dynamic evaluation of modern disease diagnosis and curative effect.


Author(s):  
Rupali Amit Bagate ◽  
R. Suguna

Identifying sarcasm in text can be challenging. In sarcasm, a negative word can flip the polarity of a positive sentence. Sentences can be classified as sarcastic or non-sarcastic. It is easier to identify sarcasm from facial expression or tone of voice than from plain text. Thus, sarcasm detection using natural language processing is a major challenge when no specific context or clue, such as #sarcasm in a tweet, is given. Therefore, this research tries to solve the classification problem using various optimized models. The proposed model analyzes whether a given tweet is sarcastic without the presence of the #sarcasm hashtag or any other specific context in the text. To achieve better results, we used different machine learning classification methods along with deep learning embedding techniques. Our optimized model uses a stacking technique that combines the results of logistic regression and a long short-term memory (LSTM) recurrent neural network and feeds them to a light gradient boosting technique, which generates better results than existing machine learning and neural network algorithms. The key difference of our research is that sarcasm detection is done without #sarcasm, which has not been much explored by earlier researchers. The metrics used for evaluation are the F1-score and the confusion matrix.


2020 ◽  
Author(s):  
Prasannavenkatesan Theerthagiri ◽  
I.Jeena Jacob ◽  
A.Usha Ruby ◽  
Y.Vamsidhar

This paper studies different machine learning classification algorithms for predicting recovered and deceased COVID-19 cases. The k-fold cross-validation resampling technique is used to validate the prediction model. The prediction scores of each algorithm are evaluated with performance metrics such as prediction accuracy, precision, recall, mean square error, confusion matrix, and kappa score. For the given dataset, the k-nearest neighbour (KNN) classification algorithm achieves a prediction accuracy of 80.4%, an improvement of 1.5 to 3.3% over the other algorithms. The KNN algorithm correctly predicts 92% of the deceased cases (true positive rate), with 0.077% misclassification. Further, the KNN algorithm produces the lowest error rate, 0.19, in predicting COVID-19 cases compared with the other algorithms. It also produces a receiver operating characteristic curve with an output value of 82%.
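
The abstract above validates models with k-fold cross-validation and reports a kappa score among its metrics. As a simplified sketch of both ideas (not the authors' implementation), the Python below splits sample indices into k contiguous folds and computes Cohen's kappa from true and predicted labels. The function names and toy labels are assumptions for this example, and the splitter does not shuffle or stratify, which a real experiment usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so that each sample appears in
    exactly one test fold. Folds are contiguous; no shuffling is done."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
                     for c in labels)
    if p_expected == 1:
        return 1.0  # both label lists contain a single identical class
    return (p_observed - p_expected) / (1 - p_expected)


folds = list(k_fold_indices(10, 3))
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])  # toy labels
```

Each fold serves once as the test set while the remaining folds train the model, and the per-fold metrics are then averaged.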


Foods ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 550
Author(s):  
Liyang Wang ◽  
Dantong Niu ◽  
Xiaoya Wang ◽  
Jabir Khan ◽  
Qun Shen ◽  
...  

Strategies to screen antihypertensive peptides rapidly and with high throughput will doubtlessly contribute to the treatment of hypertension. Food-derived antihypertensive peptides can reduce blood pressure without side effects. In the present study, a novel model based on the eXtreme Gradient Boosting (XGBoost) algorithm was developed and compared with the dominant machine learning models. To further assess the reliability of the method in a real situation, the optimized XGBoost model was used to predict the antihypertensive degree of the k-mer peptides cut from six key proteins in bovine milk, and peptide–protein docking technology was introduced to verify the findings. The results showed that the XGBoost model achieved outstanding performance, with an accuracy of 86.50% and an area under the receiver operating characteristic curve of 94.11%, both better than the other models. The XGBoost model's prediction of antihypertensive peptides derived from milk protein was consistent with the peptide–protein docking results and was more efficient. Our results indicate that the XGBoost algorithm is feasible as a novel auxiliary tool to screen for food-derived antihypertensive peptides with high throughput and high efficiency.

