Metabolic Syndrome Prediction Models Using Machine Learning and Sasang Constitution Type

Evidence-based Complementary and Alternative Medicine ◽

10.1155/2021/8315047 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Ji-Eun Park ◽

Sujeong Mun ◽

Siwoo Lee

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Prediction Models ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

Sasang Constitution ◽

Constitution Type ◽

Conventional Regression

Background. Machine learning may be a useful tool for predicting metabolic syndrome (MetS), and previous studies also suggest that the risk of MetS differs according to Sasang constitution type. The present study investigated the development of MetS prediction models utilizing machine learning methods and whether the incorporation of Sasang constitution type could improve the performance of those prediction models. Methods. Participants visiting a medical center for a health check-up were recruited in 2005 and 2006. Six kinds of machine learning were utilized (K-nearest neighbor, naive Bayes, random forest, decision tree, multilayer perceptron, and support vector machine), as was conventional logistic regression. Machine learning-derived MetS prediction models with and without the incorporation of Sasang constitution type were compared to investigate whether the former would predict MetS with higher sensitivity. Age, sex, education level, marital status, body mass index, stress, physical activity, alcohol consumption, and smoking were included as potentially predictive factors. Results. A total of 750/2,871 participants had MetS. Among the six types of machine learning methods investigated, multiplayer perceptron and support vector machine exhibited the same performance as the conventional regression method, based on the areas under the receiver operating characteristic curves. The naive-Bayes method exhibited the highest sensitivity (0.49), which was higher than that of the conventional regression method (0.39). The incorporation of Sasang constitution type improved the sensitivity of all of the machine learning methods investigated except for the K-nearest neighbor method. Conclusion. Machine learning-derived models may be useful for MetS prediction, and the incorporation of Sasang constitution type may increase the sensitivity of such models.

Download Full-text

Studi Komparasi Metode Machine Learning untuk Klasifikasi Citra Huruf Vokal Hiragana

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i3.3083 ◽

2021 ◽

Vol 5 (3) ◽

pp. 905

Author(s):

Muhammad Afrizal Amrustian ◽

Vika Febri Muliati ◽

Elsa Elvira Awal

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Image Classification ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

The Comparative Study

Japanese is one of the most difficult languages to understand and read. Japanese writing that does not use the alphabet is the reason for the difficulty of the Japanese language to read. There are three types of Japanese, namely kanji, katakana, and hiragana. Hiragana letters are the most commonly used type of writing. In addition, hiragana has a cursive nature, so each person's writing will be different. Machine learning methods can be used to read Japanese letters by recognizing the image of the letters. The Japanese letters that are used in this study are hiragana vowels. This study focuses on conducting a comparative study of machine learning methods for the image classification of Japanese letters. The machine learning methods that were successfully compared are Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest, and K-Nearest Neighbor. The results of the comparative study show that the K-Nearest Neighbor method is the best method for image classification of hiragana vowels. K-Nearest Neighbor gets an accuracy of 89.4% with a low error rate.

Download Full-text

The Tomatoes and Chilies Type Classifications by Using Machine Learning Methods

Journal of Development Research ◽

10.28926/jdr.v4i1.93 ◽

2020 ◽

Vol 4 (1) ◽

pp. 1-6

Author(s):

Irzal Ahmad Sabilla ◽

Chastine Fatichah

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Support Vector ◽

Staple Food ◽

K Nearest Neighbor ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods

Vegetables are ingredients for flavoring, such as tomatoes and chilies. A Both of these ingredients are processed to accompany the people's staple food in the form of sauce and seasoning. In supermarkets, these vegetables can be found easily, but many people do not understand how to choose the type and quality of chilies and tomatoes. This study discusses the classification of types of cayenne, curly, green, red chilies, and tomatoes with good and bad conditions using machine learning and contrast enhancement techniques. The machine learning methods used are Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Linear Discriminant Analysis (LDA), and Random Forest (RF). The results of testing the best method are measured based on the value of accuracy. In addition to the accuracy of this study, it also measures the speed of computation so that the methods used are efficient.

Download Full-text

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.

Download Full-text

Classification models using circulating neutrophil transcripts can detect unruptured intracranial aneurysm

Journal of Translational Medicine ◽

10.1186/s12967-020-02550-2 ◽

2020 ◽

Vol 18 (1) ◽

Author(s):

Kerry E. Poppenberg ◽

Vincent M. Tutino ◽

Lu Li ◽

Muhammad Waqas ◽

Armond June ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

Model Performance ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Training Cohort ◽

Network Analyses ◽

Machine Learning Methods

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.

Download Full-text

A Systematic Methodology to Evaluate Prediction Models for Driving Style Classification

Sensors ◽

10.3390/s20061692 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1692 ◽

Cited By ~ 6

Author(s):

Iván Silva ◽

José Eugenio Naranjo

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Prediction Models ◽

Statistical Tests ◽

Area Under The Curve ◽

The Other ◽

Support Vector ◽

Classification Models ◽

K Nearest Neighbor

Identifying driving styles using classification models with in-vehicle data can provide automated feedback to drivers on their driving behavior, particularly if they are driving safely. Although several classification models have been developed for this purpose, there is no consensus on which classifier performs better at identifying driving styles. Therefore, more research is needed to evaluate classification models by comparing performance metrics. In this paper, a data-driven machine-learning methodology for classifying driving styles is introduced. This methodology is grounded in well-established machine-learning (ML) methods and literature related to driving-styles research. The methodology is illustrated through a study involving data collected from 50 drivers from two different cities in a naturalistic setting. Five features were extracted from the raw data. Fifteen experts were involved in the data labeling to derive the ground truth of the dataset. The dataset fed five different models (Support Vector Machines (SVM), Artificial Neural Networks (ANN), fuzzy logic, k-Nearest Neighbor (kNN), and Random Forests (RF)). These models were evaluated in terms of a set of performance metrics and statistical tests. The experimental results from performance metrics showed that SVM outperformed the other four models, achieving an average accuracy of 0.96, F1-Score of 0.9595, Area Under the Curve (AUC) of 0.9730, and Kappa of 0.9375. In addition, Wilcoxon tests indicated that ANN predicts differently to the other four models. These promising results demonstrate that the proposed methodology may support researchers in making informed decisions about which ML model performs better for driving-styles classification.

Download Full-text

Development of a Probabilistic Seismic Performance Assessment Model of Slope Using Machine Learning Methods

Sustainability ◽

10.3390/su12083269 ◽

2020 ◽

Vol 12 (8) ◽

pp. 3269

Author(s):

Shinyoung Kwag ◽

Daegi Hahm ◽

Minkyu Kim ◽

Seunghyun Eem

Keyword(s):

Machine Learning ◽

Regression Analysis ◽

Linear Regression ◽

Seismic Performance ◽

Prediction Models ◽

Linear Regression Analysis ◽

Assessment Model ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

The objective of this study is to propose a model that can predict the seismic performance of slope relatively accurately and efficiently by using machine learning methods. Probabilistic seismic fragility analyses of the slope had been carried out in other studies, and a closed-form equation for slope seismic performance was proposed through a multiple linear regression analysis. However, the traditional statistical linear regression analysis showed a limit that could not accurately represent such nonlinear slope seismic performances. To overcome this limit, in this study, we used three machine learning methods (i.e., support vector machine (SVM), artificial neural network (ANN), Gaussian process regression (GPR)) to generate prediction models of the slope seismic performance. The models obtained through the machine learning methods basically showed better performance compared to the models of the traditional statistical methods. The results of the SVM showed no significant performance difference compared with the results of the nonlinear regression analysis method, but the results based on the ANN and GPR showed a remarkable improvement in the prediction performance over the other models. Furthermore, this study confirmed that the GPR-based model predicted relatively accurate seismic performance values compared with the model through the ANN.

Download Full-text

Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods

Remote Sensing ◽

10.3390/rs12060914 ◽

2020 ◽

Vol 12 (6) ◽

pp. 914 ◽

Cited By ~ 4

Author(s):

Mahdieh Danesh Yazdi ◽

Zheng Kuang ◽

Konstantina Dimakopoulou ◽

Benjamin Barratt ◽

Esra Suel ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Meteorological Data ◽

Fine Particulate Matter ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

Technological Advances

Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool in conducting health studies and identifying hotspots of pollution. Here, we have created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R2 of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels with an out-of-sample temporal R2 of 0.882. However, its ability to predict spatial variability was weaker, with a R2 of 0.396. We believe this to be due to the smaller spatial variation in pollutant levels in this area.

Download Full-text

Predicting Learning Outcomes with MOOC Clickstreams

Education Sciences ◽

10.3390/educsci9020104 ◽

2019 ◽

Vol 9 (2) ◽

pp. 104 ◽

Cited By ~ 5

Author(s):

Chen-Hsiang Yu ◽

Jungpin Wu ◽

An-Chi Liu

Keyword(s):

Machine Learning ◽

Learning Outcomes ◽

Prediction Accuracy ◽

Nearest Neighbor ◽

Prediction Models ◽

Video Data ◽

Support Vector ◽

Completion Rates ◽

K Nearest Neighbor ◽

Learning Behaviors

Massive Open Online Courses (MOOCs) have gradually become a dominant trend in education. Since 2014, the Ministry of Education in Taiwan has been promoting MOOC programs, with successful results. The ability of students to work at their own pace, however, is associated with low MOOC completion rates and has recently become a focus. The development of a mechanism to effectively improve course completion rates continues to be of great interest to both teachers and researchers. This study established a series of learning behaviors using the video clickstream records of students, through a MOOC platform, to identify seven types of cognitive participation models of learners. We subsequently built practical machine learning models by using K-nearest neighbor (KNN), support vector machines (SVM), and artificial neural network (ANN) algorithms to predict students’ learning outcomes via their learning behaviors. The ANN machine learning method had the highest prediction accuracy. Based on the prediction results, we saw a correlation between video viewing behavior and learning outcomes. This could allow teachers to help students needing extra support successfully pass the course. To further improve our method, we classified the course videos based on their content. There were three video categories: theoretical, experimental, and analytic. Different prediction models were built for each of these three video types and their combinations. We performed the accuracy verification; our experimental results showed that we could use only theoretical and experimental video data, instead of all three types of data, to generate prediction models without significant differences in prediction accuracy. In addition to data reduction in model generation, this could help teachers evaluate the effectiveness of course videos.

Download Full-text

Machine Learning Methods Applied to Predict Ventilator-Associated Pneumonia with Pseudomonas aeruginosa Infection via Sensor Array of Electronic Nose in Intensive Care Unit

Sensors ◽

10.3390/s19081866 ◽

2019 ◽

Vol 19 (8) ◽

pp. 1866 ◽

Cited By ~ 13

Author(s):

Liao ◽

Wang ◽

Zhang ◽

Abbod ◽

Shih ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care Unit ◽

Pseudomonas Aeruginosa ◽

Intensive Care ◽

Prediction Models ◽

Support Vector ◽

Ventilator Associated Pneumonia ◽

Learning Methods ◽

Machine Learning Methods ◽

Pseudomonas Aeruginosa Infection

One concern to the patients is the off-line detection of pneumonia infection status after using the ventilator in the intensive care unit. Hence, machine learning methods for ventilator-associated pneumonia (VAP) rapid diagnose are proposed. A popular device, Cyranose 320 e-nose, is usually used in research on lung disease, which is a highly integrated system and sensor comprising 32 array using polymer and carbon black materials. In this study, a total of 24 subjects were involved, including 12 subjects who are infected with pneumonia, and the rest are non-infected. Three layers of back propagation artificial neural network and support vector machine (SVM) methods were applied to patients’ data to predict whether they are infected with VAP with Pseudomonas aeruginosa infection. Furthermore, in order to improve the accuracy and the generalization of the prediction models, the ensemble neural networks (ENN) method was applied. In this study, ENN and SVM prediction models were trained and tested. In order to evaluate the models’ performance, a fivefold cross-validation method was applied. The results showed that both ENN and SVM models have high recognition rates of VAP with Pseudomonas aeruginosa infection, with 0.9479 ± 0.0135 and 0.8686 ± 0.0422 accuracies, 0.9714 ± 0.0131, 0.9250 ± 0.0423 sensitivities, and 0.9288 ± 0.0306, 0.8639 ± 0.0276 positive predictive values, respectively. The ENN model showed better performance compared to SVM in the recognition of VAP with Pseudomonas aeruginosa infection. The areas under the receiver operating characteristic curve of the two models were 0.9842 ± 0.0058 and 0.9410 ± 0.0301, respectively, showing that both models are very stable and accurate classifiers. This study aims to assist the physician in providing a scientific and effective reference for performing early detection in Pseudomonas aeruginosa infection or other diseases.

Download Full-text

Supervised machine learning methods in psychology: A practical introduction with annotated R code

10.31234/osf.io/s72vu ◽

2019 ◽

Author(s):

Hannes Rosenbusch ◽

Felix Soldner ◽

Anthony M Evans ◽

Marcel Zeelenberg

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Psychological Research ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Comprehensive Overview ◽

K Nearest Neighbors ◽

Machine Learning Methods ◽

Out Of Sample

Machine learning methods for pattern detection and prediction are increasingly prevalent in psychological research. We provide a comprehensive overview of machine learning, its applications, and how to implement models for research. We review fundamental concepts of machine learning, such as prediction accuracy and out-of-sample evaluation, and summarize four standard prediction algorithms: linear regressions, ridge regressions, decision trees, and random forests (plus k-nearest neighbors, Naïve Bayes classifiers, and support vector machines in the supplementary material). This selection provides a set of powerful models that are implemented regularly in machine learning projects. We demonstrate each method with examples and annotated R code, and discuss best practices for determining sample sizes; comparing model performances; tuning prediction models; preregistering prediction models; and reporting results. Finally, we discuss the value of machine learning methods in maintaining psychology’s status as a predictive science.

Download Full-text