Machine Learning Models Have Better Performance than Traditional Logistic Regression Models in Predicting the Risk of Diabetes

2021 ◽  
Author(s):  
Yaqian Mao ◽  
Shuyao Pan ◽  
Zheng Zhu ◽  
Wei Lin ◽  
Junping Wen ◽  
...  
2020 ◽  
Vol 42 (3) ◽  
pp. 135-147 ◽  
Author(s):  
Michael Behr ◽  
Saba Saiel ◽  
Valerie Evans ◽  
Dinesh Kumbhare

Fibromyalgia (FM) diagnosis remains a challenge for clinicians due to a lack of objective diagnostic tools. One proposed solution is the use of quantitative ultrasound (US) techniques, such as image texture analysis, which has demonstrated discriminatory capabilities with other chronic pain conditions. From this, we propose the use of image texture variables to construct and compare two machine learning models (support vector machine [SVM] and logistic regression) for differentiating between the trapezius muscle in healthy and FM patients. US videos of the right and left trapezius muscle were acquired from healthy ( n = 51) participants and those with FM ( n = 57). The videos were converted into 64,800 skeletal muscle regions of interest (ROIs) using MATLAB. The ROIs were filtered by an algorithm using the complex wavelet structural similarity index (CW-SSIM), which removed ROIs that were similar. Thirty-one texture variables were extracted from the ROIs, which were then used in nested cross-validation to construct SVM and elastic net regularized logistic regression models. The generalized performance accuracy of both models was estimated and confirmed with a final validation on a holdout test set. The predicted generalized performance accuracy of the SVM and logistic regression models was computed to be 83.9 ± 2.6% and 65.8 ± 1.7%, respectively. The models achieved accuracies of 84.1%, and 66.0% on the final holdout test set, validating performance estimates. Although both machine learning models differentiate between healthy trapezius muscle and that of patients with FM, only the SVM model demonstrated clinically relevant performance levels.


2021 ◽  
Vol 6 ◽  
pp. 248
Author(s):  
Paul Mwaniki ◽  
Timothy Kamanu ◽  
Samuel Akech ◽  
Dustin Dunsmuir ◽  
J. Mark Ansermino ◽  
...  

Background: The success of many machine learning applications depends on knowledge about the relationship between the input data and the task of interest (output), hindering the application of machine learning to novel tasks. End-to-end deep learning, which does not require intermediate feature engineering, has been recommended to overcome this challenge but end-to-end deep learning models require large labelled training data sets often unavailable in many medical applications. In this study, we trained machine learning models to predict paediatric hospitalization given raw photoplethysmography (PPG) signals obtained from a pulse oximeter. We trained self-supervised learning (SSL) for automatic feature extraction from PPG signals and assessed the utility of SSL in initializing end-to-end deep learning models trained on a small labelled data set with the aim of predicting paediatric hospitalization.Methods: We compared logistic regression models fitted using features extracted using SSL with end-to-end deep learning models initialized either randomly or using weights from the SSL model. We also compared the performance of SSL models trained on labelled data alone (n=1,031) with SSL trained using both labelled and unlabelled signals (n=7,578). Results: The SSL model trained on both labelled and unlabelled PPG signals produced features that were more predictive of hospitalization compared to the SSL model trained on labelled PPG only (AUC of logistic regression model: 0.78 vs 0.74). The end-to-end deep learning model had an AUC of 0.80 when initialized using the SSL model trained on all PPG signals, 0.77 when initialized using SSL trained on labelled data only, and 0.73 when initialized randomly. Conclusions: This study shows that SSL can improve the classification of PPG signals by either extracting features required by logistic regression models or initializing end-to-end deep learning models. Furthermore, SSL can leverage larger unlabelled data sets to improve performance of models fitted using small labelled data sets.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Martine De Cock ◽  
Rafael Dowsley ◽  
Anderson C. A. Nascimento ◽  
Davis Railsback ◽  
Jianwei Shen ◽  
...  

Abstract Background In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technology challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique named secure multi-party computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of machine learning techniques, algorithmic and implementation optimizations are a necessity to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and Machine Learning problem at hand. Methods Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient descent based algorithm for training a logistic regression like model with a clipped ReLu activation function, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations to improve the performance. Results For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition. Conclusions In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.


2021 ◽  
Vol 42 (Supplement_1) ◽  
pp. S33-S34
Author(s):  
Morgan A Taylor ◽  
Randy D Kearns ◽  
Jeffrey E Carter ◽  
Mark H Ebell ◽  
Curt A Harris

Abstract Introduction A nuclear disaster would generate an unprecedented volume of thermal burn patients from the explosion and subsequent mass fires (Figure 1). Prediction models characterizing outcomes for these patients may better equip healthcare providers and other responders to manage large scale nuclear events. Logistic regression models have traditionally been employed to develop prediction scores for mortality of all burn patients. However, other healthcare disciplines have increasingly transitioned to machine learning (ML) models, which are automatically generated and continually improved, potentially increasing predictive accuracy. Preliminary research suggests ML models can predict burn patient mortality more accurately than commonly used prediction scores. The purpose of this study is to examine the efficacy of various ML methods in assessing thermal burn patient mortality and length of stay in burn centers. Methods This retrospective study identified patients with fire/flame burn etiologies in the National Burn Repository between the years 2009 – 2018. Patients were randomly partitioned into a 67%/33% split for training and validation. A random forest model (RF) and an artificial neural network (ANN) were then constructed for each outcome, mortality and length of stay. These models were then compared to logistic regression models and previously developed prediction tools with similar outcomes using a combination of classification and regression metrics. Results During the study period, 82,404 burn patients with a thermal etiology were identified in the analysis. The ANN models will likely tend to overfit the data, which can be resolved by ending the model training early or adding additional regularization parameters. Further exploration of the advantages and limitations of these models is forthcoming as metric analyses become available. Conclusions In this proof-of-concept study, we anticipate that at least one ML model will predict the targeted outcomes of thermal burn patient mortality and length of stay as judged by the fidelity with which it matches the logistic regression analysis. These advancements can then help disaster preparedness programs consider resource limitations during catastrophic incidents resulting in burn injuries.


2021 ◽  
Vol 10 (1) ◽  
pp. 99
Author(s):  
Sajad Yousefi

Introduction: Heart disease is often associated with conditions such as clogged arteries due to the sediment accumulation which causes chest pain and heart attack. Many people die due to the heart disease annually. Most countries have a shortage of cardiovascular specialists and thus, a significant percentage of misdiagnosis occurs. Hence, predicting this disease is a serious issue. Using machine learning models performed on multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction.Material and Methods: Several algorithms were utilized to predict heart disease among which Decision Tree, Random Forest and KNN supervised machine learning are highly mentioned. The algorithms are applied to the dataset taken from the UCI repository including 294 samples. The dataset includes heart disease features. To enhance the algorithm performance, these features are analyzed, the feature importance scores and cross validation are considered.Results: The algorithm performance is compared with each other, so that performance based on ROC curve and some criteria such as accuracy, precision, sensitivity and F1 score were evaluated for each model. As a result of evaluation, Accuracy, AUC ROC are 83% and 99% respectively for Decision Tree algorithm. Logistic Regression algorithm with accuracy and AUC ROC are 88% and 91% respectively has better performance than other algorithms. Therefore, these techniques can be useful for physicians to predict heart disease patients and prescribe them correctly.Conclusion: Machine learning technique can be used in medicine for analyzing the related data collections to a disease and its prediction. The area under the ROC curve and evaluating criteria related to a number of classifying algorithms of machine learning to evaluate heart disease and indeed, the prediction of heart disease is compared to determine the most appropriate classification. As a result of evaluation, better performance was observed in both Decision Tree and Logistic Regression models.


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6019
Author(s):  
José Manuel Lozano Domínguez ◽  
Faroq Al-Tam ◽  
Tomás de J. Mateo Sanguino ◽  
Noélia Correia

Improving road safety through artificial intelligence-based systems is now crucial turning smart cities into a reality. Under this highly relevant and extensive heading, an approach is proposed to improve vehicle detection in smart crosswalks using machine learning models. Contrarily to classic fuzzy classifiers, machine learning models do not require the readjustment of labels that depend on the location of the system and the road conditions. Several machine learning models were trained and tested using real traffic data taken from urban scenarios in both Portugal and Spain. These include random forest, time-series forecasting, multi-layer perceptron, support vector machine, and logistic regression models. A deep reinforcement learning agent, based on a state-of-the-art double-deep recurrent Q-network, is also designed and compared with the machine learning models just mentioned. Results show that the machine learning models can efficiently replace the classic fuzzy classifier.


mSystems ◽  
2019 ◽  
Vol 4 (4) ◽  
Author(s):  
Finlay Maguire ◽  
Muhammad Attiq Rehman ◽  
Catherine Carrillo ◽  
Moussa S. Diarra ◽  
Robert G. Beiko

ABSTRACT Nontyphoidal Salmonella (NTS) is a leading global cause of bacterial foodborne morbidity and mortality. Our ability to treat severe NTS infections has been impaired by increasing antimicrobial resistance (AMR). To understand and mitigate the global health crisis AMR represents, we need to link the observed resistance phenotypes with their underlying genomic mechanisms. Broiler chickens represent a key reservoir and vector for NTS infections, but isolates from this setting have been characterized in only very low numbers relative to clinical isolates. In this study, we sequenced and assembled 97 genomes encompassing 7 serotypes isolated from broiler chicken in farms in British Columbia between 2005 and 2008. Through application of machine learning (ML) models to predict the observed AMR phenotype from this genomic data, we were able to generate highly (0.92 to 0.99) precise logistic regression models using known AMR gene annotations as features for 7 antibiotics (amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, streptomycin, and tetracycline). Similarly, we also trained “reference-free” k-mer-based set-covering machine phenotypic prediction models (0.91 to 1.0 precision) for these antibiotics. By combining the inferred k-mers and logistic regression weights, we identified the primary drivers of AMR for the 7 studied antibiotics in these isolates. With our research representing one of the largest studies of a diverse set of NTS isolates from broiler chicken, we can thus confirm that the AmpC-like CMY-2 β-lactamase is a primary driver of β-lactam resistance and that the phosphotransferases APH(6)-Id and APH(3″-Ib) are the principal drivers of streptomycin resistance in this important ecosystem. IMPORTANCE Antimicrobial resistance (AMR) represents an existential threat to the function of modern medicine. Genomics and machine learning methods are being increasingly used to analyze and predict AMR. This type of surveillance is very important to try to reduce the impact of AMR. Machine learning models are typically trained using genomic data, but the aspects of the genomes that they use to make predictions are rarely analyzed. In this work, we showed how, by using different types of machine learning models and performing this analysis, it is possible to identify the key genes underlying AMR in nontyphoidal Salmonella (NTS). NTS is among the leading cause of foodborne illness globally; however, AMR in NTS has not been heavily studied within the food chain itself. Therefore, in this work we performed a broad-scale analysis of the AMR in NTS isolates from commercial chicken farms and identified some priority AMR genes for surveillance.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Seulkee Lee ◽  
Yeonghee Eun ◽  
Hyungjin Kim ◽  
Hoon-Suk Cha ◽  
Eun-Mi Koh ◽  
...  

AbstractWe aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. The baseline demographic and laboratory data of patients who visited Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: early-TNF and non-early-TNF users. Machine learning models were formulated to predict the early-TNF users using the baseline data. Feature importance analysis was performed to delineate significant baseline characteristics. The numbers of early-TNF and non-early-TNF users were 90 and 505, respectively. The performance of the ANN model, based on the area under curve (AUC) for a receiver operating characteristic curve (ROC) of 0.783, was superior to logistic regression, support vector machine, random forest, and XGBoost models (for an ROC curve of 0.719, 0.699, 0.761, and 0.713, respectively) in predicting early-TNF users. Feature importance analysis revealed CRP and ESR as the top significant baseline characteristics for predicting early-TNF users. Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases.


BMJ Open ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. e040132
Author(s):  
Innocent B Mboya ◽  
Michael J Mahande ◽  
Mohanad Mohammed ◽  
Joseph Obure ◽  
Henry G Mwambi

ObjectiveWe aimed to determine the key predictors of perinatal deaths using machine learning models compared with the logistic regression model.DesignA secondary data analysis using the Kilimanjaro Christian Medical Centre (KCMC) Medical Birth Registry cohort from 2000 to 2015. We assessed the discriminative ability of models using the area under the receiver operating characteristics curve (AUC) and the net benefit using decision curve analysis.SettingThe KCMC is a zonal referral hospital located in Moshi Municipality, Kilimanjaro region, Northern Tanzania. The Medical Birth Registry is within the hospital grounds at the Reproductive and Child Health Centre.ParticipantsSingleton deliveries (n=42 319) with complete records from 2000 to 2015.Primary outcome measuresPerinatal death (composite of stillbirths and early neonatal deaths). These outcomes were only captured before mothers were discharged from the hospital.ResultsThe proportion of perinatal deaths was 3.7%. There were no statistically significant differences in the predictive performance of four machine learning models except for bagging, which had a significantly lower performance (AUC 0.76, 95% CI 0.74 to 0.79, p=0.006) compared with the logistic regression model (AUC 0.78, 95% CI 0.76 to 0.81). However, in the decision curve analysis, the machine learning models had a higher net benefit (ie, the correct classification of perinatal deaths considering a trade-off between false-negatives and false-positives)—over the logistic regression model across a range of threshold probability values.ConclusionsIn this cohort, there was no significant difference in the prediction of perinatal deaths between machine learning and logistic regression models, except for bagging. The machine learning models had a higher net benefit, as its predictive ability of perinatal death was considerably superior over the logistic regression model. The machine learning models, as demonstrated by our study, can be used to improve the prediction of perinatal deaths and triage for women at risk.


Sign in / Sign up

Export Citation Format

Share Document