Predicting the risk of inpatient hypoglycemia with machine learning using electronic health records

10.2337/figshare.12091953.v1, 2020

Author(s): Yue Ruan, Alexis Bellot, Zuzana Moysova, Garry D. Tan, Alistair Lumb, ...

Keyword(s): Machine Learning, Logistic Regression, Regression Model, Logistic Regression Model, Prediction Models, Machine Learning Algorithms, Operating Characteristics, Oxygen Saturation Level, Electronic Health, Clinically Significant

Objective: We analyzed data from inpatients with diabetes admitted to a large university hospital to predict the risk of hypoglycemia using machine learning algorithms. Research Design and Methods: Four years of data were extracted from a hospital electronic health record system. These included laboratory and point-of-care blood glucose (BG) values, used to identify biochemical and clinically significant hypoglycemic episodes (BG < 3.9 and < 2.9 mmol/L, respectively). We used patient demographics, administered medications, vital signs, laboratory results, and procedures performed during the hospital stay to inform the model. Two iterations of the dataset included the doses of insulin administered and the past history of inpatient hypoglycemia. Eighteen different prediction models were compared using the area under the receiver operating characteristic curve (AUC_ROC) with ten-fold cross-validation. Results: We analyzed data from 17,658 inpatients with diabetes who underwent 32,758 admissions between July 2014 and August 2018. The predictive factors in the logistic regression model included undergoing procedures, weight, type of diabetes, oxygen saturation level, use of medications (insulin, sulfonylurea, metformin), and albumin levels. The best-performing machine learning model was XGBoost (AUC_ROC 0.96), which outperformed the logistic regression model (AUC_ROC 0.75) in estimating the risk of clinically significant hypoglycemia. Conclusions: Advanced machine learning models are superior to logistic regression models in predicting the risk of hypoglycemia in inpatients with diabetes. Trials of such models should be conducted in real time to evaluate their utility in reducing inpatient hypoglycemia.
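
The AUC_ROC used to compare the models can be read as the probability that a randomly chosen patient with an episode receives a higher predicted risk than a randomly chosen patient without one. The following is a minimal sketch of that definition, not the authors' pipeline; the function name `auc_roc` and the toy labels and scores are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = hypoglycemic episode, 0 = no episode.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
auc = auc_roc(y_true, y_score)
print(round(auc, 3))
```

In a ten-fold cross-validation, a value like this would be computed on each held-out fold and then averaged; the pairwise loop is quadratic, so a rank-based formulation is preferable for large cohorts.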


Machine learning to predict mortality after rehabilitation among patients with severe stroke

Scientific Reports, 10.1038/s41598-020-77243-3, 2020, Vol 10 (1)

Author(s): Domenico Scrutinio, Carlo Ricciardi, Leandro Donisi, Ernesto Losavio, Petronilla Battista, ...

Keyword(s): Machine Learning, Logistic Regression, Regression Model, Predictive Value, Logistic Regression Model, Clinical Decision Making, Clinical Decision, Severe Disability, Operating Characteristics, Receiver Operating Characteristics Curve

Abstract: Stroke is among the leading causes of death and disability worldwide. Approximately 20–25% of stroke survivors present severe disability, which is associated with increased mortality risk. Prognostication is inherent in the process of clinical decision-making. Machine learning (ML) methods have gained increasing popularity in biomedical research. The aim of this study was twofold: to assess the performance of tree-based ML algorithms for predicting three-year mortality in 1207 stroke patients with severe disability who completed rehabilitation, and to compare the performance of the ML algorithms with that of a standard logistic regression. The logistic regression model achieved an area under the Receiver Operating Characteristics curve (AUC) of 0.745 and was well calibrated. At the optimal risk threshold, the model had an accuracy of 75.7%, a positive predictive value (PPV) of 33.9%, and a negative predictive value (NPV) of 91.0%. The ML approach, which combined the synthetic minority oversampling technique with Random Forests, outperformed the logistic regression model, achieving an AUC of 0.928 and an accuracy of 86.3%. The PPV was 84.6% and the NPV 87.5%. This study represents a step forward in the creation of standardisable tools for predicting health outcomes in individuals affected by stroke.
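
Accuracy, PPV, and NPV at a given risk threshold all derive from the four cells of a confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical and are not taken from the study, and `confusion_metrics` is an illustrative helper name.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. Assumes every denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for 200 patients, chosen only to illustrate the formulas.
metrics = confusion_metrics(tp=40, fp=10, tn=130, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on outcome prevalence, values obtained on an oversampled (class-balanced) dataset are not directly comparable to values obtained at the original prevalence.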


Using Machine Learning Algorithm to Describe the Connection between the Types and Characteristics of Music Signal

Complexity, 10.1155/2021/5577486, 2021, Vol 2021, pp. 1-10

Author(s): Bo Sun

Keyword(s): Machine Learning, Logistic Regression, Regression Model, Logistic Regression Model, Learning Algorithm, Machine Learning Algorithms, Classification Model, Machine Learning Algorithm, Music Classification, Music Signal

Music classification supports online music retrieval, but current music classification models struggle to identify the various types of music accurately, so their classification performance is poor. To improve the accuracy of music classification, a music classification model based on multifeature fusion and a machine learning algorithm is proposed. First, the music signal is obtained and various classification features are extracted from it; machine learning algorithms are then used to describe the relationship between the type of music signal and its features. Two models are established: a shallow logistic regression music classifier and a deep belief network. Experiments were designed for these two models to verify their applicability to music classification. Comparing the experimental results shows that the classification accuracy of the deep belief network model is higher than that of the logistic regression model, but it also needs more iterations for its accuracy to converge. Compared with other current music classification models, this model reduces the time needed to construct the music classifier, speeds up music classification, and can identify various types of music with high precision. The accuracy of music classification is clearly improved, which verifies the superiority of this music classification model.


Electrocardiogram-based mortality prediction in patients with COVID-19 using machine learning

EP Europace, 10.1093/europace/euab116.512, 2021, Vol 23 (Supplement_3)

Author(s): H Bleijendaal, RR Van Der Leur, K Taha, T Mast, JMIH Gho, ...

Keyword(s): Machine Learning, Logistic Regression, The Netherlands, Hospital Mortality, Regression Model, Logistic Regression Model, Prediction Models, External Validation, Mortality Prediction, Receiver Operating Curve

Abstract Funding Acknowledgements: Type of funding sources: Public hospital(s). Main funding source(s): The Netherlands Organisation for Health Research and Development (ZonMw); University of Amsterdam Research Priority Area Medical Integromics. On behalf of the CAPACITY-COVID19 Registry. Background: The electrocardiogram (ECG) is an easy-to-assess, widely available, and inexpensive tool that is frequently used during the work-up of hospitalized COVID-19 patients. So far, no study has evaluated whether ECG-based machine learning models can predict all-cause in-hospital mortality in COVID-19 patients. Purpose: With this study, we aim to evaluate the value of the ECG for predicting in-hospital all-cause mortality of COVID-19 patients by analyzing the ECG at hospital admission, comparing a logistic regression-based approach and a deep neural network (DNN)-based approach. Secondly, we aim to identify specific ECG features associated with mortality in patients diagnosed with COVID-19. Methods and results: We studied 882 patients admitted with COVID-19 across seven hospitals in the Netherlands. Raw-format 12-lead ECGs recorded after admission (<72 hours) were collected, manually assessed, and annotated using pre-defined ECG features. Using data from five out of seven centers (n = 634), two mortality prediction models were developed: (a) a logistic regression model using manually annotated ECG features, and (b) a pre-trained DNN using the raw ECG waveforms. Data from the two other centers (n = 248) were used for external validation. Performance of both prediction models was similar, with a mean area under the receiver operating curve of 0.69 [95% CI 0.55–0.82] for the logistic regression model and 0.71 [95% CI 0.59–0.81] for the DNN in the external validation cohort. After adjustment for age and sex, ventricular rate (OR 1.13 [95% CI 1.01–1.27] per 10 ms increase), right bundle branch block (OR 3.26 [95% CI 1.15–9.50]), ST-depression (OR 2.78 [95% CI 1.03–7.70]), and low QRS voltages (OR 3.09 [95% CI 1.02–9.38]) remained significant predictors of mortality. Conclusion: This study shows that ECG-based prediction models at admission may be a valuable addition to the initial risk stratification of admitted COVID-19 patients. The DNN model showed similar performance to the logistic regression model, which needs time-consuming manual annotation. Several ECG features associated with mortality were identified. Figure 1: Overview of methods, using an example case: (left) logistic regression and (right) deep learning. This specific case had a high probability of in-hospital mortality (above the threshold of 30%). Follow-up of this case showed that the patient had died during admission.


A comprehensive, contemporary assessment of the association between hepatosteatosis and coronary artery calcium scoring

European Heart Journal - Cardiovascular Imaging, 10.1093/ehjci/jeaa356.234, 2021, Vol 22 (Supplement_1)

Author(s): T Heseltine, SW Murray, RL Jones, M Fisher, B Ruzsics

Keyword(s): Risk Factors, Logistic Regression, Coronary Artery, Regression Model, Coronary Artery Calcium, Logistic Regression Model, Prediction Models, Multinomial Logistic Regression, CVD Risk, Male Sex

Abstract Funding Acknowledgements: Type of funding sources: None. On behalf of the Liverpool Multiparametric Imaging Collaboration. Background: Coronary artery calcium (CAC) score is a well-established technique for stratifying an individual’s cardiovascular disease (CVD) risk. Several well-established registries have incorporated CAC scoring into CVD risk prediction models to enhance accuracy. Hepatosteatosis (HS) has been shown to be an independent predictor of CVD events and can be measured on non-contrast computed tomography (CT). We sought to undertake a contemporary, comprehensive assessment of the influence of HS on CAC score alongside traditional CVD risk factors. In patients with HS it may be beneficial to offer routine CAC screening to evaluate CVD risk and enhance opportunities for earlier primary prevention strategies. Methods: We performed a retrospective, observational analysis at a high-volume cardiac CT centre, analysing consecutive CT coronary angiography (CTCA) studies. All patients referred for investigation of chest pain over a 28-month period (June 2014 to November 2016) were included. Patients with established CVD were excluded. The cardiac findings were reported by a cardiologist and retrospectively analysed by two independent radiologists for the presence of HS. Those with CAC of zero and those with CAC greater than zero were compared for demographic and cardiac risks. A multivariate analysis comparing the risk factors was performed to adjust for the presence of established risk factors. A binomial logistic regression model was developed to assess the association between the presence of HS and increasing strata of CAC. Results: In total, 1499 patients were referred for CTCA without prior evidence of CVD. The assessment of HS was completed in 1195 (79.7%) and CAC score was performed in 1103 (92.3%). There were 466 with CVD and 637 without CVD. The prevalence of HS was significantly higher in those with CVD than in those without CVD on CTCA (51.3% versus 39.9%, p = 0.007). Male sex (50.7% versus 36.1%, p < 0.001), age (59.4 ± 13.7 versus 48.1 ± 13.6, p < 0.001), and diabetes (12.4% versus 6.9%, p = 0.04) were also significantly higher in the group with CAC greater than zero than in the group with a CAC score of zero. HS was associated with increasing strata of CAC score compared with CAC of zero (CAC score 1-100: OR 1.47, p = 0.01; CAC score 101-400: OR 1.68, p = 0.02; CAC score >400: OR 1.42, p = 0.14). This association became non-significant in the highest stratum of CAC score. Conclusion: We found a significant association of increasing age, male sex, diabetes, and HS with the presence of CAC. HS was also associated with a more severe phenotype of CVD based on the multinomial logistic regression model. Although the association weakened for the highest stratum of CAC (CAC score >400), this likely reflects the low overall number of patients within this group and is likely a type II error. Based on these findings, it may be appropriate to offer routine CVD risk stratification techniques to all those diagnosed with HS.


Logistic Regression Model for Loan Prediction: A Machine Learning Approach

10.1109/eti4.051663.2021.9619201, 2021

Author(s): Richa Manglani, Anuja Bokhare

Keyword(s): Machine Learning, Logistic Regression, Regression Model, Logistic Regression Model, Learning Approach, Machine Learning Approach


Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making, 10.1186/s12911-020-01361-1, 2020, Vol 20 (1)

Author(s): Matthijs Blankers, Louk F. M. van der Post, Jack J. M. Dekker

Keyword(s): Machine Learning, Logistic Regression, Prediction Models, Learning Algorithms, Nearest Neighbors, Machine Learning Algorithms, Gradient Boosting, Ensemble Model, K Nearest Neighbors, Crisis Care

Abstract Background: Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), in predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy, and we explore individual predictors of hospitalization. Methods: Data were included from 2084 patients in the longitudinal Amsterdam Study of Acute Psychiatry who had at least one reported psychiatric crisis care contact. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics, and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best and worst performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best performing algorithms were combined in an ensemble model using stacking. Results: All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the worst performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top 10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression showed average performance among the tested algorithms. Although statistically significant, the differences between the machine learning algorithms were in most cases modest. The results show that a predictive accuracy similar to that of the best performing model can be achieved by combining multiple algorithms in an ensemble model.
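
Net reclassification improvement (NRI) summarises how often a new model moves cases in the right direction relative to a reference model. The abstract does not state which NRI variant was used, so the sketch below assumes the simple two-category form, in which each model outputs a 0/1 classification; the function name and toy data are hypothetical.

```python
def net_reclassification_improvement(y, old_pred, new_pred):
    """Two-category NRI of new_pred over old_pred.

    y, old_pred, and new_pred are equal-length lists of 0/1 values.
    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    """
    events = [(o, n) for yi, o, n in zip(y, old_pred, new_pred) if yi == 1]
    nonevents = [(o, n) for yi, o, n in zip(y, old_pred, new_pred) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    up_events = sum(1 for o, n in events if n > o)
    down_events = sum(1 for o, n in events if n < o)
    up_nonevents = sum(1 for o, n in nonevents if n > o)
    down_nonevents = sum(1 for o, n in nonevents if n < o)
    nri_events = (up_events - down_events) / len(events)
    nri_nonevents = (down_nonevents - up_nonevents) / len(nonevents)
    return nri_events + nri_nonevents


# Hypothetical toy data: 1 = hospitalized within 12 months, 0 = not.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
old = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]   # reference model classifications
new = [1, 0, 1, 1, 0, 0, 0, 0, 0, 1]   # candidate model classifications
nri = net_reclassification_improvement(y, old, new)
print(round(nri, 3))
```

A positive value means the candidate model reclassifies more cases correctly than incorrectly; the event and non-event components are worth reporting separately, since one can mask the other.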


Machine-Learning vs. Expert-Opinion Driven Logistic Regression Modelling for Predicting 30-Day Unplanned Rehospitalisation in Preterm Babies: A Prospective, Population-Based Study (EPIPAGE 2)

Frontiers in Pediatrics, 10.3389/fped.2020.585868, 2021, Vol 8

Author(s): Robert A. Reed, Andrei S. Morgan, Jennifer Zeitlin, Pierre-Henri Jarreau, Héloïse Torchin, ...

Keyword(s): Machine Learning, Logistic Regression, Random Forest, Regression Model, Expert Opinion, Logistic Regression Model, Population Based, Regression Modelling, Preterm Babies, Logistic Regression Modelling

Introduction: Preterm babies are a vulnerable population that experience significant short and long-term morbidity. Rehospitalisations constitute an important, potentially modifiable adverse event in this population. Improving the ability of clinicians to identify those patients at the greatest risk of rehospitalisation has the potential to improve outcomes and reduce costs. Machine-learning algorithms can provide potentially advantageous methods of prediction compared to conventional approaches like logistic regression. Objective: To compare two machine-learning methods (least absolute shrinkage and selection operator (LASSO) and random forest) to expert-opinion driven logistic regression modelling for predicting unplanned rehospitalisation within 30 days in a large French cohort of preterm babies. Design, Setting and Participants: This study used data derived exclusively from the population-based prospective cohort study of French preterm babies, EPIPAGE 2. Only those babies discharged home alive and whose parents completed the 1-year survey were eligible for inclusion in our study. All predictive models used a binary outcome, denoting a baby's status for an unplanned rehospitalisation within 30 days of discharge. Predictors included those quantifying clinical, treatment, maternal and socio-demographic factors. The predictive abilities of models constructed using LASSO and random forest algorithms were compared with a traditional logistic regression model. The logistic regression model comprised 10 predictors, selected by expert clinicians, while the LASSO and random forest included 75 predictors. Performance measures were derived using 10-fold cross-validation. Performance was quantified using area under the receiver operator characteristic curve (AUROC), sensitivity, specificity, Tjur's coefficient of determination and calibration measures. Results: The rate of 30-day unplanned rehospitalisation in the eligible population used to construct the models was 9.1% (95% CI 8.2–10.1) (350/3,841). The random forest model demonstrated both an improved AUROC (0.65; 95% CI 0.59–0.7; p = 0.03) and specificity vs. logistic regression (AUROC 0.57; 95% CI 0.51–0.62, p = 0.04). The LASSO performed similarly (AUROC 0.59; 95% CI 0.53–0.65; p = 0.68) to logistic regression. Conclusions: Compared to an expert-specified logistic regression model, random forest offered improved prediction of 30-day unplanned rehospitalisation in preterm babies. However, all models offered relatively low levels of predictive ability, regardless of modelling method.
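
Tjur's coefficient of determination, one of the performance measures listed, is the difference between the mean predicted probability among cases with the outcome and the mean among cases without it. A minimal sketch follows; the function name and the toy probabilities are hypothetical and unrelated to the EPIPAGE 2 data.

```python
def tjur_r2(y, p):
    """Tjur's coefficient of discrimination.

    y: list of 0/1 outcomes; p: list of predicted probabilities.
    Returns mean(p | y == 1) - mean(p | y == 0).
    """
    p_events = [pi for yi, pi in zip(y, p) if yi == 1]
    p_nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    if not p_events or not p_nonevents:
        raise ValueError("need at least one event and one non-event")
    return sum(p_events) / len(p_events) - sum(p_nonevents) / len(p_nonevents)


# Hypothetical toy data: 1 = rehospitalised within 30 days, 0 = not.
outcomes = [1, 0, 0, 1, 0]
predicted = [0.3, 0.1, 0.2, 0.5, 0.1]
r2 = tjur_r2(outcomes, predicted)
print(round(r2, 3))
```

A value near 0 means the model assigns similar probabilities to both groups, and a value near 1 means the predicted probabilities separate them almost completely.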


Prediction of perinatal death using machine learning models: a birth registry-based cohort study in northern Tanzania

BMJ Open, 10.1136/bmjopen-2020-040132, 2020, Vol 10 (10), pp. e040132

Author(s): Innocent B Mboya, Michael J Mahande, Mohanad Mohammed, Joseph Obure, Henry G Mwambi

Keyword(s): Machine Learning, Logistic Regression, Regression Model, Logistic Regression Model, Perinatal Death, Learning Models, Net Benefit, Birth Registry, Perinatal Deaths, Machine Learning Models

Objective: We aimed to determine the key predictors of perinatal deaths using machine learning models compared with the logistic regression model. Design: A secondary data analysis using the Kilimanjaro Christian Medical Centre (KCMC) Medical Birth Registry cohort from 2000 to 2015. We assessed the discriminative ability of models using the area under the receiver operating characteristics curve (AUC) and the net benefit using decision curve analysis. Setting: The KCMC is a zonal referral hospital located in Moshi Municipality, Kilimanjaro region, Northern Tanzania. The Medical Birth Registry is within the hospital grounds at the Reproductive and Child Health Centre. Participants: Singleton deliveries (n=42 319) with complete records from 2000 to 2015. Primary outcome measures: Perinatal death (composite of stillbirths and early neonatal deaths). These outcomes were only captured before mothers were discharged from the hospital. Results: The proportion of perinatal deaths was 3.7%. There were no statistically significant differences in the predictive performance of four machine learning models except for bagging, which had a significantly lower performance (AUC 0.76, 95% CI 0.74 to 0.79, p=0.006) compared with the logistic regression model (AUC 0.78, 95% CI 0.76 to 0.81). However, in the decision curve analysis, the machine learning models had a higher net benefit (ie, the correct classification of perinatal deaths considering a trade-off between false-negatives and false-positives) than the logistic regression model across a range of threshold probability values. Conclusions: In this cohort, there was no significant difference in the prediction of perinatal deaths between machine learning and logistic regression models, except for bagging. The machine learning models had a higher net benefit, as their predictive ability for perinatal death was considerably superior to that of the logistic regression model. The machine learning models, as demonstrated by our study, can be used to improve the prediction of perinatal deaths and triage for women at risk.
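
In decision curve analysis, the net benefit at a threshold probability p_t is commonly computed as TP/n − (FP/n) × p_t/(1 − p_t), which weights false positives by the odds of the threshold. The sketch below applies that standard formula to hypothetical data; it is not the study's code, and `net_benefit` is an illustrative name.

```python
def net_benefit(y, p, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold.

    y: list of 0/1 outcomes; p: list of predicted probabilities;
    threshold: threshold probability, strictly between 0 and 1.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = perinatal death, 0 = survival.
y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
p = [0.6, 0.2, 0.05, 0.3, 0.4, 0.1, 0.02, 0.15, 0.08, 0.03]

nb_model = net_benefit(y, p, threshold=0.2)
# "Treat all" reference strategy: every case is flagged as high risk.
nb_treat_all = net_benefit(y, [1.0] * len(y), threshold=0.2)
print(round(nb_model, 3), round(nb_treat_all, 3))
```

Repeating the calculation over a grid of thresholds and plotting the results against the treat-all and treat-none (net benefit 0) strategies gives the decision curve.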


P5710 Clinical applications of machine learning for prediction of incident atrial fibrillation from the general population: a nationwide cohort study

European Heart Journal, 10.1093/eurheartj/ehz746.0651, 2019, Vol 40 (Supplement_1)

Author(s): I.-S Kim, P S Yang, H T Yu, T H Kim, J S Uhm, ...

Keyword(s): Machine Learning, Atrial Fibrillation, Logistic Regression, General Population, Regression Model, National Health, Logistic Regression Model, Learning System, Health Examination, Clinical Variables

Abstract Background: To evaluate the ability of machine learning algorithms to predict incident atrial fibrillation (AF) in the general population using health examination items. Methods: We included 483,343 subjects who received national health examinations from the Korean National Health Insurance Service-based National Sample Cohort (NHIS-NSC). We trained a deep neural network (DNN) model of a deep learning system and a decision tree (DT) model of a machine learning system to predict incident AF, using clinical variables and health examination items (including age, sex, body mass index, history of heart failure, hypertension or diabetes, baseline creatinine, and smoking and alcohol intake habits) from a training dataset of 341,771 subjects constructed from the NHIS-NSC database. The DNN and DT were validated using an independent test dataset of the 141,572 remaining subjects. The c-indices of the DNN and DT for prediction of incident AF were compared with that of a conventional logistic regression model. Results: During 1,874,789 person-years (mean ± standard deviation age 47.7 ± 14.4 years, 49.6% male), 3,282 subjects with incident AF were observed. In the validation dataset, 1,139 subjects with incident AF were observed. The c-indices of the DNN and DT for incident AF prediction were 0.828 [0.819–0.836] and 0.835 [0.825–0.844], respectively, and were significantly higher (p<0.01) than that of the conventional logistic regression model (c-index = 0.789 [0.784–0.794]). Conclusions: Applying machine learning to simple clinical variables and health examination items was helpful in predicting incident AF in the general population. Prospective studies are warranted to develop individualized precision medicine.
