Prediction of Hemorrhagic Transformation after Ischemic Stroke Using Machine Learning

Jeong-Myeong Choi; Soo-Young Seo; Pum-Jun Kim; Yu-Seop Kim; Sang-Hwa Lee; Jong-Hee Sohn; Dong-Kyu Kim; Jae-Jun Lee; Chulho Kim

doi:10.3390/jpm11090863

Journal of Personalized Medicine, 2021, Vol 11 (9), pp. 863

Keyword(s): Machine Learning; Ischemic Stroke; Binary Logistic Regression; Hemorrhagic Transformation; Structured Data; Gradient Boosting; Support Vector; Test Dataset; Extreme Gradient Boosting; Artificial Neural Network (ANN)

Hemorrhagic transformation (HT) is one of the leading markers of poor prognosis after acute ischemic stroke (AIS). We compared the performance of several machine learning (ML) algorithms for predicting HT after AIS using only structured data. A total of 2028 patients with AIS who were admitted within seven days of symptom onset were included in this analysis. HT was defined according to the criteria of the European Co-operative Acute Stroke Study-II trial. The whole dataset was randomly divided into a training and a test dataset at a 7:3 ratio. Binary logistic regression, support vector machine, extreme gradient boosting, and artificial neural network (ANN) algorithms were used to predict HT occurrence after AIS. Five-fold cross-validation and a grid search were used to optimize the hyperparameters of each ML model, and performance was measured by the area under the receiver operating characteristic curve (AUROC). Among the included AIS patients, the mean age was 69.6 years and 1183 (58.3%) were male. HT was observed in 318 patients (15.7%). There were no significant differences in the corresponding variables between the training and test datasets. Among all the ML algorithms, the ANN showed the best performance in predicting HT in our dataset (AUROC 0.844). Feature scaling (standardization and normalization) and a resampling strategy did not further improve the ANN's performance. ANN-based prediction of HT after AIS outperformed the conventional ML algorithms. Deep learning may be used to predict important outcomes from structured data.
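Two evaluation steps named in this abstract can be sketched without any ML library: a random 7:3 train/test split of the 2028 patients, and AUROC computed as the probability that a randomly chosen HT case receives a higher predicted score than a randomly chosen non-HT case (ties count one half). This is a minimal illustration, not the authors' code; the function names, fixed seed, and toy scores are assumptions, and the five-fold cross-validated grid search used in the study is omitted.

```python
import random
from typing import List, Sequence, Tuple


def random_split_7_3(n: int, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle row indices 0..n-1 and assign 70% to training and 30% to testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(0.7 * n)
    return indices[:cut], indices[cut:]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of an HT case > score of a non-HT case); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train_idx, test_idx = random_split_7_3(2028)  # cohort size from the abstract
    print(len(train_idx), len(test_idx))  # 1420 608
    # Hypothetical predicted HT probabilities; 1 = HT observed, 0 = no HT.
    y = [1, 0, 1, 0, 0, 1]
    p = [0.9, 0.3, 0.6, 0.6, 0.1, 0.8]
    print(round(auroc(y, p), 3))  # 0.944
```

The pairwise form is quadratic in the number of patients, which is fine for a few thousand rows; rank-based formulas give the same value faster.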

Machine Learning-Based Three-Month Outcome Prediction in Acute Ischemic Stroke: A Single Cerebrovascular-Specialty Hospital Study in South Korea

Diagnostics, 2021, Vol 11 (10), pp. 1909. doi:10.3390/diagnostics11101909

Author(s): Dougho Park; Eunhwan Jeong; Haejong Kim; Hae Wook Pyun; Haemin Kim; ...

Keyword(s): Machine Learning; Logistic Regression; Ischemic Stroke; Acute Ischemic Stroke; Functional Outcome; Outcome Prediction; Prediction Models; Gradient Boosting; Support Vector; Extreme Gradient Boosting

Background: Functional outcomes after acute ischemic stroke are of great concern to patients and their families, as well as to the physicians and surgeons who make clinical decisions. We developed machine learning (ML)-based models to predict functional outcome in acute ischemic stroke. Methods: This retrospective study used a prospective cohort database. A total of 1066 patients with acute ischemic stroke from January 2019 to March 2021 were included. Demographic factors, stroke-related factors, laboratory findings, and comorbidities recorded at the time of admission were used as variables. Five ML algorithms were applied to predict a favorable functional outcome (modified Rankin Scale 0 or 1) at 3 months after stroke onset. Results: Regularized logistic regression showed the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.86. Support vector machines achieved the second-highest AUC (0.85) and the highest F1-score (0.86), and all of the ML models achieved an AUC > 0.8. The National Institutes of Health Stroke Scale score at admission and age were consistently the two most important variables for the generalized logistic regression, random forest, and extreme gradient boosting models. Conclusions: ML-based functional outcome prediction models for acute ischemic stroke were validated and shown to be readily applicable and useful.
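The binary target and the F1-score mentioned above can be made concrete with a short sketch: a 3-month modified Rankin Scale (mRS) score of 0 or 1 becomes the favorable class, and F1 is the harmonic mean of precision and recall. The abstract does not say which class its F1-score refers to; computing it for the favorable class is an assumption here, and the patient values are hypothetical.

```python
from typing import Sequence


def favorable_outcome(mrs: int) -> int:
    """Binary target from the abstract: 1 if the 3-month mRS is 0 or 1, otherwise 0."""
    if not 0 <= mrs <= 6:
        raise ValueError("the modified Rankin Scale ranges from 0 to 6")
    return 1 if mrs <= 1 else 0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (favorable) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    mrs_at_3_months = [0, 1, 2, 3, 1, 6]  # hypothetical patients
    y_true = [favorable_outcome(m) for m in mrs_at_3_months]  # [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]  # hypothetical model output
    print(round(f1_score(y_true, y_pred), 3))  # 0.667
```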

Abstract P308: Machine Learning Based Models of the 30-Day Readmission for Stroke Patients Using Electronic Health Record Data

Stroke, 2021, Vol 52 (Suppl_1). doi:10.1161/str.52.suppl_1.p308

Author(s): Negar Darabi; Niyousha Hosseinichimeh; Anthony Noto; Ramin Zand; Vida Abedi

Keyword(s): Machine Learning; Ischemic Stroke; Care Delivery; Gradient Boosting; Support Vector; Stroke Patients; Electronic Health Record Data; Percutaneous Gastrostomy; Extreme Gradient Boosting; Electronic Health

Background: At the individual level, identifying patients who are at higher risk of 30-day readmission and need special clinical attention could lower their chances of readmission. At the system level, reducing hospital readmissions improves the overall quality of care delivery and reduces the associated cost burden. Objective: To enhance understanding of the predictors of 30-day readmission after ischemic stroke and to identify high-risk individuals. We aimed to compare the performance and predictive power of machine learning-based methods and to identify the best model. Method: The electronic health records (EHR) of acute ischemic stroke patients were extracted from two tertiary centers within the Geisinger Health System between January 1, 2015, and October 7, 2018. A total of 61 variables, including clinical variables, demographic characteristics, discharge status, and type of health insurance, were used in this study. Patients were randomly split into model development (80%) and testing (20%) sets. Random forest, gradient boosting machine, extreme gradient boosting (XGBoost), support vector machine, and logistic regression models were developed to predict 30-day readmission after stroke. The models were evaluated by the area under the curve (AUC), sensitivity, specificity, and positive predictive value (PPV). Results: A total of 3,184 patients with ischemic stroke (mean age: 71±13.90 years, men: 51.06%) were included in this study. Of these 3,184 patients, 301 (9.40%) were readmitted within 30 days. The best performance was obtained with XGBoost combined with ROSE sampling: the AUC on the test set was 0.74 (95% CI: 0.64-0.78), with a PPV of 0.43. The top four predictors of 30-day readmission were a National Institutes of Health Stroke Scale score above 24, insertion of an indwelling urinary catheter, a hypercoagulable state, and percutaneous gastrostomy.
Conclusions: Machine learning models can be designed to predict 30-day readmission after stroke using structured data from EHR. Among the five algorithms analyzed, XGBoost had the best performance.
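Sensitivity, specificity, and PPV all depend on the probability threshold at which a patient is flagged as high risk, which the abstract does not report. The sketch below assumes a threshold of 0.5 purely for illustration and uses hypothetical labels and scores. Note that PPV also depends on prevalence: with only about 9.4% of patients readmitted, flagged groups contain many non-readmitted patients unless specificity is very high.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ThresholdMetrics:
    sensitivity: float  # TP / (TP + FN): readmitted patients correctly flagged
    specificity: float  # TN / (TN + FP): non-readmitted patients correctly not flagged
    ppv: float          # TP / (TP + FP): flagged patients who were actually readmitted


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def threshold_metrics(y_true: Sequence[int], prob: Sequence[float],
                      threshold: float = 0.5) -> ThresholdMetrics:
    """Confusion-matrix metrics after flagging every patient with prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, prob):
        flagged = p >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return ThresholdMetrics(_ratio(tp, tp + fn), _ratio(tn, tn + fp), _ratio(tp, tp + fp))


if __name__ == "__main__":
    y = [1, 1, 0, 0, 0, 1, 0, 0]                      # hypothetical 30-day readmissions
    prob = [0.8, 0.4, 0.6, 0.2, 0.1, 0.7, 0.3, 0.55]  # hypothetical model scores
    print(threshold_metrics(y, prob))
```

Lowering the threshold trades specificity and PPV for sensitivity, which is the balance the sampling strategies in these studies aim to shift.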

EXPRESS: Identifying mislabelled samples: machine learning models exceed human performance

Annals of Clinical Biochemistry: International Journal of Laboratory Medicine, 2021, pp. 000456322110329. doi:10.1177/00045632211032991

Author(s): Christopher-John Lancaster Farrell

Keyword(s): Machine Learning; Human Performance; Preliminary Investigation; Gradient Boosting; Support Vector; Final Decision; Laboratory Staff; Clinical Laboratories; Extreme Gradient Boosting; Artificial Neural Network (ANN)

Background: It is difficult for clinical laboratories to identify samples that are labelled with the details of the wrong patient. Many laboratories screen for these errors with delta checks, with final decisions based on manual review of results by laboratory staff. Machine learning (ML) models have been shown to outperform delta checks for identifying these errors; however, ML models have not yet been compared with human-level performance. Methods: Deidentified data for current and previous (within seven days) electrolyte, urea and creatinine results were used in a computer simulation of mislabelled samples. Eight ML models were developed on 127,256 sets of results using different algorithms: artificial neural network (ANN), extreme gradient boosting, support vector machine, random forest, logistic regression, k-nearest neighbours and two decision trees (one complex and one simple). A separate test dataset (n = 14,140) was used to evaluate the performance of these models, as well as that of laboratory staff volunteers, who manually reviewed a random subset of these data (n = 500). Results: The best-performing ML model was the ANN (92.1% accuracy), and the simple decision tree had the poorest accuracy (86.5%). The accuracy of laboratory staff for identifying mislabelled samples was 77.8%. Conclusions: The results of this preliminary investigation suggest that even relatively simple ML models can exceed human performance for identifying mislabelled samples. ML techniques should be considered for implementation in clinical laboratories to assist with error identification.
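A delta check, the conventional screen the ML models are compared against, flags a sample when a result changes implausibly from the same patient's previous result. The sketch below uses absolute-change limits; the analytes follow the abstract, but the limit values and units are hypothetical placeholders, since each laboratory sets its own and the study's limits are not given.

```python
from typing import Dict, List

# Hypothetical absolute-change limits per analyte; each laboratory sets its own.
DELTA_LIMITS: Dict[str, float] = {
    "sodium": 8.0,        # mmol/L
    "potassium": 1.0,     # mmol/L
    "urea": 10.0,         # mmol/L
    "creatinine": 80.0,   # umol/L
}


def delta_check(current: Dict[str, float], previous: Dict[str, float],
                limits: Dict[str, float] = DELTA_LIMITS) -> List[str]:
    """Return analytes whose absolute change since the previous result exceeds its limit."""
    flagged = []
    for analyte, limit in limits.items():
        if analyte in current and analyte in previous:
            if abs(current[analyte] - previous[analyte]) > limit:
                flagged.append(analyte)
    return flagged


def needs_manual_review(current: Dict[str, float], previous: Dict[str, float]) -> bool:
    """Refer the sample for staff review if any analyte fails its delta check."""
    return bool(delta_check(current, previous))


if __name__ == "__main__":
    previous = {"sodium": 138, "potassium": 4.1, "urea": 5.5, "creatinine": 90}
    current = {"sodium": 140, "potassium": 5.9, "urea": 6.0, "creatinine": 95}
    print(delta_check(current, previous))  # ['potassium']
```

A rule like this looks at each analyte in isolation; the ML models in the study instead learn from the whole set of current and previous results together.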

Prediction of Long-Term Stroke Recurrence Using Machine Learning Models

Journal of Clinical Medicine, 2021, Vol 10 (6), pp. 1286. doi:10.3390/jcm10061286

Author(s): Vida Abedi; Venkatesh Avula; Durgesh Chaudhary; Shima Shahjouei; Ayesha Khan; ...

Keyword(s): Machine Learning; Ischemic Stroke; Performance Metrics; Gradient Boosting; Stroke Recurrence; Support Vector; Sampling Strategies; Specificity And Sensitivity; Extreme Gradient Boosting

Background: The long-term risk of recurrent ischemic stroke, estimated to be between 17% and 30%, cannot be reliably assessed at an individual level. Our goal was to study whether machine learning can be trained to predict stroke recurrence, to identify key clinical variables, and to assess whether performance metrics can be optimized. Methods: We used patient-level data from electronic health records, six interpretable algorithms (Logistic Regression, Extreme Gradient Boosting, Gradient Boosting Machine, Random Forest, Support Vector Machine, Decision Tree), four feature selection strategies, five prediction windows, and two sampling strategies to develop 288 models for up to 5-year stroke recurrence prediction. We further identified important clinical features and evaluated different optimization strategies. Results: We included 2091 ischemic stroke patients. The area under the receiver operating characteristic curve (AUROC) was stable across prediction windows of 1, 2, 3, 4, and 5 years, with the highest score for the 1-year (0.79) and the lowest for the 5-year prediction window (0.69). A total of 21 (7%) models reached an AUROC above 0.73, and 110 (38%) models reached an AUROC above 0.7. Among the 53 features analyzed, age, body mass index, and laboratory-based features (such as high-density lipoprotein, hemoglobin A1c, and creatinine) had the highest overall importance scores. Sampling strategies improved the balance between specificity and sensitivity. Conclusion: All six selected algorithms could be trained to predict long-term stroke recurrence, and laboratory-based variables were highly associated with stroke recurrence. The latter could be targeted for personalized interventions. Model performance metrics could be optimized, and the models can be implemented in the same healthcare system as intelligent decision support for targeted intervention.
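Each prediction window (1 to 5 years) needs its own binary target. The abstract does not describe how patients with incomplete follow-up were handled; the sketch below assumes one common convention, in which a patient is positive if recurrence occurs within the window, negative if followed for the whole window without recurrence, and excluded otherwise. The function names and inputs are hypothetical.

```python
from typing import List, Optional


def window_label(years_to_recurrence: Optional[float], years_followed: float,
                 window: int) -> Optional[int]:
    """Binary target for a `window`-year stroke recurrence model.

    1    -> recurrence observed within the window
    0    -> followed for the full window with no recurrence in it
    None -> follow-up ended before the window closed without a recurrence
            (outcome unknown, so the patient is left out of this window's model)
    """
    if years_to_recurrence is not None and years_to_recurrence <= window:
        return 1
    if years_followed >= window:
        return 0
    return None


def labels_for_windows(years_to_recurrence: Optional[float], years_followed: float,
                       windows: range = range(1, 6)) -> List[Optional[int]]:
    """Labels for the 1- to 5-year prediction windows named in the abstract."""
    return [window_label(years_to_recurrence, years_followed, w) for w in windows]


if __name__ == "__main__":
    print(labels_for_windows(1.5, 6.0))   # [0, 1, 1, 1, 1]
    print(labels_for_windows(None, 2.5))  # [0, 0, None, None, None]
```

Under this convention, longer windows exclude more censored patients and contain more positives, which is one reason performance can differ between the 1-year and 5-year models.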

Machine Learning-Enabled 30-Day Readmission Model for Stroke Patients

Frontiers in Neurology, 2021, Vol 12. doi:10.3389/fneur.2021.638267

Author(s): Negar Darabi; Niyousha Hosseinichimeh; Anthony Noto; Ramin Zand; Vida Abedi

Keyword(s): Machine Learning; Feature Selection; Ischemic Stroke; Machine Learning Algorithms; Gradient Boosting; Support Vector; Percutaneous Gastrostomy; Targeted Interventions; Clinical Variables; Extreme Gradient Boosting

Background and Purpose: Hospital readmissions impose a substantial burden on the healthcare system. Reducing readmissions after stroke could improve the quality of care, especially since stroke is associated with a high rate of readmission. The goal of this study is to enhance our understanding of the predictors of 30-day readmission after ischemic stroke and to develop models that identify high-risk individuals for targeted interventions. Methods: We used patient-level data from electronic health records (EHR), five machine learning algorithms (random forest, gradient boosting machine, extreme gradient boosting (XGBoost), support vector machine, and logistic regression (LR)), a data-driven feature selection strategy, and adaptive sampling to develop 15 models of 30-day readmission after ischemic stroke. We further identified important clinical variables. Results: We included 3,184 patients with ischemic stroke (mean age: 71 ± 13.90 years, men: 51.06%). Among the 61 clinical variables included in the model, a National Institutes of Health Stroke Scale score above 24, insertion of an indwelling urinary catheter, a hypercoagulable state, and percutaneous gastrostomy had the highest importance scores. The model's AUC (area under the curve) for predicting 30-day readmission was 0.74 (95% CI: 0.64–0.78), with a PPV of 0.43, when the XGBoost algorithm was used with ROSE sampling. The sampling strategy improved the balance between specificity and sensitivity. The best sensitivity was achieved with LR optimized with feature selection and ROSE sampling (AUC: 0.64, sensitivity: 0.53, specificity: 0.69). Conclusions: Machine learning-based models can be designed to predict 30-day readmission after stroke using structured data from EHR. Among the algorithms analyzed, XGBoost with ROSE sampling had the best performance in terms of AUC, while LR with ROSE sampling and feature selection had the best sensitivity.
Clinical variables highly associated with 30-day readmission could be targeted for personalized interventions. Depending on healthcare systems' resources and criteria, models with optimized performance metrics can be implemented to improve outcomes.
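ROSE (Random Over-Sampling Examples) rebalances classes by drawing synthetic rows from a smoothed bootstrap: each new row picks a class with equal probability, starts from a randomly chosen real row of that class, and adds kernel noise. The sketch below is a simplified, dependency-free version of that idea and not the R `ROSE` package; the Gaussian kernel, the normal-reference bandwidth rule, and the toy data are assumptions.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Row = List[float]


def rose_like_sample(X: Sequence[Sequence[float]], y: Sequence[int], n_new: int,
                     seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Draw a class-balanced synthetic sample by smoothed bootstrap (simplified ROSE idea).

    For each synthetic row: pick class 0 or 1 with probability 1/2, pick a real row of
    that class at random, and add Gaussian noise with a per-feature bandwidth.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[Row]] = {
        c: [[float(v) for v in row] for row, label in zip(X, y) if label == c] for c in (0, 1)
    }
    d = len(X[0])
    bandwidth: Dict[int, List[float]] = {}
    for c, rows in by_class.items():
        if len(rows) < 2:
            raise ValueError("each class needs at least two rows to estimate a spread")
        # Normal-reference rule of thumb (an assumption): h = factor * per-feature std dev.
        factor = (4.0 / ((d + 2) * len(rows))) ** (1.0 / (d + 4))
        bandwidth[c] = [factor * statistics.stdev(col) for col in zip(*rows)]
    X_new: List[Row] = []
    y_new: List[int] = []
    for _ in range(n_new):
        c = rng.randrange(2)            # classes equally likely -> balanced in expectation
        base = rng.choice(by_class[c])  # smoothed bootstrap: start from a real row
        X_new.append([v + rng.gauss(0.0, h) for v, h in zip(base, bandwidth[c])])
        y_new.append(c)
    return X_new, y_new


if __name__ == "__main__":
    # Hypothetical imbalanced data: 20 non-readmitted (0) vs 4 readmitted (1) patients.
    X = [[i, 0.5 * i] for i in range(20)] + [[30 + i, 15 + i] for i in range(4)]
    y = [0] * 20 + [1] * 4
    X_bal, y_bal = rose_like_sample(X, y, n_new=1000)
    print(sum(y_bal) / len(y_bal))  # close to 0.5
```

As in the studies above, resampling should be applied to the training data only, so the test set keeps the real readmission rate.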

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders, 2021, Vol 21 (1). doi:10.1186/s12872-021-01925-7

Author(s): Moojung Kim; Young Jae Kim; Sung Jin Park; Kwang Gi Kim; Pyung Chun Oh; ...

Keyword(s): Machine Learning; Cardiovascular Disease; Influenza Vaccination; Machine Learning Techniques; Gradient Boosting; Support Vector; Age Group; Learning Models; Extreme Gradient Boosting; Machine Learning Models

Background: Annual influenza vaccination is an important public health measure to prevent influenza infection and is strongly recommended for patients with cardiovascular disease (CVD), especially during the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study was to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods: Adults with CVD (n = 815) from a nationally representative dataset, the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V), were analyzed. Among these adults, 500 (61.4%) answered "yes" when asked whether they had received a seasonal influenza vaccination in the past 12 months. Classification was performed using logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB). Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results: The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared. For the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracy, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions: The machine learning models showed comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.
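Because the study builds separate models for respondents under 65 and aged 65 or over, accuracy is reported per age group. The sketch below splits rows at the age-65 cutoff and reports each group's accuracy next to the accuracy of always predicting that group's most common label, a simple reference point for reading the figures. All data are hypothetical, and coding low adherence as 1 = not vaccinated is an assumption.

```python
from collections import Counter
from typing import Dict, List, Sequence


def split_by_age(ages: Sequence[int], cutoff: int = 65) -> Dict[str, List[int]]:
    """Row indices for the '<65' and '>=65' groups, modelled separately in the study."""
    groups: Dict[str, List[int]] = {"<65": [], ">=65": []}
    for i, age in enumerate(ages):
        groups[">=65" if age >= cutoff else "<65"].append(i)
    return groups


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the observed label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_class_accuracy(y_true: Sequence[int]) -> float:
    """Accuracy of always predicting the group's most common label (a reference point)."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


if __name__ == "__main__":
    ages = [70, 45, 66, 30, 80, 52]    # hypothetical respondents
    low_adherence = [0, 1, 0, 1, 1, 0]  # hypothetical labels (1 = not vaccinated)
    predicted = [0, 1, 0, 0, 1, 0]      # hypothetical model output
    for name, idx in split_by_age(ages).items():
        yt = [low_adherence[i] for i in idx]
        yp = [predicted[i] for i in idx]
        print(name, accuracy(yt, yp), majority_class_accuracy(yt))
```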

Prediction of Healing Performance of Autogenous Healing Concrete Using Machine Learning

Materials, 2021, Vol 14 (15), pp. 4068. doi:10.3390/ma14154068

Author(s): Xu Huang; Mirna Wasouf; Jessada Sresakoolchai; Sakdirat Kaewunruen

Keyword(s): Machine Learning; Search Algorithm; Weather Conditions; Prediction Performance; Machine Learning Algorithms; Coefficient Of Determination; Gradient Boosting; Support Vector; Self Healing; Artificial Neural Network (ANN)

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions, and may occur at any time during its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks through physical or chemical reactions in the concrete matrix. It is therefore important to investigate the healing performance of autogenous healing concrete, both to assess the extent of cracking and to predict the extent of healing. In self-healing concrete research, laboratory testing of healing performance is costly, and a large number of specimens may be needed to arrive at a reliable concrete design. This study is thus the world's first to establish six types of machine learning algorithms capable of predicting the healing performance (HP) of self-healing concrete. These algorithms are an artificial neural network (ANN), k-nearest neighbours (kNN), gradient boosting regression (GBR), decision tree regression (DTR), support vector regression (SVR) and random forest (RF). The parameters of these algorithms were tuned using a grid search algorithm (GSA) and a genetic algorithm (GA). The prediction performance of these algorithms, measured by the coefficient of determination (R²) and root mean square error (RMSE), was evaluated on 1417 data sets from the open literature. The results show that GSA-GBR achieved higher prediction performance (R² = 0.958) and stronger robustness (RMSE = 0.202) than the other five types of algorithms employed to predict the healing performance of autogenous healing concrete. Therefore, reliable prediction of healing performance and efficient assistance with the design of autogenous healing concrete can be achieved.
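The two metrics used to rank the algorithms can be written out directly: RMSE is the square root of the mean squared prediction error, in the units of the healing-performance target, and R² is one minus the ratio of residual to total sum of squares. The observed and predicted values below are hypothetical; the paper's HP scale is not given in the abstract.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the same units as the healing-performance target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]   # hypothetical healing-performance values
    predicted = [1.0, 2.0, 3.0, 5.0]  # hypothetical model predictions
    print(rmse(observed, predicted), r_squared(observed, predicted))  # 0.5 0.8
```

R² is scale-free and comparable across studies, whereas RMSE must be read against the spread of the target values.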

Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival

Scientific Reports, 2021, Vol 11 (1). doi:10.1038/s41598-021-86327-7

Author(s): Arturo Moncada-Torres; Marissa C. van Maaren; Mathijs P. Hendriks; Sabine Siesling; Gijs Geleijnse

Keyword(s): Breast Cancer; Machine Learning; Explicit Knowledge; Cox Regression; Metastatic Breast; Gradient Boosting; Support Vector; Netherlands Cancer Registry; Extreme Gradient Boosting; The Impact

Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as those of classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the c-index. We demonstrated that in our dataset, ML-based models can perform at least as well as the classical CPH regression (c-index ∼0.63), and in the case of XGB even better (c-index ∼0.73). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models' predictions. We concluded that the difference in performance can be attributed to XGB's ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models' predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial in increasing the trust and adoption of innovative ML techniques in oncology and healthcare overall.
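The c-index compares models by how well their risk scores order patients by survival time. Below is a minimal sketch of Harrell's formulation for right-censored data, in which a pair is comparable when the earlier time is an observed death; the paper may use a variant with different handling of tied times, and the survival data here are hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event (events[i] == 1)
    strictly before subject j's time. It is concordant when the model assigns i the
    higher risk; tied risk scores count one half. Tied times are not compared here.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2.0, 4.0, 6.0, 8.0]  # hypothetical follow-up times (years)
    events = [1, 1, 0, 1]         # 1 = death observed, 0 = censored
    print(harrell_c_index(times, events, [0.9, 0.7, 0.2, 0.1]))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported ∼0.63 and ∼0.73 in context.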

Exploring the Mechanism of Crashes with Autonomous Vehicles Using Machine Learning

Mathematical Problems in Engineering, 2021, Vol 2021, pp. 1-10. doi:10.1155/2021/5524356

Author(s): Hengrui Chen; Hong Chen; Ruiyu Zhou; Zhizhen Liu; Xiaoke Sun

Keyword(s): Machine Learning; Autonomous Vehicles; Classification And Regression Tree; Gradient Boosting; Support Vector; Crash Severity; Apriori Algorithm; Driving Mode; Extreme Gradient Boosting; The Impact

Safety has become a critical obstacle to the marketization of autonomous vehicles (AVs) that cannot be ignored. The objective of this study is to explore the mechanism of AV-involved crashes and analyze the impact of each feature on crash severity. We use the Apriori algorithm to explore the relationships among multiple factors and thereby the mechanism of crashes. We use various machine learning models, including support vector machine (SVM), classification and regression tree (CART), and eXtreme Gradient Boosting (XGBoost), to analyze crash severity. In addition, we apply Shapley Additive Explanations (SHAP) to interpret the importance of each factor. The results indicate that XGBoost obtains the best result (recall = 75%; G-mean = 67.82%). Both XGBoost and the Apriori algorithm provided meaningful insights into the characteristics of AV-involved crashes and their relationships. Among all these features, vehicle damage, weather conditions, accident location, and driving mode are the most critical. We found that most rear-end crashes involve conventional vehicles striking the rear of AVs. Drivers should be extremely cautious when driving in fog, in snow, or in poor light. In addition, drivers should be careful when driving near intersections, especially in autonomous driving mode.
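Apriori first finds itemsets that meet a minimum support and then keeps rules that meet a minimum confidence. The sketch below does not implement candidate generation; it only scores one candidate rule with support, confidence, and lift over hypothetical crash records represented as sets of attributes. These measures describe co-occurrence in the data; on their own they do not establish causation.

```python
from typing import List, Set, Tuple


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of crash records that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions: List[Set[str]], antecedent: Set[str],
                 consequent: Set[str]) -> Tuple[float, float, float]:
    """Support, confidence and lift of the association rule antecedent -> consequent."""
    sup_rule = support(transactions, antecedent | consequent)
    sup_ante = support(transactions, antecedent)
    sup_cons = support(transactions, consequent)
    confidence = sup_rule / sup_ante  # estimate of P(consequent | antecedent)
    lift = confidence / sup_cons      # > 1: items co-occur more often than if independent
    return sup_rule, confidence, lift


if __name__ == "__main__":
    crashes = [  # hypothetical crash records as sets of attributes
        {"rear_end", "struck_by_conventional_vehicle", "intersection"},
        {"rear_end", "struck_by_conventional_vehicle"},
        {"side_swipe", "intersection"},
        {"rear_end", "struck_by_conventional_vehicle", "autonomous_mode"},
        {"rear_end", "intersection"},
    ]
    print(rule_metrics(crashes, {"rear_end"}, {"struck_by_conventional_vehicle"}))
```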

Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation, 2021, pp. 289-301. doi:10.32800/abc.2021.44.0289

Author(s): B. Martín; J. González–Arias; J. A. Vicente–Vírseda

Keyword(s): Machine Learning; Random Forest; Animal Species; Temporal Patterns; Additive Models; Gradient Boosting; Support Vector; Stochastic Gradient Boosting; Extreme Gradient Boosting; Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio-temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k-nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap aggregation). We also applied extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in the abundance of migrating Balearic shearwaters based on data gathered within eBird. Proxies of frontal systems and ocean productivity domains, derived from open-source datasets and previously used to characterize the oceanographic habitats of seabirds, were quantified and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE and R²). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that combining machine learning techniques with massive data from open data sources is a useful approach for identifying the long-term spatio-temporal distribution of species at regional spatial scales.
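Random forest extends bootstrap aggregation (bagging): many trees are fitted on resamples drawn with replacement and their predictions are averaged. The sketch below shows plain bagging with one-split regression trees (stumps) on a single hypothetical predictor of abundance. A random forest additionally samples candidate features at each split; with a single predictor there is nothing to sample, so that step is omitted. The data and parameter values are assumptions for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, mean if x <= threshold, mean otherwise)


def fit_stump(x: Sequence[float], y: Sequence[float]) -> Stump:
    """One-split regression tree on a single predictor, chosen to minimise squared error."""
    pairs = sorted(zip(x, y))
    best_sse, best = float("inf"), None  # type: float, Optional[Stump]
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal x values
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        m_left, m_right = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - m_left) ** 2 for v in left) + sum((v - m_right) ** 2 for v in right)
        if sse < best_sse:
            best_sse = sse
            best = ((pairs[k - 1][0] + pairs[k][0]) / 2, m_left, m_right)
    if best is None:  # all x identical: fall back to the overall mean
        mean = sum(y) / len(y)
        return (float("inf"), mean, mean)
    return best


def predict_stump(stump: Stump, xi: float) -> float:
    threshold, m_left, m_right = stump
    return m_left if xi <= threshold else m_right


def bagged_stumps(x: Sequence[float], y: Sequence[float], n_estimators: int = 50,
                  seed: int = 0) -> List[Stump]:
    """Bootstrap aggregation: fit each stump on a resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagged(stumps: List[Stump], xi: float) -> float:
    """Average the stumps' predictions, as bagging does for regression."""
    return sum(predict_stump(s, xi) for s in stumps) / len(stumps)


if __name__ == "__main__":
    # Hypothetical abundance that jumps from 1 to 5 when the predictor reaches 10.
    x = [float(i) for i in range(20)]
    y = [1.0 if xi < 10 else 5.0 for xi in x]
    model = bagged_stumps(x, y)
    print(round(predict_bagged(model, 2.0), 2), round(predict_bagged(model, 15.0), 2))
```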
