Machine Learning Prediction of Stroke Mechanism in Embolic Strokes of Undetermined Source

Hooman Kamel; Babak B. Navi; Neal S. Parikh; Alexander E. Merkler; Peter M. Okin; Richard B. Devereux; Jonathan W. Weinsaft; Jiwon Kim; Jim W. Cheung; Luke K. Kim; Barbara Casadei; Costantino Iadecola; Mert R. Sabuncu; Ajay Gupta; Iván Díaz

doi:10.1161/strokeaha.120.029305

Machine Learning Prediction of Stroke Mechanism in Embolic Strokes of Undetermined Source

Stroke ◽

10.1161/strokeaha.120.029305 ◽

2020 ◽

Vol 51 (9) ◽

Cited By ~ 1

Author(s):

Hooman Kamel ◽

Babak B. Navi ◽

Neal S. Parikh ◽

Alexander E. Merkler ◽

Peter M. Okin ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Cross Validation ◽

Learning Algorithm ◽

Random Search ◽

Model Performance ◽

Area Under The Curve ◽

Independent Set ◽

Predicted Probability ◽

Using Data

Background and Purpose: One-fifth of ischemic strokes are embolic strokes of undetermined source (ESUS). Their theoretical causes can be classified as cardioembolic versus noncardioembolic. This distinction has important implications, but the categories’ proportions are unknown. Methods: Using data from the Cornell Acute Stroke Academic Registry, we trained a machine-learning algorithm to distinguish cardioembolic versus noncardioembolic strokes, then applied the algorithm to ESUS cases to determine the predicted proportion with an occult cardioembolic source. A panel of neurologists adjudicated stroke etiologies using standard criteria. We trained a machine learning classifier using data on demographics, comorbidities, vitals, laboratory results, and echocardiograms. An ensemble predictive method including L1 regularization, gradient-boosted decision tree ensemble (XGBoost), random forests, and multivariate adaptive splines was used. Random search and cross-validation were used to tune hyperparameters. Model performance was assessed using cross-validation among cases of known etiology. We applied the final algorithm to an independent set of ESUS cases to determine the predicted mechanism (cardioembolic or not). To assess our classifier’s validity, we correlated the predicted probability of a cardioembolic source with the eventual post-ESUS diagnosis of atrial fibrillation. Results: Among 1083 strokes with known etiologies, our classifier distinguished cardioembolic versus noncardioembolic cases with excellent accuracy (area under the curve, 0.85). Applied to 580 ESUS cases, the classifier predicted that 44% (95% credibility interval, 39%–49%) resulted from cardiac embolism. Individual ESUS patients’ predicted likelihood of cardiac embolism was associated with eventual atrial fibrillation detection (OR per 10% increase, 1.27 [95% CI, 1.03–1.57]; c-statistic, 0.68 [95% CI, 0.58–0.78]). ESUS patients with high predicted probability of cardiac embolism were older and had more coronary and peripheral vascular disease, lower ejection fractions, larger left atria, lower blood pressures, and higher creatinine levels. Conclusions: A machine learning estimator that distinguished known cardioembolic versus noncardioembolic strokes indirectly estimated that 44% of ESUS cases were cardioembolic.
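As a reading aid for the reported odds ratio: an OR of 1.27 per 10% increase in predicted probability compounds multiplicatively over larger differences, assuming the log-odds of atrial fibrillation detection are linear in the predicted probability (as in a standard logistic regression; the abstract does not state the exact model form). The function name below is hypothetical and serves only as illustration.

```python
def odds_ratio_for_difference(or_per_step: float, step: float, difference: float) -> float:
    """Scale a per-step odds ratio to an arbitrary difference in the predictor.

    Assumption: log-odds are linear in the predictor, so odds ratios multiply.
    """
    return or_per_step ** (difference / step)


# Reported in the abstract: OR 1.27 per 10% increase in predicted probability.
# Implied odds ratio for two patients whose predictions differ by 30%:
or_30 = odds_ratio_for_difference(1.27, step=10, difference=30)
print(round(or_30, 2))  # 2.05
```

Under that linearity assumption, a 30% higher predicted probability of cardiac embolism corresponds to roughly twice the odds of eventual atrial fibrillation detection.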


Conversion Uplift in E-Commerce: A Systematic Benchmark of Modeling Strategies

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622019500172 ◽

2019 ◽

Vol 18 (03) ◽

pp. 747-791 ◽

Cited By ~ 3

Author(s):

Robin Gubela ◽

Artem Bequé ◽

Stefan Lessmann ◽

Fabian Gebert

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Model Performance ◽

Predictive Performance ◽

Academic Disciplines ◽

Response Models ◽

Uplift Modeling ◽

Online Retailers ◽

Using Data ◽

Modeling Strategy

Uplift modeling combines machine learning and experimental strategies to estimate the differential effect of a treatment on individuals’ behavior. The paper considers uplift models in the scope of marketing campaign targeting. Literature on uplift modeling strategies is fragmented across academic disciplines and lacks an overarching empirical comparison. Using data from online retailers, we fill this gap and contribute to literature through consolidating prior work on uplift modeling and systematically comparing the predictive performance and utility of available uplift modeling strategies. Our empirical study includes three experiments in which we examine the interaction between an uplift modeling strategy and the underlying machine learning algorithm to implement the strategy, quantify model performance in terms of business value and demonstrate the advantages of uplift models over response models, which are widely used in marketing. The results facilitate making specific recommendations how to deploy uplift models in e-commerce applications.
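The difference between an uplift model and a response model can be sketched with a toy calculation. The code below is not one of the strategies benchmarked in the paper; it is the simplest possible estimate, assuming a randomized experiment: conversion rate among treated customers minus conversion rate among controls, computed per segment. The segment names and counts are invented for illustration.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Estimate uplift per segment from randomized-experiment records.

    Each record is (segment, treated, converted). Uplift is the conversion
    rate among treated customers minus the rate among control customers.
    """
    # counts[segment][treated] = [conversions, total]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, converted in records:
        cell = counts[segment][bool(treated)]
        cell[0] += int(converted)
        cell[1] += 1

    result = {}
    for segment, arms in counts.items():
        (t_conv, t_n), (c_conv, c_n) = arms[True], arms[False]
        if t_n == 0 or c_n == 0:
            continue  # uplift is undefined without both arms
        result[segment] = t_conv / t_n - c_conv / c_n
    return result


# Toy data (invented, not from the paper).
records = (
    [("new", True, True)] * 30 + [("new", True, False)] * 70
    + [("new", False, True)] * 10 + [("new", False, False)] * 90
    + [("loyal", True, True)] * 50 + [("loyal", True, False)] * 50
    + [("loyal", False, True)] * 50 + [("loyal", False, False)] * 50
)
uplift = uplift_by_segment(records)
```

In this toy data the "loyal" segment converts most often (50%) but gains nothing from the campaign, while the "new" segment gains about 20 percentage points. A response model ranks customers by conversion probability and would favor "loyal"; an uplift model ranks by the treatment effect and would favor "new".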


Predicting cerebral infarction in patients with atrial fibrillation using machine learning: The Fushimi AF registry

Journal of Cerebral Blood Flow & Metabolism ◽

10.1177/0271678x211063802 ◽

2021 ◽

pp. 0271678X2110638

Author(s):

Hidehisa Nishi ◽

Naoya Oishi ◽

Hisashi Ogawa ◽

Kishida Natsue ◽

Kento Doi ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Cerebral Infarction ◽

Validation Cohort ◽

Learning Algorithm ◽

Anticoagulation Therapy ◽

Area Under The Curve ◽

Discrimination Performance ◽

Gradient Boosting ◽

Derivation Cohort

The CHADS2 and CHA2DS2-VASc scores are widely used to assess ischemic risk in patients with atrial fibrillation (AF). However, the discrimination performance of these scores is limited. Using data from a community-based prospective cohort study, we sought to construct a machine learning-based prediction model for cerebral infarction in patients with AF, and to compare its performance with the existing scores. All consecutive patients with AF treated at 81 study institutions from March 2011 to May 2017 were enrolled (n = 4396). The whole dataset was divided into a derivation cohort (n = 1005) and a validation cohort (n = 752) after excluding patients with valvular AF and anticoagulation therapy. Using the derivation cohort dataset, a machine learning model based on a gradient boosting tree algorithm (ML) was built to predict cerebral infarction. In the validation cohort, the receiver operating characteristic area under the curve of the ML model was higher than those of the existing models according to the Hanley and McNeil method: ML, 0.72 (95% CI, 0.66–0.79); CHADS2, 0.61 (95% CI, 0.53–0.69); CHA2DS2-VASc, 0.62 (95% CI, 0.54–0.70). In conclusion, machine learning algorithms have the potential to perform better than the CHADS2 and CHA2DS2-VASc scores for predicting cerebral infarction in patients with non-valvular AF.
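For readers unfamiliar with how such areas under the curve and their uncertainty are obtained, the following is a minimal sketch, not the authors' code. It computes the AUC as the Mann-Whitney statistic and the standard-error approximation of Hanley and McNeil (1982). The paired comparison of two AUCs on the same patients additionally needs the correlation between them, which is not shown here. The scores and the event count used below are hypothetical; the abstract does not report the number of infarctions.

```python
import math


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC using the Hanley-McNeil (1982) approximation."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


# Hypothetical risk scores for patients with and without the outcome.
example_auc = auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.4])
print(example_auc)  # 0.875

# Hypothetical split of the 752 validation patients into 60 events and 692 non-events.
se = hanley_mcneil_se(0.72, n_pos=60, n_neg=692)
ci_low, ci_high = 0.72 - 1.96 * se, 0.72 + 1.96 * se
```

The width of the resulting interval depends strongly on the number of events, which is why the hypothetical count above should not be read as a reconstruction of the published confidence interval.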


Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽

10.2174/1381612825666191107092214 ◽

2020 ◽

Vol 25 (40) ◽

pp. 4296-4302 ◽

Cited By ~ 2

Author(s):

Yuan Zhang ◽

Zhenyan Han ◽

Qian Gao ◽

Xiaoyi Bai ◽

Chi Zhang ◽

...

Keyword(s):

Machine Learning ◽

Inclusion Bodies ◽

Cross Validation ◽

Independent Set ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Validation Test ◽

Excess Number ◽

Fold Cross Validation

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease is due to the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain, resulting in a relatively excess number of α-chains. The formation of inclusion bodies deposited on the cell membrane causes a decrease in the ability of red blood cells to deform and a group of hereditary haemolytic diseases caused by massive destruction in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracies (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost could be applied to build a learning model in the prediction of inhibitors against K562 cells.
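The 10-fold cross-validation test mentioned above can be sketched as follows. This is a generic illustration, not the authors' protocol: it splits the 307 compounds (117 inhibitors plus 190 non-inhibitors) into ten disjoint test folds after shuffling, and it does not stratify by class, which a real study might do. The function name and the seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every sample appears in exactly one test fold; the remaining k - 1 folds
    form the training set for that round.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


n_compounds = 117 + 190  # inhibitors + non-inhibitors, as reported
splits = list(k_fold_indices(n_compounds, k=10))
fold_sizes = [len(test) for _, test in splits]
```

A model is trained on each `train` list and scored on the matching `test` list, and the ten scores are averaged to give the cross-validated accuracy. The independent set test reported in the abstract uses compounds held out from this procedure entirely.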


Early Prediction of Seven-Day Mortality in Intensive Care Unit Using a Machine Learning Model: Results from the SPIN-UTI Project

Journal of Clinical Medicine ◽

10.3390/jcm10050992 ◽

2021 ◽

Vol 10 (5) ◽

pp. 992

Author(s):

Martina Barchitta ◽

Andrea Maugeri ◽

Giuliana Favara ◽

Paolo Marco Riela ◽

Giovanni Gallo ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Intensive Care Units ◽

Learning Algorithm ◽

Area Under The Curve ◽

Support Vector ◽

Icu Admission ◽

Risk Of Death ◽

Saps Ii ◽

Svm Algorithm

Patients in intensive care units (ICUs) are at higher risk of worse prognosis and mortality. Here, we aimed to evaluate the ability of the Simplified Acute Physiology Score (SAPS II) to predict the risk of 7-day mortality, and to test a machine learning algorithm which combines the SAPS II with additional patients’ characteristics at ICU admission. We used data from the “Italian Nosocomial Infections Surveillance in Intensive Care Units” network. A Support Vector Machine (SVM) algorithm was used to classify 3782 patients according to sex, patient’s origin, type of ICU admission, non-surgical treatment for acute coronary disease, surgical intervention, SAPS II, presence of invasive devices, trauma, impaired immunity, antibiotic therapy and onset of HAI. The accuracy of SAPS II for distinguishing patients who died from those who did not was 69.3%, with an Area Under the Curve (AUC) of 0.678. Using the SVM algorithm, instead, we achieved an accuracy of 83.5% and AUC of 0.896. Notably, SAPS II was the variable that weighed most in the model, and its removal resulted in an AUC of 0.653 and an accuracy of 68.4%. Overall, these findings suggest the present SVM model as a useful tool for early prediction of patients at higher risk of death at ICU admission.


Assisting scalable diagnosis automatically via CT images in the combat against COVID-19

Scientific Reports ◽

10.1038/s41598-021-83424-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Bohan Liu ◽

Pan Liu ◽

Lutao Dai ◽

Yanlin Yang ◽

Peng Xie ◽

...

Keyword(s):

Large Scale ◽

Area Under The Curve ◽

Independent Set ◽

Reference Method ◽

Ct Images ◽

Loss Of Life ◽

Case Identification ◽

Reverse Transcription Pcr ◽

Using Data

The pandemic of Coronavirus Disease 2019 (COVID-19) is causing enormous loss of life globally. Prompt case identification is critical. The reference method is the real-time reverse transcription PCR (RT-PCR) assay, whose limitations may curb its prompt large-scale application. COVID-19 manifests with chest computed tomography (CT) abnormalities, some even before the onset of symptoms. We tested the hypothesis that the application of deep learning (DL) to 3D CT images could help identify COVID-19 infections. Using data from 920 COVID-19 and 1,073 non-COVID-19 pneumonia patients, we developed a modified DenseNet-264 model, COVIDNet, to classify CT images to either class. When tested on an independent set of 233 COVID-19 and 289 non-COVID-19 pneumonia patients, COVIDNet achieved an accuracy rate of 94.3% and an area under the curve of 0.98. As of March 23, 2020, the COVIDNet system had been used 11,966 times with a sensitivity of 91.12% and a specificity of 88.50% in six hospitals with PCR confirmation. Application of DL to CT images may improve both efficiency and capacity of case detection and long-term surveillance.


Glioblastoma and primary central nervous system lymphoma: differentiation using MRI derived first-order texture analysis – a machine learning study

The Neuroradiology Journal ◽

10.1177/1971400921998979 ◽

2021 ◽

pp. 197140092199897

Author(s):

Sarv Priya ◽

Caitlin Ward ◽

Thomas Locke ◽

Neetu Soni ◽

Ravishankar Pillenahalli Maheshwarappa ◽

...

Keyword(s):

Machine Learning ◽

Central Nervous System ◽

Nervous System ◽

Diagnostic Performance ◽

Model Performance ◽

Area Under The Curve ◽

Central Nervous System Lymphoma ◽

First Order ◽

Single Slice ◽

Contrast Enhanced

Objectives: To evaluate the diagnostic performance of multiple machine learning classifier models derived from first-order histogram texture parameters extracted from T1-weighted contrast-enhanced images in differentiating glioblastoma and primary central nervous system lymphoma. Methods: Retrospective study with 97 glioblastoma and 46 primary central nervous system lymphoma patients. Thirty-six different combinations of classifier models and feature selection techniques were evaluated. Five-fold nested cross-validation was performed. Model performance was assessed for whole tumour and largest single slice using receiver operating characteristic curve. Results: The cross-validated model performance was relatively similar for the top performing models for both whole tumour and largest single slice (area under the curve 0.909–0.924). However, there was a considerable difference between the worst performing model (logistic regression with full feature set, area under the curve 0.737) and the highest performing model for whole tumour (least absolute shrinkage and selection operator model with correlation filter, area under the curve 0.924). For single slice, the multilayer perceptron model with correlation filter had the highest performance (area under the curve 0.914). No significant difference was seen between the diagnostic performance of the top performing model for both whole tumour and largest single slice. Conclusions: T1 contrast-enhanced derived first-order texture analysis can differentiate between glioblastoma and primary central nervous system lymphoma with good diagnostic performance. The machine learning performance can vary significantly depending on the model and feature selection methods. Largest single slice and whole tumour analysis show comparable diagnostic performance.


Clinical Score and Machine Learning-Based Model to Predict Diagnosis of Primary Aldosteronism in Arterial Hypertension

Hypertension ◽

10.1161/hypertensionaha.121.17444 ◽

2021 ◽

Vol 78 (5) ◽

pp. 1595-1604

Author(s):

Fabrizio Buffolo ◽

Jacopo Burrello ◽

Alessio Burrello ◽

Daniel Heinrich ◽

Christian Adolf ◽

...

Keyword(s):

Machine Learning ◽

Arterial Hypertension ◽

Primary Aldosteronism ◽

Learning Algorithm ◽

Area Under The Curve ◽

Clinical Score ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Individual Risk ◽

The Individual

Primary aldosteronism (PA) is the cause of arterial hypertension in 4% to 6% of patients, and 30% of patients with PA are affected by unilateral and surgically curable forms. Current guidelines recommend screening for PA ≈50% of patients with hypertension on the basis of individual factors, while some experts suggest screening all patients with hypertension. To define the risk of PA and tailor the diagnostic workup to the individual risk of each patient, we developed a conventional scoring system and supervised machine learning algorithms using a retrospective cohort of 4059 patients with hypertension. On the basis of 6 widely available parameters, we developed a numerical score and 308 machine learning-based models, selecting the one with the highest diagnostic performance. After validation, we obtained high predictive performance with our score (optimized sensitivity of 90.7% for PA and 92.3% for unilateral PA [UPA]). The machine learning-based model provided the highest performance, with an area under the curve of 0.834 for PA and 0.905 for diagnosis of UPA, with optimized sensitivity of 96.6% for PA, and 100.0% for UPA, at validation. The application of the predicting tools allowed the identification of a subgroup of patients with very low risk of PA (0.6% for both models) and null probability of having UPA. In conclusion, this score and the machine learning algorithm can accurately predict the individual pretest probability of PA in patients with hypertension and circumvent screening in up to 32.7% of patients using a machine learning-based model, without omitting patients with surgically curable UPA.


Multi-Step-Ahead Forecasting of Wave Conditions Based on a Physics-Based Machine Learning (PBML) Model for Marine Operations

Journal of Marine Science and Engineering ◽

10.3390/jmse8120992 ◽

2020 ◽

Vol 8 (12) ◽

pp. 992

Author(s):

Mengning Wu ◽

Christos Stefanakos ◽

Zhen Gao

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

High Reliability ◽

Computational Cost ◽

Model Performance ◽

The North ◽

The North Sea ◽

Wave Conditions ◽

Wave Models ◽

Artificial Neural Network Ann

Short-term wave forecasts are essential for the execution of marine operations. In this paper, an efficient and reliable physics-based machine learning (PBML) model is proposed to realize the multi-step-ahead forecasting of wave conditions (e.g., significant wave height Hs and peak wave period Tp). In the model, the primary variables in physics-based wave models (i.e., the wind forcing and initial wave boundary condition) are considered as inputs. Meanwhile, a machine learning algorithm (artificial neural network, ANN) is adopted to build an implicit relation between inputs and forecasted outputs of wave conditions. The computational cost of this data-driven model is much lower than that of the differential-equation based physical model. A ten-year dataset (2001 to 2010) sampled every three hours at the center of the North Sea was used to assess the model performance in a small domain. The result reveals high reliability for one-day-ahead Hs forecasts, while that of Tp is slightly lower due to the weaker implicit relationships between the data. Overall, the PBML model can be conceived as an efficient tool for the multi-step-ahead forecasting of wave conditions, and thus has great potential to further assist decision-making during the execution of marine operations.


A Statistical Design of Experiments Approach to Machine Learning Model Selection in Engineering Applications

Journal of Computing and Information Science in Engineering ◽

10.1115/1.4047915 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

T. Munger ◽

S. Desa

Keyword(s):

Machine Learning ◽

Model Selection ◽

Real World ◽

Ad Hoc ◽

Learning Algorithm ◽

Random Search ◽

Statistical Design ◽

Orthogonal Arrays ◽

Statistical Design Of Experiments ◽

Engineering Applications

An important but insufficiently addressed issue for machine learning in engineering applications is the task of model selection for new problems. Existing approaches to model selection generally focus on optimizing the learning algorithm and associated hyperparameters. However, in real-world engineering applications, the parameters that are external to the learning algorithm, such as feature engineering, can also have a significant impact on the performance of the model. These external parameters do not fit into most existing approaches for model selection and are therefore often studied ad hoc or not at all. In this article, we develop a statistical design of experiments (DOE) approach to model selection based on the use of the Taguchi method. The key idea is that we use orthogonal arrays to plan a set of build-and-test experiments to study the external parameters in combination with the learning algorithm. The use of orthogonal arrays maximizes the information learned from each experiment and, therefore, enables the experimental space to be explored extremely efficiently in comparison with grid or random search methods. We demonstrated the application of the statistical DOE approach to a real-world model selection problem involving predicting service request escalation. Statistical DOE significantly reduced the number of experiments necessary to fully explore the external parameters for this problem and was able to successfully optimize the model with respect to the objective function of minimizing total cost in addition to the standard evaluation metrics such as accuracy, f-measure, and g-mean.
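Why orthogonal arrays reduce the number of experiments can be seen in the smallest standard case. The sketch below uses the L4 array for three two-level factors and checks its defining property: every pair of columns contains each combination of levels equally often. The factor names in the comments are hypothetical and are not taken from the article.

```python
from collections import Counter
from itertools import combinations, product

# Standard L4 orthogonal array: 4 runs, 3 factors, 2 levels each.
# Hypothetical factors: (feature set, text normalization, learning algorithm).
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]


def is_orthogonal(array, levels=2):
    """Return True if every pair of columns shows each level combination
    the same number of times (strength-2 orthogonality)."""
    n_runs, n_cols = len(array), len(array[0])
    if n_runs % levels ** 2 != 0:
        return False
    expected = n_runs // levels ** 2
    for a, b in combinations(range(n_cols), 2):
        pair_counts = Counter((row[a], row[b]) for row in array)
        for pair in product(range(levels), repeat=2):
            if pair_counts[pair] != expected:
                return False
    return True


# Full grid over the same three factors, for comparison.
full_factorial = list(product(range(2), repeat=3))
print(len(L4), len(full_factorial))  # 4 8
```

Here four build-and-test experiments give balanced estimates of each factor's main effect, compared with eight for the full grid. The saving grows quickly with more factors and levels, at the cost of not resolving all interactions between factors.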


A Clinical Score for Predicting Atrial Fibrillation in Patients with Cryptogenic Stroke or Transient Ischemic Attack

Cardiology ◽

10.1159/000476030 ◽

2017 ◽

Vol 138 (3) ◽

pp. 133-140 ◽

Cited By ~ 23

Author(s):

Calvin Kwong ◽

Albee Y. Ling ◽

Michael H. Crawford ◽

Susan X. Zhao ◽

Nigam H. Shah

Keyword(s):

Atrial Fibrillation ◽

Transient Ischemic Attack ◽

Cryptogenic Stroke ◽

Significant Risk ◽

Area Under The Curve ◽

Clinical Score ◽

Risk Groups ◽

Therapeutic Implications ◽

Using Data ◽

Ischemic Attack

Objectives: Detection of atrial fibrillation (AF) in post-cryptogenic stroke (CS) or transient ischemic attack (TIA) patients carries important therapeutic implications. Methods: To risk stratify CS/TIA patients for later development of AF, we conducted a retrospective cohort study using data from 1995 to 2015 in the Stanford Translational Research Integrated Database Environment (STRIDE). Results: Of the 9,589 adult patients (age ≥40 years) with CS/TIA included, 482 (5%) patients developed AF post CS/TIA. Of those patients, 28.4, 26.3, and 45.3% were diagnosed with AF 1-12 months, 1-3 years, and >3 years after the index CS/TIA, respectively. Age (≥75 years), obesity, congestive heart failure, hypertension, coronary artery disease, peripheral vascular disease, and valve disease are significant risk factors, with the following respective odds ratios (95% CI): 1.73 (1.39-2.16), 1.53 (1.05-2.18), 3.34 (2.61-4.28), 2.01 (1.53-2.68), 1.72 (1.35-2.19), 1.37 (1.02-1.84), and 2.05 (1.55-2.69). A risk-scoring system, i.e., the HAVOC score, was constructed using these 7 clinical variables that successfully stratify patients into 3 risk groups, with good model discrimination (area under the curve = 0.77). Conclusions: Findings from this study support the strategy of looking longer and harder for AF in post-CS/TIA patients. The HAVOC score identifies different levels of AF risk and may be used to select patients for extended rhythm monitoring.
