Predicting risk of stroke from lab tests using machine learning algorithms (Preprint)

2020 ◽  
Author(s):  
Eman Alanazi ◽  
Alaa Abdou ◽  
Jake Luo

UNSTRUCTURED Stroke, a cerebrovascular disease, is one of the major causes of death. It also imposes a health burden on both patients and healthcare systems. One of the important risk factors for stroke is health behavior, which is an increasing focus of prevention. In addition, chronic diseases such as hypertension, diabetes, cardiac diseases, and asthma are potential risk factors for stroke. Many machine learning models have been built using predictors such as lifestyle or radiology imaging. However, no models have been built using lab tests. The aim of this study is to fill that gap by building models that predict stroke from lab tests. We utilized the National Health and Nutrition Examination Survey (NHANES) data sets to develop models that predict stroke from patient lab tests. We found that accurate and sensitive machine learning models can be created to predict stroke from lab tests. The results showed that the best-tested algorithm, random forest, reached the highest accuracy (ACC = 0.96) when all the attributes were used. The proposed model can be integrated with electronic health records to provide real-time prediction of stroke from lab tests. Due to the limitations of the data, we could not predict the type of stroke, whether hemorrhagic or ischemic. In future studies, we aim to use data that distinguish the different types of stroke and build a prediction model for each type.
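The abstract names random forest as the best-performing algorithm; a minimal sketch of that kind of pipeline is shown below. The feature matrix and labels are synthetic stand-ins, not the NHANES lab-test variables used in the study.

```python
# Minimal sketch: random forest predicting stroke from lab-test features.
# Synthetic placeholders stand in for the NHANES attributes from the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))        # stand-in for 20 lab-test attributes
y = rng.integers(0, 2, size=1000)      # stand-in stroke / no-stroke labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"ACC = {accuracy_score(y_te, clf.predict(X_te)):.2f}")
```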

2020 ◽  
Author(s):  
Abin Abraham ◽  
Brian L Le ◽  
Idit Kosti ◽  
Peter Straub ◽  
Digna R Velez Edwards ◽  
...  

Abstract: Identifying pregnancies at risk for preterm birth, one of the leading causes of worldwide infant mortality, has the potential to improve prenatal care. However, we lack broadly applicable methods to accurately predict preterm birth risk. The dense longitudinal information present in electronic health records (EHRs) is enabling scalable and cost-efficient risk modeling of many diseases, but EHR resources have been largely untapped in the study of pregnancy. Here, we apply machine learning to diverse data from EHRs to predict singleton preterm birth. Leveraging a large cohort of 35,282 deliveries, we find that a prediction model based on billing codes alone can predict preterm birth at 28 weeks of gestation (ROC-AUC=0.75, PR-AUC=0.40) and outperforms a comparable model trained using known risk factors (ROC-AUC=0.59, PR-AUC=0.21). Our machine learning approach is also able to accurately predict preterm birth sub-types (spontaneous vs. indicated), mode of delivery, and recurrent preterm birth. We demonstrate the portability of our approach by showing that the prediction models maintain their accuracy on a large, independent cohort (5,978 deliveries) with only a modest decrease in performance. Interpreting the features identified by the model as most informative for risk stratification demonstrates that they capture non-linear combinations of known risk factors and patterns of care. The strong performance of our approach across multiple clinical contexts and an independent cohort highlights the potential of machine learning algorithms to improve medical care during pregnancy.
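The abstract reports both ROC-AUC and PR-AUC, which matters for an imbalanced outcome like preterm birth; a hedged sketch of computing both follows. The sparse count features stand in for billing codes, and the gradient boosting model is an illustrative choice, since the abstract does not name one.

```python
# Sketch: evaluating a classifier by ROC-AUC and PR-AUC on imbalanced labels.
# Synthetic count features stand in for billing codes; model choice is an
# assumption, not necessarily the authors' method.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.poisson(0.2, size=(3000, 100)).astype(float)  # code-count features
y = (rng.random(3000) < 0.1).astype(int)              # ~10% positive class

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=1)
scores = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
scores = scores.predict_proba(X_te)[:, 1]
print("ROC-AUC:", roc_auc_score(y_te, scores))
print("PR-AUC :", average_precision_score(y_te, scores))  # average precision
```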


2018 ◽  
Author(s):  
Robbin Bouwmeester ◽  
Lennart Martens ◽  
Sven Degroeve

Abstract Liquid chromatography is a core component of almost all mass spectrometric analyses of (bio)molecules. Because of the high-throughput nature of mass spectrometric analyses, the interpretation of these chromatographic data increasingly relies on informatics solutions that attempt to predict an analyte's retention time. The key components of such predictive algorithms are the features they are supplied with, and the actual machine learning algorithm used to fit the model parameters. Here we therefore evaluate the performance of seven machine learning algorithms on 36 distinct metabolomics data sets, using two distinct feature sets. Interestingly, the results show that no single learning algorithm performs optimally for all data sets, with different algorithm types achieving top performance for different types of analytes or different protocols. Our results can thus be used to find an optimal retention time prediction algorithm for specific analytes or protocols. Importantly, however, our results also show that blending different types of models together decreases the error on outliers, indicating that the combination of several approaches holds substantial promise for the development of more generic, high-performing algorithms.
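One simple form of the blending the abstract describes is averaging the predictions of heterogeneous regressors; a minimal sketch under that assumption follows. The three models and the synthetic data are illustrative, not the seven algorithms or 36 data sets evaluated in the paper.

```python
# Sketch: blending heterogeneous regressors by averaging their predictions.
# Data are synthetic; in the paper, y would be measured retention times.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))                 # stand-in molecular descriptors
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

X_tr, y_tr, X_te = X[:400], y[:400], X[400:]
models = [RandomForestRegressor(random_state=2), BayesianRidge(), SVR()]
blend = np.mean([m.fit(X_tr, y_tr).predict(X_te) for m in models], axis=0)
print(blend[:5])  # blended retention-time predictions
```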


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Bum-Joo Cho ◽  
Kyoung Min Kim ◽  
Sanchir-Erdene Bilegsaikhan ◽  
Yong Joon Suh

Abstract Febrile neutropenia (FN) is one of the most concerning complications of chemotherapy, and its prediction remains difficult. This study aimed to reveal the risk factors for FN and to build prediction models for it using machine learning algorithms. Medical records of hospitalized patients who underwent chemotherapy after surgery for breast cancer between May 2002 and September 2018 were selectively reviewed for development of the models. Demographic, clinical, pathological, and therapeutic data were analyzed to identify risk factors for FN. Using machine learning algorithms, prediction models were developed and evaluated for performance. Of 933 selected inpatients with a mean age of 51.8 ± 10.7 years, FN developed in 409 (43.8%) patients. There was a significant difference in FN incidence according to age, staging, taxane-based regimen, and blood count 5 days after chemotherapy. The area under the curve (AUC) of a logistic regression model built on these findings was 0.870; machine learning improved the AUC to 0.908. Machine learning thus improves the prediction of FN in patients undergoing chemotherapy for breast cancer compared with a conventional statistical model. In these high-risk patients, primary prophylaxis with granulocyte colony-stimulating factor could be considered.
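The core comparison here, a logistic regression baseline against a machine learning model under the same AUC metric, can be sketched as follows. The data are synthetic stand-ins for the demographic, pathological, and blood-count predictors, and the boosted model is an assumed example of the "machine learning" side.

```python
# Sketch: comparing a logistic-regression baseline with a boosted model by
# cross-validated AUC, mirroring the abstract's 0.870-vs-0.908 comparison.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(933, 8))             # 933 patients, 8 stand-in predictors
y = (X[:, 0] + X[:, 1] ** 2 + rng.normal(size=933) > 1).astype(int)

for model in (LogisticRegression(max_iter=1000),
              GradientBoostingClassifier(random_state=3)):
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(type(model).__name__, f"AUC = {auc:.3f}")
```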


2021 ◽  
Vol 2 (2) ◽  
pp. 40-47
Author(s):  
Sunil Kumar ◽  
Vaibhav Bhatnagar

Machine learning is one of the active fields and technologies for realizing artificial intelligence (AI). The complexity of machine learning algorithms makes it difficult to predict which algorithm is best. Because there are many complex algorithms in machine learning (ML), determining the appropriate method for finding regression trends, and thereby establishing the correlation between variables, is very difficult; we therefore review the different types of regression used in machine learning. There are mainly six types of regression model: Linear, Logistic, Polynomial, Ridge, Bayesian Linear, and Lasso. This paper gives an overview of the above-mentioned regression models and compares their suitability for machine learning. Data analysis requires establishing associations among the innumerable variables in a data set; such association is essential for forecasting and data exploration. Regression analysis is one such procedure for establishing associations among data sets. The work in this paper predominantly emphasizes the diverse regression analysis models and how they are used in the context of different data sets in machine learning. Selecting the correct model for an analysis is the most challenging task, and hence these models are considered thoroughly in this study. By using these models in the right way, and with an accurate data set, data exploration and forecasting in machine learning can provide the most exact outcomes.
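For concreteness, the six regression families the paper names can all be fitted with standard scikit-learn estimators; a minimal sketch on synthetic data follows. Note that logistic regression models a binary outcome, so the continuous target is thresholded for that model only.

```python
# Sketch: the six regression types named in the paper, fitted with
# scikit-learn on synthetic data.
import numpy as np
from sklearn.linear_model import (BayesianRidge, Lasso, LinearRegression,
                                  LogisticRegression, Ridge)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=200)

regressors = {
    "linear": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "ridge": Ridge(alpha=1.0),
    "bayesian linear": BayesianRidge(),
    "lasso": Lasso(alpha=0.1),
}
for name, model in regressors.items():
    print(f"{name}: R^2 = {model.fit(X, y).score(X, y):.3f}")

# Logistic regression targets a binary outcome, so threshold y for this one.
clf = LogisticRegression().fit(X, (y > 0).astype(int))
```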


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Wan Xu ◽  
Nan-Nan Sun ◽  
Hai-Nv Gao ◽  
Zhi-Yuan Chen ◽  
Ya Yang ◽  
...  

Abstract COVID-19 is a newly emerging infectious disease to which human beings are generally susceptible and which has caused huge losses to people's health. Acute respiratory distress syndrome (ARDS) is one of the common clinical manifestations of severe COVID-19, and it is also responsible for the current shortage of ventilators worldwide. This study aims to analyze the clinical characteristics of COVID-19 ARDS patients and establish a diagnostic system based on artificial intelligence (AI) methods to predict the probability of ARDS in COVID-19 patients. We collected clinical data of 659 COVID-19 patients from 11 regions in China. The clinical characteristics of the ARDS group and the non-ARDS group of COVID-19 patients were compared in detail, and both traditional machine learning algorithms and deep learning-based methods were used to build the prediction models. Results indicated that the median age of ARDS patients was 56.5 years, significantly older, by 7.5 years, than that of non-ARDS patients. Male patients and patients with BMI > 25 were more likely to develop ARDS. The clinical features of ARDS patients included cough (80.3%), polypnea (59.2%), lung consolidation (53.9%), secondary bacterial infection (30.3%), and comorbidities such as hypertension (48.7%). Abnormal biochemical indicators such as lymphocyte count, CK, NLR, AST, LDH, and CRP were all strongly related to the aggravation of ARDS. Furthermore, among the various AI methods evaluated for modeling and prediction based on the above risk factors, a decision tree achieved the best AUC, accuracy, sensitivity, and specificity in identifying the mild patients who were likely to develop ARDS, which undoubtedly helps to deliver proper care and optimize the use of limited resources.
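The abstract names a decision tree evaluated by AUC, accuracy, sensitivity, and specificity; a hedged sketch of computing those four metrics follows. The features are synthetic placeholders for the clinical and biochemical risk factors (age, BMI, LDH, CRP, and so on).

```python
# Sketch: a decision tree scored by the four metrics from the abstract.
# Synthetic features stand in for the study's clinical risk factors.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(659, 10))            # 659 patients, 10 stand-in factors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=659) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=5)
tree = DecisionTreeClassifier(max_depth=4, random_state=5).fit(X_tr, y_tr)
pred = tree.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("AUC:", roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1]))
print("ACC:", accuracy_score(y_te, pred))
print("Sensitivity:", tp / (tp + fn), "Specificity:", tn / (tn + fp))
```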


2020 ◽  
Author(s):  
Wan Xu ◽  
Nan-Nan Sun ◽  
Hai-Nv Gao ◽  
Zhi-Yuan Chen ◽  
Ya Yang ◽  
...  

Abstract COVID-19 is a newly emerging infectious disease to which human beings are generally susceptible and which has caused huge losses to people's health. Acute respiratory distress syndrome (ARDS) is one of the common clinical manifestations of severe COVID-19, and it is also responsible for the current shortage of ventilators worldwide. This study aims to analyze the clinical characteristics of COVID-19 ARDS patients and establish a diagnostic system based on artificial intelligence (AI) methods to predict the probability of ARDS in COVID-19 patients. We collected clinical data of 659 COVID-19 patients from 11 regions in China. The clinical characteristics of the two groups were compared in detail, and both traditional machine learning algorithms and deep learning-based methods were used to build the prediction models. Results indicated that the median age of ARDS patients was 56.5 years, significantly older, by 7.5 years, than that of non-ARDS patients. Male patients and patients with BMI > 25 were more likely to develop ARDS. The clinical features of ARDS patients included cough (80.3%), polypnea (59.2%), lung consolidation (53.9%), secondary bacterial infection (30.3%), and comorbidities such as hypertension (48.7%). Abnormal biochemical indicators such as lymphocyte count, leukocyte count, CK, NLR, AST, LDH, and CRP were all strongly related to the aggravation of ARDS. Furthermore, among the various AI methods evaluated for modeling and prediction based on the above risk factors, a decision tree achieved the best AUC, sensitivity, and specificity in identifying the mild patients who were likely to develop ARDS, which undoubtedly helps to optimize the treatment strategy, reduce mortality, and relieve the medical pressure.


2021 ◽  
Vol 16 ◽  
Author(s):  
Jun Wu ◽  
Guoping Yang ◽  
Lulu Qu ◽  
Nan Han

Background: With increasing quality of life, people have more time and energy to pay attention to their own health problems. Diabetes, as one of the most common and fastest-growing diseases, has attracted widespread attention from experts in bioinformatics. People of all ages around the world suffer from diabetes, which can shorten a patient's life span. Diabetes has a significant impact on human health, so the accuracy of the initial diagnosis is essential. Diabetes can bring serious complications, especially in the elderly, such as cardiovascular and cerebrovascular diseases, stroke, and multiple organ damage. An early diagnosis of diabetes can reduce the possibility of deterioration. Identifying and analyzing potential risk factors for different physical attributes can help diagnose the prevalence of diabetes; the more accurate the prevalence estimate, the more likely it is that the incidence of complications can be reduced. Methods: In this paper, we use the open-source NHANES data set to analyze and determine potential risk factors relevant to diabetes using improved versions of logistic regression, SVM, and other machine learning algorithms. Results: Experimental results show that the improved version of random forest has the best effect, with a classification accuracy of 92%, and that age, diabetes in blood relatives, high blood pressure, cholesterol, and BMI are the most important risk factors related to diabetes. Conclusion: The proposed machine learning method can cope with the class-imbalance and outlier-detection problems.
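The conclusion highlights class imbalance and outlier detection; one common recipe for both is sketched below: drop isolation-forest outliers, then fit a class-weighted random forest. This is an illustrative assumption, not necessarily the paper's "improved" algorithm.

```python
# Sketch: handle outliers (IsolationForest) and class imbalance
# (class_weight="balanced") before fitting a random forest.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 6))                # stand-in health attributes
y = (rng.random(2000) < 0.15).astype(int)     # imbalanced diabetes labels

inliers = IsolationForest(random_state=6).fit_predict(X) == 1  # -1 = outlier
clf = RandomForestClassifier(class_weight="balanced", random_state=6)
clf.fit(X[inliers], y[inliers])
print("feature importances:", clf.feature_importances_)
```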


2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis, the gonadotropin-releasing hormone (GnRH) stimulation test or GnRH analogue (GnRHa) stimulation test, is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP-related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable LIME models, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
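The LIME workflow the paper describes can be sketched with the `lime` package (a real library, installed separately); the data and feature names below are synthetic placeholders for the basal hormone levels the study identifies.

```python
# Sketch: explaining one random-forest prediction with LIME, as in the paper.
# Requires `pip install lime`; data and feature names are placeholders.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
feature_names = ["basal_LH", "basal_FSH", "IGF_1"]   # illustrative names
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.3 * X[:, 2] + rng.normal(size=500) > 0).astype(int)

clf = RandomForestClassifier(random_state=7).fit(X[:400], y[:400])
explainer = LimeTabularExplainer(X[:400], feature_names=feature_names,
                                 class_names=["negative", "positive"],
                                 mode="classification")
exp = explainer.explain_instance(X[450], clf.predict_proba, num_features=3)
print(exp.as_list())   # per-feature contributions for this one prediction
```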


2020 ◽  
Author(s):  
Neil Kale

BACKGROUND Despite worldwide efforts to develop an effective COVID vaccine, it is quite evident that initial supplies will be limited. It is therefore important to develop methods that ensure the COVID vaccine is allocated to the people who are at major risk until there is a sufficient global supply. OBJECTIVE The purpose of this study was to develop a machine-learning tool that could be applied to assess the risk in Massachusetts towns based on community-wide social, medical, and lifestyle risk factors. METHODS I compiled Massachusetts town data for 29 potential risk factors, such as the prevalence of preexisting comorbid conditions like COPD and social factors such as racial composition, and implemented logistic regression to predict the number of COVID cases in each town. RESULTS Of the 29 factors, 14 were found to be significant (p < 0.1) indicators: poverty, food insecurity, lack of high school education, lack of health insurance coverage, premature mortality, population, population density, recent population growth, Asian percentage, high-occupancy housing, and the preexisting prevalence of cancer, COPD, overweightness, and heart attacks. The machine-learning approach is 80% accurate in the state of Massachusetts and identifies the 9 highest-risk communities: Lynn, Brockton, Revere, Randolph, Lowell, New Bedford, Everett, Waltham, and Fitchburg. The 5 most at-risk counties are Suffolk, Middlesex, Bristol, Norfolk, and Plymouth. CONCLUSIONS With appropriate data, the tool could evaluate risk in other communities, or even enumerate individual patient susceptibility. A ranking of communities by risk may help policymakers ensure equitable allocation of limited doses of the COVID vaccine.
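Screening 29 factors for significance at p < 0.1 is the kind of task statsmodels handles directly; a hedged sketch follows. The rows stand in for Massachusetts towns and the five columns are synthetic placeholders for the candidate risk factors.

```python
# Sketch: logistic regression with per-factor p-values via statsmodels,
# mirroring the p < 0.1 screening in the abstract. Data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
X = rng.normal(size=(351, 5))          # rows = towns, cols = candidate factors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=351) > 0).astype(int)

result = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(result.pvalues)                  # keep factors with p < 0.1
```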


2021 ◽  
Vol 13 (13) ◽  
pp. 2433
Author(s):  
Shu Yang ◽  
Fengchao Peng ◽  
Sibylle von Löwis ◽  
Guðrún Nína Petersen ◽  
David Christian Finger

Doppler lidars are used worldwide for wind monitoring and, recently, also for the detection of aerosols. Automatic algorithms that classify the signals retrieved from lidar measurements are very useful for users. In this study, we explore the value of machine learning for classifying backscattered signals from Doppler lidars, using data from Iceland. We combined supervised and unsupervised machine learning algorithms with conventional lidar data processing methods and trained two models to filter noise signals and classify Doppler lidar observations into different classes, including clouds, aerosols, and rain. The results reveal high accuracy for noise identification and for aerosol and cloud classification; however, precipitation detection is underestimated. The method was tested on data sets from two instruments during different weather conditions, including three dust storms during the summer of 2019. Our results reveal that this method can provide efficient, accurate, and real-time classification of lidar measurements. Accordingly, we conclude that machine learning can open new opportunities for lidar data end-users, such as aviation safety operators, to monitor dust in the vicinity of airports.
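The two-stage pattern the abstract describes, unsupervised noise filtering followed by supervised classification, can be sketched as below. The features, cluster assignment, and labels are synthetic placeholders for Doppler-lidar quantities (e.g. SNR, backscatter, velocity variance) and manually annotated periods.

```python
# Sketch: unsupervised noise filtering (KMeans) followed by a supervised
# classifier for clouds / aerosols / rain. All data are synthetic stand-ins.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(9)
X = rng.normal(size=(2000, 4))                 # per-range-gate features

# Unsupervised: cluster range gates; treat one cluster as noise and drop it.
cluster = KMeans(n_clusters=2, n_init=10, random_state=9).fit_predict(X)
X_signal = X[cluster == 0]

# Supervised: in practice, labels come from manually annotated periods.
y = rng.integers(0, 3, size=len(X_signal))     # 0=cloud, 1=aerosol, 2=rain
clf = RandomForestClassifier(random_state=9).fit(X_signal, y)
```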

