scholarly journals Machine Learning Based Risk Prediction for Major Adverse Cardiovascular Events

Author(s):  
Michael Schrempf ◽  
Diether Kramer ◽  
Stefanie Jauk ◽  
Sai P. K. Veeranki ◽  
Werner Leodolter ◽  
...  

Background: Patients with major adverse cardiovascular events (MACE) such as myocardial infarction or stroke suffer from frequent hospitalizations and have high mortality rates. By identifying patients at risk at an early stage, MACE can be prevented with the right interventions. Objectives: The aim of this study was to develop machine learning-based models for the 5-year risk prediction of MACE. Methods: The data used for modelling included electronic medical records of more than 128,000 patients including 29,262 patients with MACE. A feature selection based on filter and embedded methods resulted in 826 features for modelling. Different machine learning methods were used for modelling on the training data. Results: A random forest model achieved the best calibration and discriminative performance on a separate test data set with an AUROC of 0.88. Conclusion: The developed risk prediction models achieved an excellent performance in the test data. Future research is needed to determine the performance of these models and their clinical benefit in prospective settings.

2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. A417-A418
Author(s):  
Amanda Yun Rui Lam ◽  
Min Min Chan ◽  
David Carmody ◽  
Ming Ming Teh ◽  
Yong Mong Bee ◽  
...  

Abstract Background: South-East Asia has seen a dramatic increase in type 2 diabetes (T2D). Risk prediction models for Major adverse cardiovascular events (MACE) identify patients who may benefit most from intensive prevention strategies. Existing risk prediction models for T2D were developed mainly in Caucasian populations, limiting their generalizability to Asian populations. We developed a Lasso-Cox regression model to predict the 5-year risk of incident MACE in Asian patients with T2DM using data from the largest diabetes registry in Singapore. Methodology: The diabetes registry contained public healthcare data from 9 primary healthcare centers, 4 hospitals and 3 national specialty centers. Data from 120,131 T2D subjects without MACE at baseline, from 2008 to 2018, were used for model development and validation. Patients with less than 5 years of follow-up data were excluded. Lasso-Cox, a semi-parametric variant of the Cox Proportional Hazard Model with l1-regularization, was used to predict individual survival distribution of incident MACE. A total of 69 features within electronic health records, including demographic data, vital signs, laboratory tests, and prescriptions for blood pressure, lipid and glucose-lowering medication were supplied to the model. Regression shrinkage and selection via the lasso method was used to identify variables associated with incident MACE. Identified variables were used to generate individual survival probability curves. Incident MACE was defined as the first occurrence of nonfatal myocardial infarction, nonfatal stroke, and CV disease-related death. Results: A total of 12,535 (10.4%) subjects developed MACE between 2008 and 2018. Model performance was evaluated by time-dependent concordance index and Brier score at 1, 2 and 5 years. The results of 5-fold cross validation shows that the model displayed good discrimination, achieving time-dependent C-statistics of 0.746±0.005, 0.742±0.003 and 0.738±0.002 at 1, 2 and 5 years respectively. The model demonstrated low Brier scores of 0.0355±0.0004, 0.0601±0.0011, 0.104±0.004 at 1, 2 and 5 years respectively, indicating good calibration. Factors most predictive of MACE were age and a history of hypertension and hyperlipidemia. Conclusions: We have developed a risk prediction model for MACE in Asian T2D using a large Singaporean T2D cohort, which can be used to support clinical decision-making. The individual survival probability estimates achieve an average C-statistics of 0.742 and are well-calibrated at 1, 2 and 5 years.


2020 ◽  
Author(s):  
Sina Faizollahzadeh Ardabili ◽  
Amir Mosavi ◽  
Pedram Ghamisi ◽  
Filip Ferdinand ◽  
Annamaria R. Varkonyi-Koczy ◽  
...  

Several outbreak prediction models for COVID-19 are being used by officials around the world to make informed-decisions and enforce relevant control measures. Among the standard models for COVID-19 global pandemic prediction, simple epidemiological and statistical models have received more attention by authorities, and they are popular in the media. Due to a high level of uncertainty and lack of essential data, standard models have shown low accuracy for long-term prediction. Although the literature includes several attempts to address this issue, the essential generalization and robustness abilities of existing models needs to be improved. This paper presents a comparative analysis of machine learning and soft computing models to predict the COVID-19 outbreak as an alternative to SIR and SEIR models. Among a wide range of machine learning models investigated, two models showed promising results (i.e., multi-layered perceptron, MLP, and adaptive network-based fuzzy inference system, ANFIS). Based on the results reported here, and due to the highly complex nature of the COVID-19 outbreak and variation in its behavior from nation-to-nation, this study suggests machine learning as an effective tool to model the outbreak. This paper provides an initial benchmarking to demonstrate the potential of machine learning for future research. Paper further suggests that real novelty in outbreak prediction can be realized through integrating machine learning and SEIR models.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Young Jae Kim ◽  
Jang Pyo Bae ◽  
Jun-Won Chung ◽  
Dong Kyun Park ◽  
Kwang Gi Kim ◽  
...  

AbstractWhile colorectal cancer is known to occur in the gastrointestinal tract. It is the third most common form of cancer of 27 major types of cancer in South Korea and worldwide. Colorectal polyps are known to increase the potential of developing colorectal cancer. Detected polyps need to be resected to reduce the risk of developing cancer. This research improved the performance of polyp classification through the fine-tuning of Network-in-Network (NIN) after applying a pre-trained model of the ImageNet database. Random shuffling is performed 20 times on 1000 colonoscopy images. Each set of data are divided into 800 images of training data and 200 images of test data. An accuracy evaluation is performed on 200 images of test data in 20 experiments. Three compared methods were constructed from AlexNet by transferring the weights trained by three different state-of-the-art databases. A normal AlexNet based method without transfer learning was also compared. The accuracy of the proposed method was higher in statistical significance than the accuracy of four other state-of-the-art methods, and showed an 18.9% improvement over the normal AlexNet based method. The area under the curve was approximately 0.930 ± 0.020, and the recall rate was 0.929 ± 0.029. An automatic algorithm can assist endoscopists in identifying polyps that are adenomatous by considering a high recall rate and accuracy. This system can enable the timely resection of polyps at an early stage.


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

BackgroundAccurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended upon our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.MethodsIn-house immunopeptidomic data was generated using stably transfected HLA-null K562 cells lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform.ResultsWe have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.ConclusionsImproving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines, and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.


2021 ◽  
Author(s):  
Octavian Dumitru ◽  
Gottfried Schwarz ◽  
Mihai Datcu ◽  
Dongyang Ao ◽  
Zhongling Huang ◽  
...  

<p>During the last years, much progress has been reached with machine learning algorithms. Among the typical application fields of machine learning are many technical and commercial applications as well as Earth science analyses, where most often indirect and distorted detector data have to be converted to well-calibrated scientific data that are a prerequisite for a correct understanding of the desired physical quantities and their relationships.</p><p>However, the provision of sufficient calibrated data is not enough for the testing, training, and routine processing of most machine learning applications. In principle, one also needs a clear strategy for the selection of necessary and useful training data and an easily understandable quality control of the finally desired parameters.</p><p>At a first glance, one could guess that this problem could be solved by a careful selection of representative test data covering many typical cases as well as some counterexamples. Then these test data can be used for the training of the internal parameters of a machine learning application. At a second glance, however, many researchers found out that a simple stacking up of plain examples is not the best choice for many scientific applications.</p><p>To get improved machine learning results, we concentrated on the analysis of satellite images depicting the Earth’s surface under various conditions such as the selected instrument type, spectral bands, and spatial resolution. In our case, such data are routinely provided by the freely accessible European Sentinel satellite products (e.g., Sentinel-1, and Sentinel-2). Our basic work then included investigations of how some additional processing steps – to be linked with the selected training data – can provide better machine learning results.</p><p>To this end, we analysed and compared three different approaches to find out machine learning strategies for the joint selection and processing of training data for our Earth observation images:</p><ul><li>One can optimize the training data selection by adapting the data selection to the specific instrument, target, and application characteristics [1].</li> <li>As an alternative, one can dynamically generate new training parameters by Generative Adversarial Networks. This is comparable to the role of a sparring partner in boxing [2].</li> <li>One can also use a hybrid semi-supervised approach for Synthetic Aperture Radar images with limited labelled data. The method is split in: polarimetric scattering classification, topic modelling for scattering labels, unsupervised constraint learning, and supervised label prediction with constraints [3].</li> </ul><p>We applied these strategies in the ExtremeEarth sea-ice monitoring project (http://earthanalytics.eu/). As a result, we can demonstrate for which application cases these three strategies will provide a promising alternative to a simple conventional selection of available training data.</p><p>[1] C.O. Dumitru et. al, “Understanding Satellite Images: A Data Mining Module for Sentinel Images”, Big Earth Data, 2020, 4(4), pp. 367-408.</p><p>[2] D. Ao et. al., “Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X”, Remote Sensing, 2018, 10(10), pp. 1-23.</p><p>[3] Z. Huang, et. al., "HDEC-TFA: An Unsupervised Learning Approach for Discovering Physical Scattering Properties of Single-Polarized SAR Images", IEEE Transactions on Geoscience and Remote Sensing, 2020, pp.1-18.</p>


Author(s):  
Nghia H Nguyen ◽  
Dominic Picetti ◽  
Parambir S Dulai ◽  
Vipul Jairath ◽  
William J Sandborn ◽  
...  

Abstract Background and Aims There is increasing interest in machine learning-based prediction models in inflammatory bowel diseases (IBD). We synthesized and critically appraised studies comparing machine learning vs. traditional statistical models, using routinely available clinical data for risk prediction in IBD. Methods Through a systematic review till January 1, 2021, we identified cohort studies that derived and/or validated machine learning models, based on routinely collected clinical data in patients with IBD, to predict the risk of harboring or developing adverse clinical outcomes, and reported its predictive performance against a traditional statistical model for the same outcome. We appraised the risk of bias in these studies using the Prediction model Risk of Bias ASsessment (PROBAST) tool. Results We included 13 studies on machine learning-based prediction models in IBD encompassing themes of predicting treatment response to biologics and thiopurines, predicting longitudinal disease activity and complications and outcomes in patients with acute severe ulcerative colitis. The most common machine learnings models used were tree-based algorithms, which are classification approaches achieved through supervised learning. Machine learning models outperformed traditional statistical models in risk prediction. However, most models were at high risk of bias, and only one was externally validated. Conclusions Machine learning-based prediction models based on routinely collected data generally perform better than traditional statistical models in risk prediction in IBD, though frequently have high risk of bias. Future studies examining these approaches are warranted, with special focus on external validation and clinical applicability.


Author(s):  
Chenxi Huang ◽  
Shu-Xia Li ◽  
César Caraballo ◽  
Frederick A. Masoudi ◽  
John S. Rumsfeld ◽  
...  

Background: New methods such as machine learning techniques have been increasingly used to enhance the performance of risk predictions for clinical decision-making. However, commonly reported performance metrics may not be sufficient to capture the advantages of these newly proposed models for their adoption by health care professionals to improve care. Machine learning models often improve risk estimation for certain subpopulations that may be missed by these metrics. Methods and Results: This article addresses the limitations of commonly reported metrics for performance comparison and proposes additional metrics. Our discussions cover metrics related to overall performance, discrimination, calibration, resolution, reclassification, and model implementation. Models for predicting acute kidney injury after percutaneous coronary intervention are used to illustrate the use of these metrics. Conclusions: We demonstrate that commonly reported metrics may not have sufficient sensitivity to identify improvement of machine learning models and propose the use of a comprehensive list of performance metrics for reporting and comparing clinical risk prediction models.


Author(s):  
Yanxiang Yu ◽  
◽  
Chicheng Xu ◽  
Siddharth Misra ◽  
Weichang Li ◽  
...  

Compressional and shear sonic traveltime logs (DTC and DTS, respectively) are crucial for subsurface characterization and seismic-well tie. However, these two logs are often missing or incomplete in many oil and gas wells. Therefore, many petrophysical and geophysical workflows include sonic log synthetization or pseudo-log generation based on multivariate regression or rock physics relations. Started on March 1, 2020, and concluded on May 7, 2020, the SPWLA PDDA SIG hosted a contest aiming to predict the DTC and DTS logs from seven “easy-to-acquire” conventional logs using machine-learning methods (GitHub, 2020). In the contest, a total number of 20,525 data points with half-foot resolution from three wells was collected to train regression models using machine-learning techniques. Each data point had seven features, consisting of the conventional “easy-to-acquire” logs: caliper, neutron porosity, gamma ray (GR), deep resistivity, medium resistivity, photoelectric factor, and bulk density, respectively, as well as two sonic logs (DTC and DTS) as the target. The separate data set of 11,089 samples from a fourth well was then used as the blind test data set. The prediction performance of the model was evaluated using root mean square error (RMSE) as the metric, shown in the equation below: RMSE=sqrt(1/2*1/m* [∑_(i=1)^m▒〖(〖DTC〗_pred^i-〖DTC〗_true^i)〗^2 + 〖(〖DTS〗_pred^i-〖DTS〗_true^i)〗^2 ] In the benchmark model, (Yu et al., 2020), we used a Random Forest regressor and conducted minimal preprocessing to the training data set; an RMSE score of 17.93 was achieved on the test data set. The top five models from the contest, on average, beat the performance of our benchmark model by 27% in the RMSE score. In the paper, we will review these five solutions, including preprocess techniques and different machine-learning models, including neural network, long short-term memory (LSTM), and ensemble trees. We found that data cleaning and clustering were critical for improving the performance in all models.


2020 ◽  
Author(s):  
Amir Mosavi

Several epidemiological models are being used around the world to project the number of infected individuals and the mortality rates of the COVID-19 outbreak. Advancing accurate prediction models is of utmost importance to take proper actions. Due to a high level of uncertainty or even lack of essential data, the standard epidemiological models have been challenged regarding the delivery of higher accuracy for long-term prediction. As an alternative to the susceptible-infected-resistant (SIR)-based models, this study proposes a hybrid machine learning approach to predict the COVID-19 and we exemplify its potential using data from Hungary. The hybrid machine learning methods of adaptive network-based fuzzy inference system (ANFIS) and multi-layered perceptron-imperialist competitive algorithm (MLP-ICA) are used to predict time series of infected individuals and mortality rate. The models predict that by late May, the outbreak and the total morality will drop substantially. The validation is performed for nine days with promising results, which confirms the model accuracy. It is expected that the model maintains its accuracy as long as no significant interruption occurs. Based on the results reported here, and due to the complex nature of the COVID-19 outbreak and variation in its behavior from nation-to-nation, this study suggests machine learning as an effective tool to model the outbreak. This paper provides an initial benchmarking to demonstrate the potential of machine learning for future research.


When pancreas fails to secrete sufficient insulin in the human body, the glucose level in blood either becomes too high or too low. This fluctuation in glucose level affects different body organs such as kidney, brain, and eye. When the complications start appearing in the eyes due to Diabetic Mellitus (DM), it is called Diabetic Retinopathy (DR). DR can be categorized in several classes based on the severity, it can be Microaneurysms (ME), Haemorrhages (HE), Hard and Soft Exudates (EX and SE). DR is a slow start process that starts with very mild symptoms, becomes moderate with the time and results in complete vision loss, if not detected on time. Early-stage detection may greatly bolster in vision loss. However, it is impassable to detect the symptoms of DR with naked eyes. Ophthalmologist harbor to the several approaches and algorithm which makes use of different Machine Learning (ML) methods and classifiers to overcome this disease. The burgeoning insistence of Convolutional Neural Network (CNN) and their advancement in extracting features from different fundus images captivate several researchers to strive on it. Transfer Learning (TL) techniques help to use pre-trained CNN on a dataset that has finite training data, especially that in under developing countries. In this work, we propose several CNN architecture along with distinct classifiers which segregate the different lesions (ME and EX) in DR images with very eye-catching accuracies.


Sign in / Sign up

Export Citation Format

Share Document