Machine Learning Based Risk Prediction for Major Adverse Cardiovascular Events

Navigating Healthcare Through Challenging Times - Studies in Health Technology and Informatics ◽

10.3233/shti210100 ◽

2021 ◽

Author(s):

Michael Schrempf ◽

Diether Kramer ◽

Stefanie Jauk ◽

Sai P. K. Veeranki ◽

Werner Leodolter ◽

...

Keyword(s):

Machine Learning ◽

Risk Prediction ◽

Test Data ◽

Cardiovascular Events ◽

Prediction Models ◽

Early Stage ◽

Training Data ◽

Major Adverse Cardiovascular Events ◽

Future Research ◽

Patients At Risk

Background: Patients with major adverse cardiovascular events (MACE) such as myocardial infarction or stroke suffer from frequent hospitalizations and have high mortality rates. By identifying patients at risk at an early stage, MACE can be prevented with the right interventions. Objectives: The aim of this study was to develop machine learning-based models for the 5-year risk prediction of MACE. Methods: The data used for modelling included electronic medical records of more than 128,000 patients including 29,262 patients with MACE. A feature selection based on filter and embedded methods resulted in 826 features for modelling. Different machine learning methods were used for modelling on the training data. Results: A random forest model achieved the best calibration and discriminative performance on a separate test data set with an AUROC of 0.88. Conclusion: The developed risk prediction models achieved an excellent performance in the test data. Future research is needed to determine the performance of these models and their clinical benefit in prospective settings.
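As a rough illustration of the discrimination metric reported above, the following sketch computes AUROC as the fraction of (event, non-event) patient pairs in which the event patient received the higher predicted risk, with ties counted as one half. The function name `auroc` and the toy inputs are hypothetical and are not taken from the study; this is not the authors' evaluation code.

```python
def auroc(scores, labels):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up risks): 3 of the 4 positive/negative pairs are ordered correctly.
example = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of patients, so for a cohort of this size a rank-based implementation from an established library would be the practical choice.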

Predicting Major Adverse Cardiovascular Events in Asian Type 2 Diabetes Patients With Lasso-Cox Regression

Journal of the Endocrine Society ◽

10.1210/jendso/bvab048.852 ◽

2021 ◽

Vol 5 (Supplement_1) ◽

pp. A417-A418

Author(s):

Amanda Yun Rui Lam ◽

Min Min Chan ◽

David Carmody ◽

Ming Ming Teh ◽

Yong Mong Bee ◽

...

Keyword(s):

Risk Prediction ◽

Cardiovascular Events ◽

Prediction Models ◽

Cox Regression ◽

Time Dependent ◽

Major Adverse Cardiovascular Events ◽

Risk Prediction Models ◽

Diabetes Registry ◽

Individual Survival

Background: South-East Asia has seen a dramatic increase in type 2 diabetes (T2D). Risk prediction models for major adverse cardiovascular events (MACE) identify patients who may benefit most from intensive prevention strategies. Existing risk prediction models for T2D were developed mainly in Caucasian populations, limiting their generalizability to Asian populations. We developed a Lasso-Cox regression model to predict the 5-year risk of incident MACE in Asian patients with T2D using data from the largest diabetes registry in Singapore. Methodology: The diabetes registry contained public healthcare data from 9 primary healthcare centers, 4 hospitals and 3 national specialty centers. Data from 120,131 T2D subjects without MACE at baseline, from 2008 to 2018, were used for model development and validation. Patients with less than 5 years of follow-up data were excluded. Lasso-Cox, a semi-parametric variant of the Cox Proportional Hazard Model with l1-regularization, was used to predict the individual survival distribution of incident MACE. A total of 69 features within electronic health records, including demographic data, vital signs, laboratory tests, and prescriptions for blood pressure, lipid and glucose-lowering medication, were supplied to the model. Regression shrinkage and selection via the lasso method was used to identify variables associated with incident MACE. Identified variables were used to generate individual survival probability curves. Incident MACE was defined as the first occurrence of nonfatal myocardial infarction, nonfatal stroke, or CV disease-related death. Results: A total of 12,535 (10.4%) subjects developed MACE between 2008 and 2018. Model performance was evaluated by time-dependent concordance index and Brier score at 1, 2 and 5 years. The results of 5-fold cross-validation show that the model displayed good discrimination, achieving time-dependent C-statistics of 0.746±0.005, 0.742±0.003 and 0.738±0.002 at 1, 2 and 5 years, respectively. The model demonstrated low Brier scores of 0.0355±0.0004, 0.0601±0.0011 and 0.104±0.004 at 1, 2 and 5 years, respectively, indicating good calibration. Factors most predictive of MACE were age and a history of hypertension and hyperlipidemia. Conclusions: We have developed a risk prediction model for MACE in Asian T2D patients using a large Singaporean T2D cohort, which can be used to support clinical decision-making. The individual survival probability estimates achieve an average C-statistic of 0.742 and are well-calibrated at 1, 2 and 5 years.
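To make the Brier score quoted above concrete, here is a minimal sketch of its simplest form: the mean squared difference between the predicted probability of an event by a fixed horizon and the observed 0/1 outcome. It assumes every subject has complete follow-up to that horizon, so it omits the censoring weights that a time-dependent Brier score for survival models normally uses. The function name `brier_score` and the toy values are hypothetical, not the study's.

```python
def brier_score(predicted_risk, event_occurred):
    """Mean squared error between predicted event probabilities and 0/1 outcomes.

    Assumes complete follow-up to the horizon (no censoring weights applied).
    """
    if len(predicted_risk) != len(event_occurred) or not predicted_risk:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_risk, event_occurred)]
    return sum(squared_errors) / len(squared_errors)


# Toy example (made-up values): two confident, correct predictions give a low score.
toy_score = brier_score([0.1, 0.9], [0, 1])
```

Lower values are better. A model that always predicts 0.5 scores 0.25 regardless of the outcomes, which gives a simple reference point when reading reported values.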

COVID-19 Outbreak Prediction with Machine Learning

10.34055/osf.io/xr4js ◽

2020 ◽

Author(s):

Sina Faizollahzadeh Ardabili ◽

Amir Mosavi ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

Annamaria R. Varkonyi-Koczy ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Fuzzy Inference ◽

Control Measures ◽

Future Research ◽

Complex Nature ◽

Inference System ◽

Wide Range ◽

Standard Models ◽

High Level

Several outbreak prediction models for COVID-19 are being used by officials around the world to make informed decisions and enforce relevant control measures. Among the standard models for COVID-19 global pandemic prediction, simple epidemiological and statistical models have received more attention from authorities, and they are popular in the media. Due to a high level of uncertainty and a lack of essential data, standard models have shown low accuracy for long-term prediction. Although the literature includes several attempts to address this issue, the essential generalization and robustness abilities of existing models need to be improved. This paper presents a comparative analysis of machine learning and soft computing models to predict the COVID-19 outbreak as an alternative to SIR and SEIR models. Among a wide range of machine learning models investigated, two models showed promising results (i.e., multi-layered perceptron, MLP, and adaptive network-based fuzzy inference system, ANFIS). Based on the results reported here, and due to the highly complex nature of the COVID-19 outbreak and variation in its behavior from nation to nation, this study suggests machine learning as an effective tool to model the outbreak. This paper provides an initial benchmarking to demonstrate the potential of machine learning for future research. The paper further suggests that real novelty in outbreak prediction can be realized through integrating machine learning and SEIR models.

New polyp image classification technique using transfer learning of network-in-network structure in endoscopic images

Scientific Reports ◽

10.1038/s41598-021-83199-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Young Jae Kim ◽

Jang Pyo Bae ◽

Jun-Won Chung ◽

Dong Kyun Park ◽

Kwang Gi Kim ◽

...

Keyword(s):

Colorectal Cancer ◽

Transfer Learning ◽

Test Data ◽

State Of The Art ◽

Early Stage ◽

Statistical Significance ◽

Recall Rate ◽

Training Data ◽

Fine Tuning ◽

Accuracy Evaluation

Colorectal cancer, which occurs in the gastrointestinal tract, is the third most common of 27 major types of cancer in South Korea and worldwide. Colorectal polyps are known to increase the potential of developing colorectal cancer. Detected polyps need to be resected to reduce the risk of developing cancer. This research improved the performance of polyp classification through the fine-tuning of Network-in-Network (NIN) after applying a pre-trained model of the ImageNet database. Random shuffling is performed 20 times on 1000 colonoscopy images. Each set of data is divided into 800 images of training data and 200 images of test data. An accuracy evaluation is performed on the 200 images of test data in each of the 20 experiments. Three compared methods were constructed from AlexNet by transferring the weights trained on three different state-of-the-art databases. A normal AlexNet-based method without transfer learning was also compared. The accuracy of the proposed method was higher, with statistical significance, than the accuracy of the four other state-of-the-art methods, and showed an 18.9% improvement over the normal AlexNet-based method. The area under the curve was approximately 0.930 ± 0.020, and the recall rate was 0.929 ± 0.029. An automatic algorithm with high recall and accuracy can assist endoscopists in identifying polyps that are adenomatous. This system can enable the timely resection of polyps at an early stage.
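The evaluation protocol described above (20 random shuffles of 1000 images, each split into 800 training and 200 test images) can be sketched roughly as follows. This is an illustrative outline only: the function name `repeated_splits`, the use of integer indices in place of images, and the fixed seed are assumptions, not details from the paper.

```python
import random


def repeated_splits(n_items=1000, n_train=800, n_repeats=20, seed=0):
    """Return n_repeats (train_indices, test_indices) pairs from independent shuffles."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility only
    indices = list(range(n_items))
    splits = []
    for _ in range(n_repeats):
        shuffled = indices[:]  # copy so each repeat starts from the same ordering
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


splits = repeated_splits()
```

Each of the 20 experiments would then train on its 800 indices and report accuracy on its 200 held-out indices. The mean and spread over the repeats correspond to figures of the form 0.930 ± 0.020.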

57 Precision neoantigen discovery using novel algorithms and expanded HLA-ligandome datasets

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0057 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A62-A62

Author(s):

Dattatreya Mellacheruvu ◽

Rachel Pyke ◽

Charles Abbott ◽

Nick Phillips ◽

Sejal Desai ◽

...

Keyword(s):

Machine Learning ◽

Cell Lines ◽

Antigen Processing ◽

Large Scale ◽

Prediction Models ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Training Data ◽

High Quality ◽

Tissue Samples

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods: In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform. Results: We have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.

Improved Training for Machine Learning: The Additional Potential of Innovative Algorithmic Approaches.

10.5194/egusphere-egu21-4683 ◽

2021 ◽

Author(s):

Octavian Dumitru ◽

Gottfried Schwarz ◽

Mihai Datcu ◽

Dongyang Ao ◽

Zhongling Huang ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Test Data ◽

Satellite Images ◽

Training Data ◽

Data Selection ◽

Generative Adversarial Networks ◽

Radar Images ◽

Basic Work ◽

Selection Of

In recent years, much progress has been made with machine learning algorithms. Among the typical application fields of machine learning are many technical and commercial applications as well as Earth science analyses, where most often indirect and distorted detector data have to be converted to well-calibrated scientific data that are a prerequisite for a correct understanding of the desired physical quantities and their relationships. However, the provision of sufficient calibrated data is not enough for the testing, training, and routine processing of most machine learning applications. In principle, one also needs a clear strategy for the selection of necessary and useful training data and an easily understandable quality control of the finally desired parameters. At first glance, one could guess that this problem could be solved by a careful selection of representative test data covering many typical cases as well as some counterexamples. Then these test data can be used for the training of the internal parameters of a machine learning application. At second glance, however, many researchers found out that a simple stacking up of plain examples is not the best choice for many scientific applications. To get improved machine learning results, we concentrated on the analysis of satellite images depicting the Earth's surface under various conditions such as the selected instrument type, spectral bands, and spatial resolution. In our case, such data are routinely provided by the freely accessible European Sentinel satellite products (e.g., Sentinel-1 and Sentinel-2). Our basic work then included investigations of how some additional processing steps, to be linked with the selected training data, can provide better machine learning results. To this end, we analysed and compared three different approaches to identify machine learning strategies for the joint selection and processing of training data for our Earth observation images:

- One can optimize the training data selection by adapting the data selection to the specific instrument, target, and application characteristics [1].
- As an alternative, one can dynamically generate new training parameters by Generative Adversarial Networks. This is comparable to the role of a sparring partner in boxing [2].
- One can also use a hybrid semi-supervised approach for Synthetic Aperture Radar images with limited labelled data. The method is split into: polarimetric scattering classification, topic modelling for scattering labels, unsupervised constraint learning, and supervised label prediction with constraints [3].

We applied these strategies in the ExtremeEarth sea-ice monitoring project (http://earthanalytics.eu/). As a result, we can demonstrate for which application cases these three strategies will provide a promising alternative to a simple conventional selection of available training data.

[1] C.O. Dumitru et al., "Understanding Satellite Images: A Data Mining Module for Sentinel Images", Big Earth Data, 2020, 4(4), pp. 367-408.
[2] D. Ao et al., "Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X", Remote Sensing, 2018, 10(10), pp. 1-23.
[3] Z. Huang et al., "HDEC-TFA: An Unsupervised Learning Approach for Discovering Physical Scattering Properties of Single-Polarized SAR Images", IEEE Transactions on Geoscience and Remote Sensing, 2020, pp. 1-18.

Machine Learning-based Prediction Models for Diagnosis and Prognosis in Inflammatory Bowel Diseases: A Systematic Review

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjab155 ◽

2021 ◽

Author(s):

Nghia H Nguyen ◽

Dominic Picetti ◽

Parambir S Dulai ◽

Vipul Jairath ◽

William J Sandborn ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Risk Prediction ◽

Statistical Models ◽

Prediction Models ◽

Risk Of Bias ◽

Learning Models ◽

Bowel Diseases ◽

Inflammatory Bowel ◽

Machine Learning Models

Background and Aims: There is increasing interest in machine learning-based prediction models in inflammatory bowel diseases (IBD). We synthesized and critically appraised studies comparing machine learning vs. traditional statistical models, using routinely available clinical data for risk prediction in IBD. Methods: Through a systematic review till January 1, 2021, we identified cohort studies that derived and/or validated machine learning models, based on routinely collected clinical data in patients with IBD, to predict the risk of harboring or developing adverse clinical outcomes, and reported their predictive performance against a traditional statistical model for the same outcome. We appraised the risk of bias in these studies using the Prediction model Risk of Bias ASsessment (PROBAST) tool. Results: We included 13 studies on machine learning-based prediction models in IBD encompassing themes of predicting treatment response to biologics and thiopurines, predicting longitudinal disease activity and complications, and predicting outcomes in patients with acute severe ulcerative colitis. The most common machine learning models used were tree-based algorithms, which are classification approaches achieved through supervised learning. Machine learning models outperformed traditional statistical models in risk prediction. However, most models were at high risk of bias, and only one was externally validated. Conclusions: Machine learning-based prediction models based on routinely collected data generally perform better than traditional statistical models in risk prediction in IBD, though they frequently have a high risk of bias. Future studies examining these approaches are warranted, with special focus on external validation and clinical applicability.

Performance Metrics for the Comparative Analysis of Clinical Risk Prediction Models Employing Machine Learning

Circulation Cardiovascular Quality and Outcomes ◽

10.1161/circoutcomes.120.007526 ◽

2021 ◽

Author(s):

Chenxi Huang ◽

Shu-Xia Li ◽

César Caraballo ◽

Frederick A. Masoudi ◽

John S. Rumsfeld ◽

...

Keyword(s):

Machine Learning ◽

Risk Prediction ◽

Health Care Professionals ◽

Clinical Decision Making ◽

Performance Metrics ◽

Prediction Models ◽

Learning Models ◽

Risk Prediction Models ◽

Clinical Risk ◽

Machine Learning Models

Background: New methods such as machine learning techniques have been increasingly used to enhance the performance of risk predictions for clinical decision-making. However, commonly reported performance metrics may not be sufficient to capture the advantages of these newly proposed models for their adoption by health care professionals to improve care. Machine learning models often improve risk estimation for certain subpopulations that may be missed by these metrics. Methods and Results: This article addresses the limitations of commonly reported metrics for performance comparison and proposes additional metrics. Our discussions cover metrics related to overall performance, discrimination, calibration, resolution, reclassification, and model implementation. Models for predicting acute kidney injury after percutaneous coronary intervention are used to illustrate the use of these metrics. Conclusions: We demonstrate that commonly reported metrics may not have sufficient sensitivity to identify improvement of machine learning models and propose the use of a comprehensive list of performance metrics for reporting and comparing clinical risk prediction models.

Synthetic Sonic Log Generation With Machine Learning: A Contest Summary From Five Methods

Petrophysics – The SPWLA Journal of Formation Evaluation and Reservoir Description ◽

10.30632/pjv62n4-2021a4 ◽

2021 ◽

Vol 62 (4) ◽

pp. 393-406

Author(s):

Yanxiang Yu ◽

Chicheng Xu ◽

Siddharth Misra ◽

Weichang Li ◽

...

Keyword(s):

Machine Learning ◽

Test Data ◽

Short Term Memory ◽

Rock Physics ◽

Training Data ◽

Machine Learning Techniques ◽

Blind Test ◽

Data Set ◽

Benchmark Model ◽

Sonic Log

Compressional and shear sonic traveltime logs (DTC and DTS, respectively) are crucial for subsurface characterization and seismic-well tie. However, these two logs are often missing or incomplete in many oil and gas wells. Therefore, many petrophysical and geophysical workflows include sonic log synthetization or pseudo-log generation based on multivariate regression or rock physics relations. Started on March 1, 2020, and concluded on May 7, 2020, the SPWLA PDDA SIG hosted a contest aiming to predict the DTC and DTS logs from seven “easy-to-acquire” conventional logs using machine-learning methods (GitHub, 2020). In the contest, a total of 20,525 data points with half-foot resolution from three wells were collected to train regression models using machine-learning techniques. Each data point had seven features, consisting of the conventional “easy-to-acquire” logs (caliper, neutron porosity, gamma ray (GR), deep resistivity, medium resistivity, photoelectric factor, and bulk density), as well as two sonic logs (DTC and DTS) as the targets. A separate data set of 11,089 samples from a fourth well was then used as the blind test data set. The prediction performance of the model was evaluated using root mean square error (RMSE) as the metric, shown in the equation below:

$$\mathrm{RMSE} = \sqrt{\frac{1}{2}\cdot\frac{1}{m}\sum_{i=1}^{m}\left[\left(\mathrm{DTC}_{\mathrm{pred}}^{i}-\mathrm{DTC}_{\mathrm{true}}^{i}\right)^{2}+\left(\mathrm{DTS}_{\mathrm{pred}}^{i}-\mathrm{DTS}_{\mathrm{true}}^{i}\right)^{2}\right]}$$

In the benchmark model (Yu et al., 2020), we used a Random Forest regressor and conducted minimal preprocessing of the training data set; an RMSE score of 17.93 was achieved on the test data set. The top five models from the contest, on average, beat the performance of our benchmark model by 27% in the RMSE score. In the paper, we will review these five solutions, including preprocessing techniques and different machine-learning models, including neural network, long short-term memory (LSTM), and ensemble trees. 
We found that data cleaning and clustering were critical for improving the performance in all models.
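The contest metric described above averages the squared errors of both sonic logs over all m samples and the two targets before taking the square root. A minimal sketch of that calculation follows; the function name `contest_rmse` and the toy inputs are hypothetical and are not the organizers' scoring script.

```python
import math


def contest_rmse(dtc_pred, dtc_true, dts_pred, dts_true):
    """Combined RMSE over DTC and DTS: sqrt of the summed squared errors / (2 * m)."""
    m = len(dtc_true)
    if m == 0 or not (len(dtc_pred) == len(dts_pred) == len(dts_true) == m):
        raise ValueError("all four sequences must be non-empty and of equal length")
    total = 0.0
    for i in range(m):
        total += (dtc_pred[i] - dtc_true[i]) ** 2
        total += (dts_pred[i] - dts_true[i]) ** 2
    return math.sqrt(total / (2 * m))


# Toy example (made-up values): a single sample with errors of 3 and 4.
toy_rmse = contest_rmse([3.0], [0.0], [4.0], [0.0])  # sqrt((9 + 16) / 2)
```

Because DTC and DTS errors are pooled without rescaling, the log with the larger numerical range contributes more to the score.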

COVID-19 Pandemic Prediction for Hungary; a Hybrid Machine Learning Approach

10.31219/osf.io/rtbym ◽

2020 ◽

Author(s):

Amir Mosavi

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Future Research ◽

Complex Nature ◽

Learning Approach ◽

Epidemiological Models ◽

Inference System ◽

Proper Actions ◽

Machine Learning Approach ◽

Hybrid Machine

Several epidemiological models are being used around the world to project the number of infected individuals and the mortality rates of the COVID-19 outbreak. Advancing accurate prediction models is of utmost importance to take proper actions. Due to a high level of uncertainty or even lack of essential data, the standard epidemiological models have been challenged regarding the delivery of higher accuracy for long-term prediction. As an alternative to the susceptible-infected-resistant (SIR)-based models, this study proposes a hybrid machine learning approach to predict the COVID-19 outbreak, and we exemplify its potential using data from Hungary. The hybrid machine learning methods of adaptive network-based fuzzy inference system (ANFIS) and multi-layered perceptron-imperialist competitive algorithm (MLP-ICA) are used to predict time series of infected individuals and mortality rate. The models predict that by late May, the outbreak and the total mortality will drop substantially. The validation is performed for nine days with promising results, which confirms the model accuracy. It is expected that the model maintains its accuracy as long as no significant interruption occurs. Based on the results reported here, and due to the complex nature of the COVID-19 outbreak and variation in its behavior from nation to nation, this study suggests machine learning as an effective tool to model the outbreak. This paper provides an initial benchmarking to demonstrate the potential of machine learning for future research.

Classification among Microaneurysms, Exudates, and Lesion free Retinal Regions in the Eye Images using Transfer Learned CNNs

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4539.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 5508-5512

Keyword(s):

Neural Network ◽

Machine Learning ◽

Diabetic Retinopathy ◽

Glucose Level ◽

Vision Loss ◽

Early Stage ◽

Training Data ◽

Fundus Images ◽

Diabetic Mellitus ◽

Start Process

When the pancreas fails to secrete sufficient insulin, the glucose level in the blood becomes either too high or too low. This fluctuation in glucose level affects different body organs such as the kidneys, brain, and eyes. When complications start appearing in the eyes due to Diabetes Mellitus (DM), the condition is called Diabetic Retinopathy (DR). DR lesions can be categorized into several classes based on severity: Microaneurysms (ME), Haemorrhages (HE), and Hard and Soft Exudates (EX and SE). DR is a slow-onset process that starts with very mild symptoms, becomes moderate with time, and results in complete vision loss if not detected in time. Early-stage detection may greatly help in preventing vision loss. However, it is impossible to detect the symptoms of DR with the naked eye. Ophthalmologists resort to several approaches and algorithms that make use of different Machine Learning (ML) methods and classifiers to overcome this disease. The growing popularity of Convolutional Neural Networks (CNNs) and their advancement in extracting features from different fundus images have attracted several researchers to work on them. Transfer Learning (TL) techniques help to use pre-trained CNNs on datasets that have limited training data, especially those from developing countries. In this work, we propose several CNN architectures along with distinct classifiers that distinguish the different lesions (ME and EX) in DR images with notably high accuracies.
