Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study

Juan Carlos Quiroz; You-Zhen Feng; Zhong-Yuan Cheng; Dana Rezazadegan; Ping-Kang Chen; Qi-Ting Lin; Long Qian; Xiao-Fang Liu; Shlomo Berkovsky; Enrico Coiera; Lei Song; Xiaoming Qiu; Sidong Liu; Xiang-Ran Cai

doi:10.2196/24572


JMIR Medical Informatics ◽

10.2196/24572 ◽

2021 ◽

Vol 9 (2) ◽

pp. e24572

Author(s):

Juan Carlos Quiroz ◽

You-Zhen Feng ◽

Zhong-Yuan Cheng ◽

Dana Rezazadegan ◽

Ping-Kang Chen ◽

...

Keyword(s):

Machine Learning ◽

Predictive Power ◽

Care Delivery ◽

Learning Approach ◽

Imaging Features ◽

Severity Assessment ◽

Imaging Data ◽

Learning Models ◽

Machine Learning Approach ◽

Machine Learning Models

Background: COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible so that resources can be mobilized and treatment can be escalated.

Objective: This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data.

Methods: Clinical data (including demographics, signs, symptoms, comorbidities, and blood test results) and chest computed tomography scans of 346 patients from 2 hospitals in Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data across multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations framework.

Results: Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929).

Conclusions: Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease.
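The abstract reports AUC, sensitivity, and specificity for each model. As a minimal sketch of how these three metrics are computed for a binary mild/severe classifier, the Python below uses invented toy labels and scores; the 0.5 decision threshold and the function names are assumptions for illustration, not the authors' pipeline.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = severe."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: 2 severe cases and 6 mild cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

With few severe cases, a single missed case moves sensitivity sharply, which is one reason a model can pair a high AUC with a low sensitivity at a fixed threshold.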


Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study (Preprint)

10.2196/preprints.24572 ◽

2020 ◽

Author(s):

Juan Carlos Quiroz ◽

You-Zhen Feng ◽

Zhong-Yuan Cheng ◽

Dana Rezazadegan ◽

Ping-Kang Chen ◽

...

Keyword(s):

Machine Learning ◽

Predictive Power ◽

Care Delivery ◽

Learning Approach ◽

Imaging Features ◽

Severity Assessment ◽

Imaging Data ◽

Learning Models ◽

Machine Learning Approach ◽

Machine Learning Models

Background: COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible so that resources can be mobilized and treatment can be escalated.

Objective: This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data.

Methods: Clinical data (including demographics, signs, symptoms, comorbidities, and blood test results) and chest computed tomography scans of 346 patients from 2 hospitals in Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data across multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations framework.

Results: Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929).

Conclusions: Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease.
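The abstract credits synthetic minority oversampling with the best result. The core idea can be sketched as follows: each synthetic case is placed on the line segment between a minority-class case and one of its nearest minority-class neighbours. The function name, the parameters `k` and `seed`, and the toy feature values are all assumptions for illustration; this is not the authors' implementation or a library API.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (Euclidean)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority class (invented values): three "severe" cases, two features each.
severe = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0)]
new_points = smote(severe, n_new=4)
```

In practice, oversampling is usually applied only to the training folds, so that synthetic cases never leak into the data used for evaluation.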


Severity Assessment of COVID-19 based on Clinical and Imaging Data

10.1101/2020.08.12.20173872 ◽

2020 ◽

Author(s):

Juan Quiroz ◽

Youzhen Feng ◽

Zhongyuan Cheng ◽

Dana Rezazadegan ◽

Pingkang Chen ◽

...

Keyword(s):

Machine Learning ◽

Predictive Power ◽

Imaging Features ◽

Severity Assessment ◽

Imaging Data ◽

Learning Models ◽

Logistic Regression Models ◽

Machine Learning Approach ◽

Combined Features ◽

Machine Learning Models

Objectives: This study aims to develop a machine learning approach for automated severity assessment of COVID-19 patients based on clinical and imaging data.

Materials and Methods: Clinical data (demographics, signs, symptoms, comorbidities, and blood test results) and chest CT scans of 346 patients from two hospitals in Hubei province, China, were used to develop machine learning models for automated severity assessment of diagnosed COVID-19 cases. We compared the predictive power of clinical and imaging data by testing multiple machine learning models, and further explored the use of four oversampling methods to address the imbalanced class distribution. Features with the highest predictive power were identified using the SHAP framework.

Results: For differentiating between mild and severe cases, logistic regression models achieved the best performance on clinical features (AUC 0.848, sensitivity 0.455, specificity 0.906), imaging features (AUC 0.926, sensitivity 0.818, specificity 0.901), and the combined features (AUC 0.950, sensitivity 0.764, specificity 0.919). The SMOTE oversampling method further improved the performance on the combined features to an AUC of 0.960 (sensitivity 0.845, specificity 0.929).

Discussion: Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with findings from previous studies. Oversampling yielded mixed results, although it achieved the best performance in our study.

Conclusions: This study indicates that clinical and imaging features can be used for automated severity assessment of COVID-19 patients and have the potential to assist with triaging COVID-19 patients and prioritizing care for patients at higher risk of severe disease.

[Manuscript last updated on 31 July 2020]
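The SHAP framework named in the abstract attributes a prediction to individual features using Shapley values. As a rough sketch of the underlying quantity, the Python below computes exact Shapley values by enumerating every feature coalition for a tiny hypothetical linear risk score. The coefficients, the case, and the baseline are invented, and replacing absent features with baseline values is one common convention assumed here; real SHAP implementations use faster approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function over n_features."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        rest = [j for j in players if j != i]
        for size in range(len(rest) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(rest, size):
                gain = value_fn(set(subset) | {i}) - value_fn(set(subset))
                phi[i] += weight * gain
    return phi


# Hypothetical linear risk score (all numbers invented for illustration).
weights = [0.8, 0.3, -0.2]   # model coefficients
x = [2.0, 1.0, 4.0]          # the case being explained
baseline = [1.0, 1.0, 1.0]   # reference values, e.g. feature means


def model_on_coalition(coalition):
    """Model output when features outside the coalition take baseline values."""
    return sum(w * (x[j] if j in coalition else baseline[j])
               for j, w in enumerate(weights))


phi = shapley_values(model_on_coalition, 3)
```

For a linear model, each value reduces to the coefficient times the feature's deviation from the baseline, and the values sum to the difference between the prediction for the case and the prediction for the baseline.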


Telugu News Data Classification Using Machine Learning Approach

10.4018/978-1-7998-7685-4.ch014 ◽

2022 ◽

pp. 181-194

Author(s):

Bala Krishna Priya G. ◽

Jabeen Sultana ◽

Usha Rani M.

Keyword(s):

Machine Learning ◽

Social Media ◽

Research Work ◽

Learning Approach ◽

Fake News ◽

Learning Models ◽

Machine Learning Classifiers ◽

Proposed Model ◽

Machine Learning Approach ◽

Machine Learning Models

Mining Telugu news data and categorizing it by public sentiment is important because a large volume of fake news has emerged with the rise of social media. This research work identifies whether a news text is positive, negative, or neutral and then classifies it by topic area: business, editorial, entertainment, nation, or sports. It proposes an efficient model that uses machine learning classifiers to classify Telugu news data. The results obtained by various machine learning models are compared, and the proposed model outperforms the others in accuracy, precision, recall, and F1-score.
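The abstract compares models on accuracy, precision, recall, and F1-score over several news categories. A minimal sketch of these metrics for a multi-class problem follows; macro averaging (an unweighted mean over categories) is an assumption, since the abstract does not say how per-class scores were combined, and the labels are invented.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Invented predictions over three of the categories named in the abstract.
y_true = ["sports", "sports", "business", "nation", "business", "nation"]
y_pred = ["sports", "business", "business", "nation", "business", "sports"]
scores = macro_scores(y_true, y_pred)
```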


A Radiogenomics Ensemble to Predict EGFR and KRAS Mutations in NSCLC

Tomography ◽

10.3390/tomography7020014 ◽

2021 ◽

Vol 7 (2) ◽

pp. 154-168

Author(s):

Silvia Moreno ◽

Mario Bonfante ◽

Eduardo Zurek ◽

Dmitry Cherezov ◽

Dmitry Goldgof ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Kras Mutation ◽

Learning Approach ◽

Learning Models ◽

Kras Mutations ◽

Machine Learning Approach ◽

Class Average ◽

Public Datasets ◽

Machine Learning Models

Lung cancer causes more deaths globally than any other type of cancer. To determine the best treatment, detecting EGFR and KRAS mutations is of interest. However, non-invasive ways to obtain this information are not available. Furthermore, sufficiently large, relevant public datasets are often lacking, so the performance of single classifiers is limited. In this paper, an ensemble approach is applied to increase the performance of EGFR and KRAS mutation prediction using a small dataset. A new voting scheme, Selective Class Average Voting (SCAV), is proposed and its performance is assessed both for machine learning models and CNNs. For the EGFR mutation, in the machine learning approach, there was an increase in the sensitivity from 0.66 to 0.75, and an increase in AUC from 0.68 to 0.70. With the deep learning approach, an AUC of 0.846 was obtained, and with SCAV, the accuracy of the model was increased from 0.80 to 0.857. For the KRAS mutation, both in the machine learning models (0.65 to 0.71 AUC) and the deep learning models (0.739 to 0.778 AUC), a significant increase in performance was found. The results obtained in this work show how to effectively learn from small image datasets to predict EGFR and KRAS mutations, and that using ensembles with SCAV increases the performance of machine learning classifiers and CNNs. The results provide confidence that as large datasets become available, tools to augment clinical capabilities can be fielded.
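The abstract does not define how SCAV selects or averages classifier outputs, so the sketch below shows only the plain baseline it presumably builds on: averaging per-class probabilities across classifiers and choosing the class with the highest mean. This is ordinary soft voting, not SCAV, and the probabilities are invented.

```python
def average_vote(prob_lists):
    """Average per-class probabilities across classifiers and pick the top class."""
    n_classes = len(prob_lists[0])
    mean = [sum(p[c] for p in prob_lists) / len(prob_lists)
            for c in range(n_classes)]
    return mean.index(max(mean)), mean


# Invented outputs of three classifiers for one tumour:
# [P(wild type), P(mutant)].
outputs = [[0.60, 0.40], [0.30, 0.70], [0.45, 0.55]]
label, mean = average_vote(outputs)
```

Averaging lets two moderately confident classifiers outweigh one that disagrees, which is the usual motivation for ensembles on small datasets.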


Identification of Key Influencers for Secondary Distribution of HIV Self-Testing among Chinese MSM: A Machine Learning Approach

10.1101/2021.04.19.21255584 ◽

2021 ◽

Author(s):

Fengshi JING ◽

Yang Ye ◽

Yi Zhou ◽

Yuxin Ni ◽

Xumeng Yan ◽

...

Keyword(s):

Machine Learning ◽

Human Identification ◽

Support Vector ◽

Learning Approach ◽

Learning Models ◽

Self Testing ◽

Machine Learning Approach ◽

Secondary Distribution ◽

First Time ◽

Machine Learning Models

Background: HIV self-testing (HIVST) has been rapidly scaled up, and additional strategies can further expand testing uptake. In secondary distribution, people (indexes) apply for multiple kits and pass these kits to people (alters) in their social networks. However, identifying key influencers is difficult. This study aimed to develop an innovative ensemble machine learning approach to identify key influencers among Chinese men who have sex with men (MSM) for HIVST secondary distribution.

Methods: We defined three types of key influencers: 1) key distributors, who can distribute more kits; 2) key promoters, who can contribute to finding first-time testing alters; and 3) key detectors, who can help to find positive alters. Four machine learning models (logistic regression, support vector machine, decision tree, random forest) were trained to identify key influencers. An ensemble learning algorithm was adopted to combine these four models. Simulation experiments were run to validate our approach.

Results: A total of 309 indexes distributed kits to 269 alters. Our approach outperformed human identification (a cut-off on self-reported scales), exceeding it in average accuracy by 11.0%. It could distribute 18.2% (95% CI: 9.9% to 26.5%) more kits and find 13.6% (95% CI: 1.9% to 25.3%) more first-time testing alters and 12.0% (95% CI: -14.7% to 38.7%) more positive-testing alters. Our approach could also increase simulated intervention efficiency by 17.7% (95% CI: -3.5% to 38.8%) compared with human identification.

Conclusion: We built machine learning models to identify key influencers among Chinese MSM who were more likely to engage in HIVST secondary distribution.
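The four reported improvements differ in how firmly they are supported: a 95% confidence interval that contains zero is compatible with no improvement at all. The short Python check below applies that reading to the figures quoted in the abstract; the dictionary keys and the helper name are illustrative.

```python
# (estimate, lower bound, upper bound) in percent, as reported in the abstract.
reported = {
    "kits distributed": (18.2, 9.9, 26.5),
    "first-time testing alters": (13.6, 1.9, 25.3),
    "positive-testing alters": (12.0, -14.7, 38.7),
    "simulated intervention efficiency": (17.7, -3.5, 38.8),
}


def excludes_zero(low, high):
    """True when the interval lies entirely on one side of zero."""
    return low > 0 or high < 0


summary = {name: excludes_zero(low, high)
           for name, (_, low, high) in reported.items()}
```

On this reading, the gains in kits distributed and first-time testers have intervals above zero, while the gains in positive-testing alters and simulated efficiency do not.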


Direct Comparison of the Prediction of the Unbound Brain-to-Plasma Partitioning Utilizing Machine Learning Approach and Mechanistic Neuropharmacokinetic Model

The AAPS Journal ◽

10.1208/s12248-021-00604-x ◽

2021 ◽

Vol 23 (4) ◽

Author(s):

Yohei Kosugi ◽

Kunihiko Mizuno ◽

Cipriano Santos ◽

Sho Sato ◽

Natalie Hosea ◽

...

Keyword(s):

Machine Learning ◽

Multiple Drug Resistance ◽

Predictive Performance ◽

Training Dataset ◽

Multiple Drug ◽

Learning Approach ◽

Cancer Resistance ◽

Learning Models ◽

Machine Learning Approach ◽

Machine Learning Models

The mechanistic neuropharmacokinetic (neuroPK) model was established to predict unbound brain-to-plasma partitioning (Kp,uu,brain) by considering in vitro efflux activities of multiple drug resistance 1 (MDR1) and breast cancer resistance protein (BCRP). Herein, we directly compare this model to a computational machine learning approach utilizing physicochemical descriptors and efflux ratios of MDR1 and BCRP-expressing cells for predicting Kp,uu,brain in rats. Two different types of machine learning techniques, Gaussian processes (GP) and random forest regression (RF), were assessed by the time and cluster-split validation methods using 640 internal compounds. The predictivity of machine learning models based on only molecular descriptors was worse in the time-split dataset than in the cluster-split dataset, whereas the models incorporating MDR1 and BCRP efflux ratios showed similar predictivity between time and cluster-split datasets. The GP incorporating MDR1 and BCRP in the time-split dataset achieved the highest correlation (R² = 0.602). These results suggested that incorporation of MDR1 and BCRP in machine learning is beneficial for robust and accurate prediction. Kp,uu,brain prediction utilizing the neuroPK model was significantly worse compared to machine learning approaches for the same dataset. We also investigated the predictivity of Kp,uu,brain using an external independent test set of 34 marketed drugs. Compared to machine learning models, the neuroPK model showed better predictive performance with R² of 0.577. This work demonstrates that the machine learning model for Kp,uu,brain achieves maximum predictive performance within the chemical applicability domain, whereas the neuroPK model is applicable more widely beyond the chemical space covered in the training dataset.
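A time-split validation trains on the earliest compounds and tests on the most recent ones, which mimics predicting future chemistry. The sketch below shows such a split together with an R² score. The record fields, the 80/20 fraction, and the values are assumptions, and R² is computed here as the coefficient of determination; the abstract calls R² a correlation, so the authors may instead have used the squared Pearson coefficient.

```python
def time_split(records, train_fraction=0.8):
    """Train on the earliest compounds, test on the most recent ones."""
    ordered = sorted(records, key=lambda r: r["date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Invented records: registration date (ISO string) and a log Kp,uu,brain value.
compounds = [
    {"id": "c3", "date": "2019-05-01", "log_kpuu": -0.2},
    {"id": "c1", "date": "2017-01-15", "log_kpuu": -1.0},
    {"id": "c5", "date": "2020-11-30", "log_kpuu": 0.1},
    {"id": "c2", "date": "2018-03-09", "log_kpuu": -0.5},
    {"id": "c4", "date": "2020-02-20", "log_kpuu": -0.8},
]
train, test = time_split(compounds, train_fraction=0.8)

# Invented observed and predicted values, used only to exercise r_squared.
score = r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```

A cluster split instead groups structurally similar compounds and holds out whole clusters, so the two schemes probe different kinds of generalization.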


What Predicts Corruption?

10.31235/osf.io/fq2xb ◽

2020 ◽

Author(s):

Emanuele Colonnelli ◽

Jorge Gallego ◽

Mounu Prem

Keyword(s):

Machine Learning ◽

Human Capital ◽

Cost Effectiveness ◽

Public Sector ◽

Financial Development ◽

Predictive Power ◽

Public Spending ◽

Learning Models ◽

Micro Data ◽

Machine Learning Models

The ability to predict corruption is crucial to policy. Using rich micro-data from Brazil, we show that multiple machine learning models display high levels of performance in predicting municipality-level corruption in public spending. We then quantify which individual municipality features and groups of similar characteristics have the highest predictive power. We find that measures of private sector activity, financial development, and human capital are the strongest predictors of corruption, while public sector and political features play a secondary role. Our findings have implications for the design and cost-effectiveness of various anti-corruption policies.


ML-CB: Machine Learning Canvas Block

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2021-0056 ◽

2021 ◽

Vol 2021 (3) ◽

pp. 453-473

Author(s):

Nathan Reitinger ◽

Michelle L. Mazurek

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Semantic Representation ◽

Source Code ◽

Online Privacy ◽

Learning Approach ◽

Learning Models ◽

One Step ◽

The Web ◽

Machine Learning Models

With the aim of increasing online privacy, we present a novel, machine-learning-based approach to blocking one of the three main ways website visitors are tracked online: canvas fingerprinting. Because the act of canvas fingerprinting uses, at its core, a JavaScript program, and because many of these programs are reused across the web, we are able to fit several machine learning models around a semantic representation of a potentially offending program, achieving accurate and robust classifiers. Our supervised learning approach is trained on a dataset we created by scraping roughly half a million websites using a custom Google Chrome extension storing information related to the canvas. Classification leverages our key insight that the images drawn by canvas fingerprinting programs have a facially distinct appearance, allowing us to manually classify files based on the images drawn; we take this approach one step further and train our classifiers not on the malleable images themselves, but on the more-difficult-to-change, underlying source code generating the images. As a result, ML-CB allows for more accurate tracker blocking.
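Training on source code rather than on rendered images requires turning a JavaScript program into numeric features. The sketch below counts a few Canvas API identifiers in a script. It is a much simpler stand-in for the semantic representation the abstract mentions, and the token list and sample snippet are illustrative assumptions, not the ML-CB feature set.

```python
import re

# Canvas API members associated with drawing and reading back pixels.
# This list is an illustrative assumption, not the feature set used by ML-CB.
CANVAS_TOKENS = ["getContext", "fillText", "fillRect", "toDataURL", "getImageData"]


def token_counts(js_source):
    """Count occurrences of each canvas-related identifier in JavaScript source."""
    identifiers = re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", js_source)
    return [identifiers.count(tok) for tok in CANVAS_TOKENS]


# Invented script resembling a typical text-rendering fingerprint routine.
snippet = """
var c = document.createElement('canvas');
var ctx = c.getContext('2d');
ctx.fillText('Cwm fjordbank glyphs vext quiz', 2, 15);
ctx.fillText('Cwm fjordbank glyphs vext quiz', 4, 17);
var hash = c.toDataURL();
"""
features = token_counts(snippet)
```

A vector like this could be fed to any supervised classifier, with labels taken from manual inspection of the images each script draws.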


Acoustic emission corrosion feature extraction and severity prediction using hybrid wavelet packet transform and linear support vector classifier

PLoS ONE ◽

10.1371/journal.pone.0261040 ◽

2021 ◽

Vol 16 (12) ◽

pp. e0261040

Author(s):

Zazilah May ◽

M. K. Alam ◽

Nazrul Anuar Nayan ◽

Noor A’in A. Rahman ◽

Muhammad Shazwan Mahmud

Keyword(s):

Machine Learning ◽

Acoustic Emission ◽

Feature Extraction ◽

Wavelet Packet ◽

Support Vector ◽

Learning Approach ◽

Severity Assessment ◽

Corrosion Detection ◽

Support Vector Classifier ◽

Machine Learning Approach

Corrosion in carbon-steel pipelines leads to failure, which is a major cause of breakdown maintenance in the oil and gas industries. The acoustic emission (AE) signal is a reliable means of corrosion detection and classification in modern Structural Health Monitoring (SHM) systems. The efficiency of such a system in detection and classification depends mainly on suitable AE features. Therefore, many feature extraction and classification methods have been developed for corrosion detection and severity assessment. However, extracting appropriate AE features and classifying various levels of corrosion from them remain challenging. To overcome these issues, this article proposes a hybrid machine learning approach that combines Wavelet Packet Transform (WPT) integrated with Fast Fourier Transform (FFT) for multiresolution feature extraction and a Linear Support Vector Classifier (L-SVC) for predicting corrosion severity levels. A laboratory-based Linear Polarization Resistance (LPR) test was performed on carbon-steel samples for AE data acquisition over different time spans. AE signals were collected at a high sampling rate with a sound well AE sensor using AEWin software. Simulation results show a linear relationship between the AE features extracted by the proposed approach and the corrosion process. For the multi-class problem, three corrosion severity stages were defined based on the corrosion rate over time and AE activity. The ANOVA test results indicate significance within and between the feature groups: the F-values (F > 1) reject the null hypothesis, and the P-values (P < 0.05) are below the significance level. The L-SVC classifier achieves a prediction accuracy of 99.0%, higher than that of the other benchmarked classifiers. The findings confirm that the proposed machine learning approach can be effectively utilized for corrosion detection and severity assessment in SHM applications.
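A wavelet packet transform splits a signal into sub-bands by decomposing both the approximation and the detail branch at every level, and the energy of each sub-band is a common feature for a classifier. The sketch below uses the Haar wavelet because it needs no library; the wavelet family, the depth of 2, and the sample signal are assumptions, since the abstract does not state them, and the FFT step is omitted.

```python
from math import sqrt


def haar_step(signal):
    """One Haar analysis step: return (approximation, detail), each half length."""
    approx = [(signal[i] + signal[i + 1]) / sqrt(2)
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2)
              for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packet_energies(signal, levels):
    """Split every node at each level (full packet tree) and return leaf energies."""
    nodes = [signal]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_step(node)]
    return [sum(v * v for v in node) for node in nodes]


# Invented 8-sample signal; its length must be divisible by 2 ** levels.
signal = [1.0, 3.0, 2.0, 2.0, 0.0, 4.0, 1.0, 1.0]
energies = wavelet_packet_energies(signal, levels=2)
```

Because the Haar transform is orthonormal, the leaf energies sum to the energy of the original signal, so the feature vector describes how that energy is distributed across sub-bands.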


Investigation of gut microbiome association with inflammatory bowel disease and depression: a machine learning approach

F1000Research ◽

10.12688/f1000research.15091.2 ◽

2019 ◽

Vol 7 ◽

pp. 702

Author(s):

Pedro Morell Miranda ◽

Francesca Bertolini ◽

Haja N. Kadarmideen

Keyword(s):

Machine Learning ◽

Inflammatory Bowel Disease ◽

Random Forest ◽

Bowel Disease ◽

Gut Microbiome ◽

Predictive Power ◽

Learning Approach ◽

Important Species ◽

Machine Learning Approach ◽

Inflammatory Bowel

Background: Inflammatory bowel disease (IBD) is a group of chronic diseases related to inflammatory processes in the digestive tract, generally associated with an immune response to an altered gut microbiome in genetically predisposed subjects. For years, both researchers and clinicians have been reporting increased rates of anxiety and depression disorders in IBD, and these disorders have also been linked to an altered microbiome. However, the underlying pathophysiological mechanisms of comorbidity are poorly understood at the gut microbiome level.

Methods: Metagenomic and metatranscriptomic data were retrieved from the Inflammatory Bowel Disease Multi-Omics Database. Samples from 70 individuals who had answered a self-reported depression and anxiety questionnaire were selected and classified by their IBD diagnosis and their questionnaire results, creating six different groups. The cross-validation random forest algorithm was used in 90% of the individuals (training set) to retain the most important species involved in discriminating the samples without losing predictive power. The validation set that represented the remaining 10% of the samples equally distributed across the six groups was used to train a random forest using only the species selected in order to evaluate their predictive power.

Results: A total of 24 species were identified as the most informative in discriminating the 6 groups. Several of these species were frequently described in dysbiosis cases, such as species from the genus Bacteroides and Faecalibacterium prausnitzii. Despite the different compositions among the groups, no common patterns were found between samples classified as depressed. However, distinct taxonomic profiles within patients of IBD depending on their depression status were detected.

Conclusions: The machine learning approach is a promising approach for investigating the role of microbiome in IBD and depression. Abundance and functional changes in these species suggest that depression should be considered as a factor in future research on IBD.
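The methods describe a 90/10 split in which the held-out samples are spread equally across the six groups. A minimal sketch of such a stratified split follows; the function name, the seed, and the toy group sizes are assumptions, as the abstract does not give per-group counts.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_fraction=0.1, seed=0):
    """Hold out the same fraction from every group; return (train_idx, holdout_idx)."""
    by_group = defaultdict(list)
    for idx, label in enumerate(labels):
        by_group[label].append(idx)
    rng = random.Random(seed)
    train, holdout = [], []
    for label in sorted(by_group):
        members = by_group[label][:]
        rng.shuffle(members)
        # Keep at least one held-out sample per group.
        n_hold = max(1, round(len(members) * holdout_fraction))
        holdout.extend(members[:n_hold])
        train.extend(members[n_hold:])
    return sorted(train), sorted(holdout)


# Invented labels: six groups of 10 samples each.
labels = [f"group{g}" for g in range(6) for _ in range(10)]
train_idx, holdout_idx = stratified_split(labels)
```

Stratifying matters with groups this small, because a purely random 10% draw could leave one or more groups out of the held-out set entirely.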
