Accurate Severe vs Non-severe COVID-19 Clinical Type Classification: a Multimodality Machine Learning Study (Preprint)
BACKGROUND Diagnosing COVID-19 patients effectively and efficiently with the correct clinical type is essential to achieve optimal patient outcomes and to reduce the risk of overloading the healthcare system. Currently, severe and non-severe COVID-19 types are differentiated by only a few features, which do not comprehensively characterize the complicated pathological, physiological, and immunological responses to SARS-CoV-2 invasion in the different types. In addition, these type-defining features may not be readily testable at the time of diagnosis.
OBJECTIVE This study aimed to accurately differentiate severe and non-severe COVID-19 clinical types based on multiple medical features and to provide reliable predictions for clinical decision support.
METHODS We recruited 214 confirmed COVID-19 patients with the non-severe type and 148 with the severe type. The patients' clinical information (26 features) and laboratory test results (26 features) upon admission were acquired as two input modalities. Exploratory analyses demonstrated that these features differed substantially between the two clinical types. Random forest (RF) machine learning models were developed and validated to differentiate COVID-19 clinical types, using either all features in a single modality or the top five features from each modality combined.
RESULTS Using clinical information and laboratory results as input independently, RF models achieved 90% and 95% predictive accuracy, respectively. The importance scores of the input features were further evaluated, and the top five features from each modality were identified (clinical: age, hypertension, cardiovascular disease, gender, and diabetes; laboratory: D-Dimer, hsTNI, absolute neutrophil count, IL-6, and LDH; each in descending order of importance). Using these top 10 multimodal features as the only input, instead of all 52 features combined, the RF model achieved 99% predictive accuracy.
CONCLUSIONS These findings shed light on how the human body as a whole reacts to SARS-CoV-2 invasion and provide insights into effectively evaluating a COVID-19 patient's severity based on more common medical features when gold-standard features are not available. We suggest that clinical information can be used as an initial screening tool for self-evaluation and triaging, while laboratory test results should be applied when accuracy is the priority.
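The workflow described in METHODS and RESULTS (fit one model per modality, rank features by importance, keep the top features from each modality, and refit on the combined subset) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, replaces a full random forest with a bagged ensemble of depth-1 trees (stumps) with random feature subsets, and runs on synthetic data. The feature names `clin_*` and `lab_*`, the sample size, the number of trees, and the choice of two features per modality (rather than five) are all assumptions made for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Majority class of a non-empty list of 0/1 labels (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))

def best_stump(X, y, features):
    """Best depth-1 split among `features` by Gini decrease."""
    parent, best = gini(y), None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between consecutive distinct values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or parent - child > best[0]:
                best = (parent - child, f, t, majority(left), majority(right))
    return best

def fit_forest(X, y, n_trees=40, seed=0):
    """Bagged stumps; importance = summed Gini decrease per feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, int(d ** 0.5))  # features considered per tree
    stumps, importance = [], [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        stump = best_stump([X[i] for i in idx], [y[i] for i in idx],
                           rng.sample(range(d), m))
        if stump is None:  # all sampled features were constant
            continue
        gain, f, t, left_label, right_label = stump
        importance[f] += gain
        stumps.append((f, t, left_label, right_label))
    return stumps, importance

def predict(stumps, row):
    """Majority vote over all stumps."""
    votes = sum(l if row[f] <= t else r for f, t, l, r in stumps)
    return int(2 * votes >= len(stumps))

def columns(X, cols):
    """Restrict every row to the given column indices."""
    return [[row[c] for c in cols] for row in X]

def top_k(X, y, cols, k):
    """Fit on one modality and return its k most important columns."""
    _, imp = fit_forest(columns(X, cols), y)
    return [c for _, c in sorted(zip(imp, cols), reverse=True)[:k]]

# Synthetic data (assumption): 6 hypothetical features per modality;
# only clin_0 and lab_0 carry signal about the binary label.
data_rng = random.Random(42)
names = [f"clin_{i}" for i in range(6)] + [f"lab_{i}" for i in range(6)]
X = [[data_rng.gauss(0, 1) for _ in names] for _ in range(150)]
y = [int(row[0] + row[6] + 0.5 * data_rng.gauss(0, 1) > 0) for row in X]
X_train, y_train, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

clinical, laboratory = list(range(6)), list(range(6, 12))
selected = (top_k(X_train, y_train, clinical, 2)
            + top_k(X_train, y_train, laboratory, 2))

# Refit on the combined top features and score on the held-out rows.
forest, _ = fit_forest(columns(X_train, selected), y_train)
hits = sum(predict(forest, r) == t
           for r, t in zip(columns(X_test, selected), y_test))
accuracy = hits / len(y_test)
print([names[c] for c in selected], round(accuracy, 2))
```

Feature ranking here is done on the training rows only, so the held-out rows play no part in choosing which features to keep. With real data, a library implementation of random forests and cross-validation would normally replace the hand-written ensemble above.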