Impact of Interventional Policies Including Vaccine on COVID-19 Propagation and Socio-Economic Factors: Predictive Model Enabling Simulations Using Machine Learning and Big Data (Preprint)
BACKGROUND A novel coronavirus disease, later named COVID-19, has emerged and pushed the world into a new reality shaped by many direct and indirect factors. Some are human-controllable (e.g., interventional policies, mobility, and the vaccine); some are not (e.g., the weather). We sought to test how a change in these human-controllable factors might influence two measures: the number of daily cases and the economic impact. If this approach is applied at the right level and with up-to-date data, policymakers can make targeted interventions and measure their cost.
OBJECTIVE The study aimed to provide a predictive analytics framework to model, predict, and simulate COVID-19 propagation and the socio-economic impact of interventions intended to reduce the spread of the disease, such as policies and/or a vaccine. The framework allows policymakers, government representatives, and business leaders to make better-informed decisions about the potential effect of various interventions, with forward-looking views provided via scenario planning.
METHODS We leveraged a recently launched open-source COVID-19 big data platform and used published research to identify potentially relevant variables (features), then completed feature selection and engineering through in-depth data quality checks and analytics. We developed an advanced machine learning pipeline that contains ensemble models, automatic and semi-automatic hyperparameter tuning, and customized interpretability functions. The pipeline is self-evolving in that it always learns from the most recent data. The output predicts daily cases and economic factors (e.g., small business revenue) to allow simulation of interventions, including a vaccine (proxied by an influenza vaccination efficacy model). The framework is built on an open-source technology stack, and the source code is publicly available.
RESULTS The model is self-evolving and deployed on a modern machine learning architecture. It has high accuracy for trend prediction (back-tested with r-squared). The framework incorporates simulation and interpretability, and it models not just daily cases but also socio-economic demographics.
CONCLUSIONS Human behaviour and extreme natural disasters are hard to measure with data points. No model can provide an answer that is correct 100% of the time; however, with a high-quality model and big data, a forward-looking view can be inferred or at least noted. This predictive model can help policymakers test scenarios, plan proactive actions, optimize logistics, measure costs, and create an open dialogue with the general public.
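The abstract does not specify the ensemble members, the features, or the back-testing protocol, so the following is only a minimal sketch of the general idea described in METHODS and RESULTS: fit an ensemble, back-test it with r-squared on data it has not seen, then simulate an intervention by changing a human-controllable input. Everything here is an assumption made for illustration: the two single-feature least-squares members, the unweighted averaging, the expanding-window back-test, the feature names `mobility` and `stringency`, and the synthetic numbers. None of it is taken from the authors' pipeline or data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_ensemble(rows, cases):
    """Toy ensemble: average of two single-feature linear models.

    rows is a list of (mobility, stringency) pairs (hypothetical features).
    """
    by_mobility = fit_line([r[0] for r in rows], cases)
    by_stringency = fit_line([r[1] for r in rows], cases)
    return lambda row: (by_mobility(row[0]) + by_stringency(row[1])) / 2


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def backtest(rows, cases, start):
    """Expanding-window back-test: refit on days before t, predict day t."""
    predictions = [
        fit_ensemble(rows[:t], cases[:t])(rows[t])
        for t in range(start, len(cases))
    ]
    return r_squared(cases[start:], predictions)


# Synthetic, made-up daily data (not from the study).
mobility = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
stringency = [80, 75, 70, 65, 60, 55, 50, 45, 40, 35]
cases = [343, 398, 461, 516, 582, 639, 704, 757, 822, 878]
rows = list(zip(mobility, stringency))

score = backtest(rows, cases, start=4)

# Scenario simulation: refit on all data, then compare predicted daily
# cases with and without a hypothetical policy that lowers mobility.
model = fit_ensemble(rows, cases)
baseline = model((100, 30))
reduced_mobility = model((80, 30))

print(f"back-tested r-squared: {score:.3f}")
print(f"baseline: {baseline:.0f}, reduced mobility: {reduced_mobility:.0f}")
```

In this toy setup the scenario changes only `mobility` and holds `stringency` fixed, which mirrors the idea of varying one human-controllable factor at a time. A real pipeline would presumably use richer models and features, and would refit as new data arrive.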