Forecasting the COVID-19 Epidemic By Integrating Symptom Search Behavior Into Predictive Models: Infoveillance Study (Preprint)

2021 ◽  
Author(s):  
Alessandro Rabiolo ◽  
Eugenio Alladio ◽  
Esteban Morales ◽  
Andrew Ian McNaught ◽  
Francesco Bandello ◽  
...  

BACKGROUND: Previous studies have suggested associations between trends of web searches and COVID-19 traditional metrics. It remains unclear whether models incorporating trends of digital searches lead to better predictions.
OBJECTIVE: The aim of this study is to investigate the relationship between Google Trends searches of symptoms associated with COVID-19 and confirmed COVID-19 cases and deaths. We aim to develop predictive models to forecast the COVID-19 epidemic based on a combination of Google Trends searches of symptoms and conventional COVID-19 metrics.
METHODS: An open-access web application was developed to evaluate Google Trends and traditional COVID-19 metrics via an interactive framework based on principal component analysis (PCA) and time series modeling. The application facilitates the analysis of symptom search behavior associated with COVID-19 disease in 188 countries. In this study, we selected the data of nine countries as case studies to represent all continents. PCA was used to perform data dimensionality reduction, and three different time series models (error, trend, seasonality; autoregressive integrated moving average; and feed-forward neural network autoregression) were used to predict COVID-19 metrics in the upcoming 14 days. The models were compared in terms of prediction ability using the root mean square error (RMSE) of the first principal component (PC1). The predictive abilities of models generated with both Google Trends data and conventional COVID-19 metrics were compared with those fitted with conventional COVID-19 metrics only.
RESULTS: The degree of correlation and the best time lag varied as a function of the selected country and topic searched; in general, the optimal time lag was within 15 days. Overall, predictions of PC1 based on both search terms and COVID-19 traditional metrics performed better than those not including Google searches (median 1.56, IQR 0.90-2.49 versus median 1.87, IQR 1.09-2.95, respectively), but the improvement in prediction varied as a function of the selected country and time frame. The best model varied as a function of country, time range, and period of time selected. Models based on a 7-day moving average led to considerably smaller RMSE values as opposed to those calculated with raw data (median 0.90, IQR 0.50-1.53 versus median 2.27, IQR 1.62-3.74, respectively).
CONCLUSIONS: The inclusion of digital online searches in statistical models may improve the nowcasting and forecasting of the COVID-19 epidemic and could be used as one of the surveillance systems of COVID-19 disease. We provide a free web application operating with nearly real-time data that anyone can use to make predictions of outbreaks, improve estimates of the dynamics of ongoing epidemics, and predict future or rebound waves.
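
The effect of 7-day smoothing on forecast error can be illustrated with a minimal sketch. It does not reproduce the study's pipeline (PCA followed by ETS, ARIMA, or neural network autoregression); instead it assumes a synthetic daily series with a weekend reporting dip and a naive last-value forecast over a 14-day horizon. The function names and the data are hypothetical.

```python
import math

def moving_average(values, window=7):
    """Trailing moving average; the first window-1 points are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def naive_forecast_rmse(series, horizon=14):
    """Hold out the last `horizon` points and forecast them with the
    last training value (a stand-in for a real time series model)."""
    train, test = series[:-horizon], series[-horizon:]
    return rmse(test, [train[-1]] * horizon)

# Synthetic daily counts (illustrative only): linear growth plus a
# reporting dip on two days of every week.
raw = [100 + 2 * day - (30 if day % 7 in (5, 6) else 0) for day in range(70)]
smoothed = moving_average(raw, window=7)

raw_rmse = naive_forecast_rmse(raw)
smoothed_rmse = naive_forecast_rmse(smoothed)
print(f"RMSE raw: {raw_rmse:.2f}, RMSE 7-day average: {smoothed_rmse:.2f}")
```

In this toy setting the weekly reporting artifact inflates the error on the raw series, while the smoothed series leaves only the trend to be missed by the naive forecast. Note that the two RMSE values are measured against different targets (raw versus smoothed data), as in the comparison reported in the abstract.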

2021 ◽  
Author(s):  
Alessandro Rabiolo ◽  
Eugenio Alladio ◽  
Esteban Morales ◽  
Andrew I McNaught ◽  
Francesco Bandello ◽  
...  

ABSTRACT
Background: Previous studies have suggested associations between trends of web searches and COVID-19 traditional metrics. It remains unclear whether models incorporating trends of digital searches lead to better predictions.
Methods: An open-access web application was developed to evaluate Google Trends and traditional COVID-19 metrics via an interactive framework based on principal components analysis (PCA) and time series modelling. The app facilitates the analysis of symptom search behavior associated with COVID-19 disease in 188 countries. In this study, we selected data of eight countries as case studies to represent all continents. PCA was used to perform data dimensionality reduction, and three different time series models (error, trend, seasonality; autoregressive integrated moving average; and feed-forward neural network autoregression) were used to predict COVID-19 metrics in the upcoming 14 days. The models were compared in terms of prediction ability using the root-mean-square error (RMSE) of the first principal component (PC1). The predictive ability of models generated with both Google Trends data and conventional COVID-19 metrics was compared with that of models fitted with conventional COVID-19 metrics only.
Findings: The degree of correlation and the best time-lag varied as a function of the selected country and topic searched; in general, the optimal time-lag was within 15 days. Overall, predictions of PC1 based on both search terms and COVID-19 traditional metrics performed better than those not including Google searches (median [IQR]: 1.43 [0.74-2.36] vs. 1.78 [0.95-2.88], respectively), but the improvement in prediction varied as a function of the selected country and timeframe. The best model varied as a function of country, time range, and period of time selected. Models based on a 7-day moving average led to considerably smaller RMSE values as opposed to those calculated with raw data (median [IQR]: 0.74 [0.47-1.22] vs. 2.15 [1.55-3.89], respectively).
Interpretation: The inclusion of digital online searches in statistical models may improve the prediction of the COVID-19 epidemic.
Funding: EOSCsecretariat.eu has received funding from the European Union’s Horizon Programme call H2020-INFRAEOSC-05-2018-2019, grant agreement number 831644.
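
The search for the best time lag between a search-interest series and a case series can be sketched as a lagged Pearson correlation. This is only an assumption about how such a lag might be found, not the authors' implementation; `best_lag` and the synthetic series are hypothetical, and the 15-day cap simply mirrors the range mentioned in the abstract.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def best_lag(searches, cases, max_lag=15):
    """Return (lag, r) maximizing the correlation between searches[t]
    and cases[t + lag] for lag = 0..max_lag."""
    best = None
    for lag in range(max_lag + 1):
        x = searches[:len(searches) - lag]  # drop the last `lag` points
        y = cases[lag:]                     # drop the first `lag` points
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best

# Synthetic example: the case series repeats the search series 5 days later.
searches = [math.sin(t / 6.0) for t in range(120)]
cases = [0.0] * 5 + searches[:-5]

lag, r = best_lag(searches, cases)
print(f"best lag: {lag} days, correlation: {r:.3f}")
```

On real data the correlation curve is usually much flatter than in this example, so the selected lag should be read together with the correlation values at neighboring lags.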


2021 ◽  
Vol 7 ◽  
Author(s):  
Martin Palma ◽  
Alessandro Zandonai ◽  
Luca Cattani ◽  
Johannes Klotz ◽  
Giulio Genova ◽  
...  

Easily accessible data is an essential requirement for scientific data analysis. The Data Browser Matsch | Mazia was designed to provide a fast and comprehensible solution to access, visualize and download the microclimatic measurements of the IT 25 LT(S)ER Matsch | Mazia research site in South Tyrol, Northern Italy, with the overall aim to provide straightforward data accessibility and enhance dissemination. Data Browser Matsch | Mazia is a user-friendly web-based application to visualize and download micrometeorological and biophysical time series of the Long-Term Socio-Ecological Research site Matsch | Mazia in South Tyrol, Italy. It is designed both for the general public and researchers. The Data Browser Matsch | Mazia drop-down menus allow the user to query the InfluxDB database in the backend by selecting the measurements, time range, land use and elevation. Interactive Grafana dashboards show dynamic graphs of the time series.


Author(s):  
Akio Nakata ◽  
Miki Kaneko ◽  
Chinami Taki ◽  
Naoko Evans ◽  
Taiki Shigematsu ◽  
...  

We propose higher-order detrending moving-average cross-correlation analysis (DMCA) to assess the long-range cross-correlations in cardiorespiratory and cardiovascular interactions. Although the original (zeroth-order) DMCA employs a simple moving-average detrending filter to remove non-stationary trends embedded in the observed time series, our approach incorporates a Savitzky–Golay filter as a higher-order detrending method. Because the non-stationary trends can adversely affect the long-range correlation assessment, the higher-order detrending serves to improve accuracy. To achieve a more reliable characterization of the long-range cross-correlations, we demonstrate the importance of the following steps: correcting the time scale, confirming the consistency of different order DMCAs, and estimating the time lag between time series. We applied this methodological framework to cardiorespiratory and cardiovascular time series analysis. In the cardiorespiratory interaction, respiratory and heart rate variability (HRV) showed long-range auto-correlations; however, no factor was shared between them. In the cardiovascular interaction, beat-to-beat systolic blood pressure and HRV showed long-range auto-correlations and shared a common long-range, cross-correlated factor. This article is part of the theme issue ‘Advanced computation in cardiovascular physiology: new challenges and opportunities’.
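
As a rough illustration of the zeroth-order case described above, the sketch below integrates each series, removes a centered moving average, and normalizes the covariance of the residuals into a coefficient between -1 and 1. It follows one common formulation of DMCA and is an assumption, not the authors' code: the higher-order Savitzky–Golay detrending, the time-scale correction, and the lag estimation are not included, and all names are hypothetical.

```python
import math
from itertools import accumulate

def profile(x):
    """Integrated series: cumulative sum of deviations from the mean."""
    mean = sum(x) / len(x)
    return list(accumulate(v - mean for v in x))

def ma_residuals(y, s):
    """Residuals of y around a centered moving average of odd window s."""
    half = s // 2
    return [y[i] - sum(y[i - half:i + half + 1]) / s
            for i in range(half, len(y) - half)]

def dmca_coefficient(x1, x2, s):
    """Detrended cross-correlation coefficient at scale s, using
    zeroth-order (simple moving-average) detrending."""
    if s < 3 or s % 2 == 0:
        raise ValueError("s must be an odd integer >= 3")
    r1 = ma_residuals(profile(x1), s)
    r2 = ma_residuals(profile(x2), s)
    f12 = sum(a * b for a, b in zip(r1, r2)) / len(r1)  # cross term
    f11 = sum(a * a for a in r1) / len(r1)              # fluctuation of x1
    f22 = sum(b * b for b in r2) / len(r2)              # fluctuation of x2
    return f12 / math.sqrt(f11 * f22)

# Synthetic check (illustrative only): a series against itself and
# against its sign-flipped copy.
x = [math.sin(0.9 * i) + 0.3 * (i % 5) for i in range(200)]
rho_same = dmca_coefficient(x, x, 11)
rho_flipped = dmca_coefficient(x, [-v for v in x], 11)
print(rho_same, rho_flipped)
```

Repeating the calculation over a range of scales `s` gives the scale dependence that a long-range cross-correlation analysis would examine.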


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yanling Zheng ◽  
Liping Zhang ◽  
Chunxia Wang ◽  
Kai Wang ◽  
Gang Guo ◽  
...  

Abstract Brucellosis is one of the major public health problems in China. Human brucellosis is a serious public health concern in Xinjiang, and a prediction analysis is needed to support early planning and scientifically grounded prevention and control measures. Based on the characteristics of the time series of monthly reported cases of human brucellosis in Xinjiang from January 2008 to June 2020, we used the seasonal autoregressive integrated moving average (SARIMA) method and the nonlinear autoregressive regression neural network (NARNN) method, both of which are widely used and have high prediction accuracy, to construct prediction models and carry out a prediction analysis. We established the SARIMA((1,4,5,7),0,0)(0,1,2)12 model and the NARNN model with a time lag of 5 and 10 hidden-layer neurons. Both models have high fitting performance. After comparing the accuracies of the two established models, we found that the SARIMA((1,4,5,7),0,0)(0,1,2)12 model was better than the NARNN model. We used the SARIMA((1,4,5,7),0,0)(0,1,2)12 model to predict the number of monthly reported cases of human brucellosis in Xinjiang from July 2020 to December 2021. The results showed that, provided the current prevention and control capacity is maintained, the fluctuation of the time series from July 2020 to December 2021 was similar to that of the preceding year and a half. The methodology applied here and the predicted values from this study could provide a scientific reference for the prevention and control of human brucellosis.
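
For readers unfamiliar with the notation, the SARIMA((1,4,5,7),0,0)(0,1,2)12 specification can be written out in backshift form. The expansion below assumes that (1,4,5,7) denotes the non-seasonal autoregressive lags retained in a subset AR polynomial, that no constant term is included, and that the moving-average terms use the plus-sign convention; the abstract itself states none of these, and the coefficient symbols are illustrative.

```latex
% x_t: monthly reported cases, B: backshift operator (B x_t = x_{t-1}),
% \varepsilon_t: white-noise error term.
\left(1 - \phi_1 B - \phi_4 B^{4} - \phi_5 B^{5} - \phi_7 B^{7}\right)
\left(1 - B^{12}\right) x_t
=
\left(1 + \Theta_1 B^{12} + \Theta_2 B^{24}\right) \varepsilon_t
```

Here the factor $(1 - B^{12})$ is the single seasonal difference (D = 1, period 12), the right-hand side is the seasonal moving-average polynomial of order Q = 2, and there is no non-seasonal differencing or non-seasonal moving-average term (d = 0, q = 0).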


2021 ◽  
Vol 18 (2) ◽  
pp. 4-11
Author(s):  
V. V. Nikitin ◽  
D. V. Bobin

Purpose of the research. Suppose that the dynamics of the state of some object are being investigated. Its state is described by a system of specified indicators, some of which may be linear combinations of the others. Any forecasting procedure aims to solve two problems: first, to estimate the expected forecast value, and second, to estimate the confidence interval for other possible forecast values. The prediction procedure is multidimensional. Since the indicators describe the same object, there may be hidden dependencies among them in addition to the explicit ones. Principal component analysis effectively takes into account the variation of data in the system of studied indicators, so it is desirable to use this method in the forecasting procedure. The forecasting results would be more adequate if different forecasting strategies could be implemented, but this requires a modification of traditional principal component analysis, which is the main aim of this study. A related aim is to investigate the possibility of solving the second forecasting problem, which is more complex than the first. Materials and research methods. Estimating the confidence interval requires specifying the procedure for estimating the expected forecast value. It is also useful to apply the methods of multidimensional time series. Time series models usually use the concept of time lag; the number of lags and their weights in the model may differ. In this study, we propose a time series model based on the exponential smoothing method. The prediction procedure is multidimensional and relies on the rule of agreed-upon data change. Therefore, the algorithm for predictive evaluation of a particular indicator is presented in a form that is convenient for building and applying this rule in the future. Principal component analysis should take into account the weights of the indicator values; this is necessary for implementing various strategies for estimating the boundaries of the forecast interval. The proposed standardization of weighted data supports the implementation of the main theorem of factor analysis, which ensures the construction of an orthonormal basis in the factor space. At the same time, it was not necessary to build an iterative algorithm, which is typical for such studies. Results. For the test data set, comparative calculations were performed using traditional and weighted principal component analysis. They show that the main characteristics of the component analysis are preserved. One of the indicators under consideration clearly depends on the others; therefore, both methods show that the number of factors is less than the number of indicators. All indicators have a good relationship with the factors. In the traditional method, the dependent indicator is included in the first principal component. In the modified method, this indicator is better related to the second component. Conclusion. It was shown that the elements of the factor matrix corresponding to the forecast time can be expressed as weighted averages of the previous factor values. This will allow us to estimate the limits of the confidence interval for each individual indicator, as well as for the complex indicator of the entire system. This takes into account both the consistency of data changes and the forecasting strategy.
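
The statement that a forecast value can be expressed as a weighted average of previous values is easy to see for simple exponential smoothing. The sketch below is a generic textbook version, not the weighted principal component procedure of the paper; the smoothing parameter `alpha`, the initialization with the first observation, and the sample values are assumptions made for illustration.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing; returns the level after each
    observation. The last level is the one-step-ahead forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # assumed initialization: first observation
    levels = [level]
    for value in values[1:]:
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels

def smoothing_weights(n, alpha):
    """Weights that the final level places on each of the n
    observations, oldest first. They sum to 1."""
    weights = [(1 - alpha) ** (n - 1)]  # weight of the initial value
    weights += [alpha * (1 - alpha) ** (n - 1 - i) for i in range(1, n)]
    return weights

# Hypothetical values of one factor over five time points.
factor_values = [10.0, 12.0, 11.0, 15.0, 14.0]
alpha = 0.3

forecast = exponential_smoothing(factor_values, alpha)[-1]
weights = smoothing_weights(len(factor_values), alpha)
weighted_average = sum(w * v for w, v in zip(weights, factor_values))
print(forecast, weighted_average)
```

Both routes give the same number, which makes the weighting explicit: recent observations receive geometrically larger weights than older ones.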


2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Han Lin Shang

Abstract The Hurst exponent is the simplest numerical summary of self-similar long-range dependent stochastic processes. We consider the estimation of the Hurst exponent in long-range dependent curve time series. Our estimation method begins by constructing an estimate of the long-run covariance function, which we use, via dynamic functional principal component analysis, in estimating the orthonormal functions spanning the dominant sub-space of functional time series. Within the context of functional autoregressive fractionally integrated moving average (ARFIMA) models, we compare finite-sample bias, variance and mean square error among some time- and frequency-domain Hurst exponent estimators and make our recommendations.


1982 ◽  
Vol 14 (3) ◽  
pp. 156-166 ◽  
Author(s):  
Chin-Sheng Alan Kang ◽  
David D. Bedworth ◽  
Dwayne A. Rollier

2000 ◽  
Vol 14 (1) ◽  
pp. 1-10 ◽  
Author(s):  
Joni Kettunen ◽  
Niklas Ravaja ◽  
Liisa Keltikangas-Järvinen

Abstract We examined the use of smoothing to enhance the detection of response coupling from the activity of different response systems. Three different types of moving average smoothers were applied to both simulated interbeat interval (IBI) and electrodermal activity (EDA) time series and to empirical IBI, EDA, and facial electromyography time series. The results indicated that progressive smoothing increased the efficiency of the detection of response coupling but did not increase the probability of Type I error. The power of the smoothing methods depended on the response characteristics. The benefits and use of the smoothing methods to extract information from psychophysiological time series are discussed.


2020 ◽  
Vol 5 (1) ◽  
pp. 374
Author(s):  
Pauline Jin Wee Mah ◽  
Nur Nadhirah Nanyan

The main purpose of this study is to compare the performances of univariate and bivariate models on four time series variables of the crude palm oil industry in Peninsular Malaysia. The monthly data for the four variables, which are the crude palm oil production, price, import and export, were obtained from the Malaysian Palm Oil Board (MPOB) and the Malaysian Palm Oil Council (MPOC). In the first part of this study, univariate time series models, namely the autoregressive integrated moving average (ARIMA), fractionally integrated autoregressive moving average (ARFIMA) and autoregressive autoregressive (ARAR) algorithm, were used for modelling and forecasting purposes. Subsequently, the dependence between any two of the four variables was checked using the residuals’ sample cross-correlation functions before modelling the bivariate time series. Transfer function models were used to model the bivariate time series and make predictions. The forecast accuracy criteria used to evaluate the performances of the models were the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE). The results of the univariate time series showed that the best model for predicting production was ARIMA, while the ARAR algorithm was the best forecast model for predicting both the import and export of crude palm oil. However, ARIMA appeared to be the best forecast model for price based on the MAE and MAPE values, while ARFIMA emerged as the best model based on the RMSE value. When considering bivariate time series models, production was dependent on import, while export was dependent on either price or import. The results showed that the bivariate models performed better than the univariate models for production and export of crude palm oil based on the forecast accuracy criteria used.
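
The three accuracy criteria named above can rank models differently, as happened for price in this study. The sketch below uses the standard definitions of MAE, RMSE and MAPE; the two forecast vectors are invented for illustration and are unrelated to the palm oil data.

```python
import math

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean square error; penalizes large errors more than MAE."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )

def mape(actual, forecast):
    """Mean absolute percentage error, in percent.
    Undefined when any actual value is zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical data: model A is usually close but has one large miss,
# model B is consistently off by a moderate amount.
actual = [100.0, 100.0, 100.0, 100.0]
forecast_a = [101.0, 101.0, 101.0, 109.0]
forecast_b = [104.0, 104.0, 104.0, 104.0]

for name, forecast in (("A", forecast_a), ("B", forecast_b)):
    print(name, mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

Model A wins on MAE and MAPE, while model B wins on RMSE, because squaring the errors gives the single large miss of model A more weight.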

