Forecasting the COVID-19 Epidemic By Integrating Symptom Search Behavior Into Predictive Models: Infoveillance Study (Preprint)

Mapping Intimacies ◽

10.2196/preprints.28876 ◽

2021 ◽

Author(s):

Alessandro Rabiolo ◽

Eugenio Alladio ◽

Esteban Morales ◽

Andrew Ian McNaught ◽

Francesco Bandello ◽

...

Keyword(s):

Time Series ◽

Predictive Models ◽

Web Application ◽

Moving Average ◽

Time Lag ◽

Search Behavior ◽

Principal Component ◽

Google Trends ◽

Time Range ◽

Prediction Ability

BACKGROUND: Previous studies have suggested associations between trends of web searches and COVID-19 traditional metrics. It remains unclear whether models incorporating trends of digital searches lead to better predictions.

OBJECTIVE: The aim of this study is to investigate the relationship between Google Trends searches of symptoms associated with COVID-19 and confirmed COVID-19 cases and deaths. We aim to develop predictive models to forecast the COVID-19 epidemic based on a combination of Google Trends searches of symptoms and conventional COVID-19 metrics.

METHODS: An open-access web application was developed to evaluate Google Trends and traditional COVID-19 metrics via an interactive framework based on principal component analysis (PCA) and time series modeling. The application facilitates the analysis of symptom search behavior associated with COVID-19 disease in 188 countries. In this study, we selected the data of nine countries as case studies to represent all continents. PCA was used to perform data dimensionality reduction, and three different time series models (error, trend, seasonality; autoregressive integrated moving average; and feed-forward neural network autoregression) were used to predict COVID-19 metrics in the upcoming 14 days. The models were compared in terms of prediction ability using the root mean square error (RMSE) of the first principal component (PC1). The predictive abilities of models generated with both Google Trends data and conventional COVID-19 metrics were compared with those fitted with conventional COVID-19 metrics only.

RESULTS: The degree of correlation and the best time lag varied as a function of the selected country and topic searched; in general, the optimal time lag was within 15 days. Overall, predictions of PC1 based on both search terms and COVID-19 traditional metrics performed better than those not including Google searches (median 1.56, IQR 0.90-2.49 versus median 1.87, IQR 1.09-2.95, respectively), but the improvement in prediction varied as a function of the selected country and time frame. The best model varied as a function of country, time range, and period of time selected. Models based on a 7-day moving average led to considerably smaller RMSE values as opposed to those calculated with raw data (median 0.90, IQR 0.50-1.53 versus median 2.27, IQR 1.62-3.74, respectively).

CONCLUSIONS: The inclusion of digital online searches in statistical models may improve the nowcasting and forecasting of the COVID-19 epidemic and could be used as one of the surveillance systems of COVID-19 disease. We provide a free web application operating with nearly real-time data that anyone can use to make predictions of outbreaks, improve estimates of the dynamics of ongoing epidemics, and predict future or rebound waves.
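The abstract above compares models by RMSE and reports that a 7-day moving average lowers it. As a minimal sketch of those two building blocks, assuming a trailing (not centered) window, which the abstract does not specify, and using made-up daily counts rather than study data:

```python
from math import sqrt


def moving_average(values, window=7):
    """Trailing moving average; the first window-1 points are dropped."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical daily counts with a weekend reporting dip (illustration only).
daily_cases = [100, 120, 130, 125, 140, 60, 50, 110, 130, 140, 135, 150, 70, 60]
smoothed = moving_average(daily_cases, window=7)
print(len(daily_cases), len(smoothed))
```

Smoothing removes the weekly reporting pattern before a model is fitted, which is one plausible reason the smoothed series is easier to forecast; the study itself should be consulted for the exact preprocessing.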


Forecasting the COVID-19 epidemic integrating symptom search behavior: an infodemiology study

10.1101/2021.03.09.21253186 ◽

2021 ◽

Author(s):

Alessandro Rabiolo ◽

Eugenio Alladio ◽

Esteban Morales ◽

Andrew I McNaught ◽

Francesco Bandello ◽

...

Keyword(s):

Time Series ◽

Web Application ◽

Moving Average ◽

Time Lag ◽

Search Behavior ◽

Google Trends ◽

Optimal Time ◽

Time Range ◽

Prediction Ability ◽

Time Series Modelling

Background: Previous studies have suggested associations between trends of web searches and COVID-19 traditional metrics. It remains unclear whether models incorporating trends of digital searches lead to better predictions.

Methods: An open-access web application was developed to evaluate Google Trends and traditional COVID-19 metrics via an interactive framework based on principal components analysis (PCA) and time series modelling. The app facilitates the analysis of symptom search behavior associated with COVID-19 disease in 188 countries. In this study, we selected data of eight countries as case studies to represent all continents. PCA was used to perform data dimensionality reduction, and three different time series models (error, trend, seasonality; autoregressive integrated moving average; and feed-forward neural network autoregression) were used to predict COVID-19 metrics in the upcoming 14 days. The models were compared in terms of prediction ability using the root-mean-square error (RMSE) of the first principal component (PC1). The predictive ability of models generated with both Google Trends data and conventional COVID-19 metrics was compared with that of models fitted with conventional COVID-19 metrics only.

Findings: The degree of correlation and the best time-lag varied as a function of the selected country and topic searched; in general, the optimal time-lag was within 15 days. Overall, predictions of PC1 based on both search terms and COVID-19 traditional metrics performed better than those not including Google searches (median [IQR]: 1.43 [0.74-2.36] vs. 1.78 [0.95-2.88], respectively), but the improvement in prediction varied as a function of the selected country and timeframe. The best model varied as a function of country, time range, and period of time selected. Models based on a 7-day moving average led to considerably smaller RMSE values as opposed to those calculated with raw data (median [IQR]: 0.74 [0.47-1.22] vs. 2.15 [1.55-3.89], respectively).

Interpretation: The inclusion of digital online searches in statistical models may improve the prediction of the COVID-19 epidemic.

Funding: EOSCsecretariat.eu has received funding from the European Union’s Horizon Programme call H2020-INFRAEOSC-05-2018-2019, grant agreement number 831644.
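The findings mention a "best time-lag" between searches and case counts. One simple way to estimate such a lag is to shift one series against the other and keep the shift with the highest Pearson correlation. The sketch below assumes that approach and only tests lags where searches lead cases; the function names, the 15-day cap and the toy series are illustrative assumptions, not the study's implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(searches, cases, max_lag=15, min_overlap=3):
    """Return (lag, r) maximizing the correlation of searches[t] with cases[t + lag]."""
    if len(searches) != len(cases):
        raise ValueError("series must have equal length")
    n = len(searches)
    best = None
    for lag in range(0, max_lag + 1):
        if n - lag < min_overlap:
            break  # too few overlapping points to correlate
        r = pearson(searches[:n - lag], cases[lag:])
        if best is None or r > best[1]:
            best = (lag, r)
    if best is None:
        raise ValueError("series too short")
    return best


# Toy example: the case curve is the search curve delayed by 3 days.
toy_searches = [0, 1, 4, 9, 16, 9, 4, 1, 0, 0, 0, 0]
toy_cases = [0, 0, 0] + toy_searches[:-3]
print(best_lag(toy_searches, toy_cases, max_lag=5))
```

Because each lag uses a different number of overlapping points, correlations at large lags rest on less data; `min_overlap` is a crude guard against that.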


Data Browser Matsch | Mazia: Web Application to access microclimatic time series of an ecological research site

Research Ideas and Outcomes ◽

10.3897/rio.7.e63748 ◽

2021 ◽

Vol 7 ◽

Author(s):

Martin Palma ◽

Alessandro Zandonai ◽

Luca Cattani ◽

Johannes Klotz ◽

Giulio Genova ◽

...

Keyword(s):

Time Series ◽

Web Application ◽

Scientific Data ◽

Time Range ◽

Ecological Research ◽

Data Accessibility ◽

Web Based ◽

South Tyrol ◽

Research Site ◽

User Friendly

Easily accessible data is an essential requirement for scientific data analysis. The Data Browser Matsch | Mazia was designed to provide a fast and comprehensible solution to access, visualize and download the microclimatic measurements of the IT 25 LT(S)ER Matsch | Mazia research site in South Tyrol, Northern Italy, with the overall aim of providing straightforward data accessibility and enhancing dissemination. Data Browser Matsch | Mazia is a user-friendly web-based application to visualize and download micrometeorological and biophysical time series of the Long-Term Socio-Ecological Research site Matsch | Mazia in South Tyrol, Italy. It is designed both for the general public and researchers. The Data Browser Matsch | Mazia drop-down menus allow the user to query the InfluxDB database in the backend by selecting the measurements, time range, land use and elevation. Interactive Grafana dashboards show dynamic graphs of the time series.
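To illustrate how drop-down selections such as measurement, time range, land use and elevation could translate into a database query, here is a minimal sketch that assembles an InfluxQL statement as a string. The measurement name, tag keys and tag values are hypothetical, and the abstract does not say which query language or schema the application actually uses.

```python
def build_query(measurement, start, end, tags=None):
    """Assemble an InfluxQL SELECT for one measurement, a time range and tag filters.

    Values are interpolated without escaping, so this is only suitable for
    trusted, menu-provided choices, never for free-text user input.
    """
    conditions = ["time >= '" + start + "'", "time < '" + end + "'"]
    for key, value in sorted((tags or {}).items()):
        conditions.append('"' + key + '" = ' + "'" + str(value) + "'")
    return 'SELECT * FROM "' + measurement + '" WHERE ' + " AND ".join(conditions)


# Hypothetical selections standing in for the drop-down menus.
example_query = build_query(
    "air_temperature",
    "2020-06-01T00:00:00Z",
    "2020-07-01T00:00:00Z",
    {"landuse": "meadow", "elevation": "1500"},
)
print(example_query)
```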


Assessment of long-range cross-correlations in cardiorespiratory and cardiovascular interactions

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2020.0249 ◽

2021 ◽

Vol 379 (2212) ◽

Cited By ~ 1

Author(s):

Akio Nakata ◽

Miki Kaneko ◽

Chinami Taki ◽

Naoko Evans ◽

Taiki Shigematsu ◽

...

Keyword(s):

Time Series ◽

Long Range ◽

Moving Average ◽

Time Lag ◽

Higher Order ◽

Cross Correlation Analysis ◽

Challenges And Opportunities ◽

Cross Correlations ◽

Different Order

We propose higher-order detrending moving-average cross-correlation analysis (DMCA) to assess the long-range cross-correlations in cardiorespiratory and cardiovascular interactions. Although the original (zeroth-order) DMCA employs a simple moving-average detrending filter to remove non-stationary trends embedded in the observed time series, our approach incorporates a Savitzky–Golay filter as a higher-order detrending method. Because the non-stationary trends can adversely affect the long-range correlation assessment, the higher-order detrending serves to improve accuracy. To achieve a more reliable characterization of the long-range cross-correlations, we demonstrate the importance of the following steps: correcting the time scale, confirming the consistency of different order DMCAs, and estimating the time lag between time series. We applied this methodological framework to cardiorespiratory and cardiovascular time series analysis. In the cardiorespiratory interaction, respiratory and heart rate variability (HRV) showed long-range auto-correlations; however, no factor was shared between them. In the cardiovascular interaction, beat-to-beat systolic blood pressure and HRV showed long-range auto-correlations and shared a common long-range, cross-correlated factor. This article is part of the theme issue ‘Advanced computation in cardiovascular physiology: new challenges and opportunities’.
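For orientation, the zeroth-order variant described above can be sketched as follows: integrate each series, subtract a centered moving average of the integrated profile, and average the product of the two residuals. This is one common formulation written from the abstract's description, not the authors' code, and it omits the Savitzky–Golay higher-order filter and the time-scale correction they propose.

```python
def cumulative_profile(series):
    """Running sum of deviations from the mean (the integrated series)."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)
    return profile


def centered_moving_average(values, window):
    """Centered moving average for an odd window; edges are dropped."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


def dmca_covariance(x, y, window):
    """Detrended cross-covariance of x and y at a single odd window size."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    if len(x) != len(y) or len(x) < window:
        raise ValueError("series must be equal length and at least one window long")
    half = window // 2
    profile_x, profile_y = cumulative_profile(x), cumulative_profile(y)
    trend_x = centered_moving_average(profile_x, window)
    trend_y = centered_moving_average(profile_y, window)
    resid_x = [profile_x[i + half] - t for i, t in enumerate(trend_x)]
    resid_y = [profile_y[i + half] - t for i, t in enumerate(trend_y)]
    return sum(a * b for a, b in zip(resid_x, resid_y)) / len(resid_x)


# Made-up short series; real analyses use long recordings and many window sizes.
demo = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0]
print(dmca_covariance(demo, demo, 3))
```

In practice the covariance is evaluated over a range of window sizes and its scaling with window size is what characterizes long-range cross-correlation.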


Predictive analysis of the number of human brucellosis cases in Xinjiang, China

Scientific Reports ◽

10.1038/s41598-021-91176-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yanling Zheng ◽

Liping Zhang ◽

Chunxia Wang ◽

Kai Wang ◽

Gang Guo ◽

...

Keyword(s):

Public Health ◽

Time Series ◽

Prevention And Control ◽

Prediction Models ◽

Moving Average ◽

Time Lag ◽

Health Concern ◽

Human Brucellosis ◽

Prediction Analysis ◽

And Control

Brucellosis is one of the major public health problems in China. Human brucellosis represents a serious public health concern in Xinjiang, and prediction analysis is needed to support early planning and to put forward scientific prevention and control countermeasures. Based on the characteristics of the time series of monthly reported cases of human brucellosis in Xinjiang from January 2008 to June 2020, we used the seasonal autoregressive integrated moving average (SARIMA) method and the nonlinear autoregressive regression neural network (NARNN) method, which are widely used and have high prediction accuracy, to construct prediction models and make prediction analysis. Finally, we established the SARIMA((1,4,5,7),0,0)(0,1,2)12 model and the NARNN model with a time lag of 5 and 10 hidden-layer neurons. Both models have high fitting performance. After comparing the accuracies of the two established models, we found that the SARIMA((1,4,5,7),0,0)(0,1,2)12 model was better than the NARNN model. We used the SARIMA((1,4,5,7),0,0)(0,1,2)12 model to predict the number of monthly reported cases of human brucellosis in Xinjiang from July 2020 to December 2021, and the results showed that the fluctuation of the time series from July 2020 to December 2021 was similar to that of the last year and a half, provided the current prevention and control ability is maintained. The methodology applied here and the prediction values of this study could provide a scientific reference for the prevention and control of human brucellosis.
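Read in the usual (p,d,q)(P,D,Q)s notation, the seasonal part (0,1,2)12 of the model above implies one seasonal difference at a period of 12 months. As a small sketch of just that step (not the full SARIMA fit), with made-up numbers and hypothetical function names:

```python
def seasonal_difference(series, period=12):
    """Return y[t] - y[t - period] for every t that has a value one season earlier."""
    if period < 1 or len(series) <= period:
        raise ValueError("series must be longer than one seasonal period")
    return [series[t] - series[t - period] for t in range(period, len(series))]


def invert_seasonal_difference(history, diffs, period=12):
    """Rebuild levels from differenced values, given at least one season of history."""
    if len(history) < period:
        raise ValueError("need at least one full season of history")
    levels = list(history[-period:])
    for diff in diffs:
        # Each new level is the value one season earlier plus the difference.
        levels.append(levels[-period] + diff)
    return levels[period:]


# Toy quarterly-style example (period 4) to keep the numbers readable.
toy = [10, 12, 15, 11, 13, 15, 18, 14]
toy_diffs = seasonal_difference(toy, period=4)
print(toy_diffs)
```

A model fitted on the differenced series produces differenced forecasts, so the inverse step is what turns them back into case counts.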


Principal Component Analysis for Weighted Data in the Procedure of Multidimensional Statistical Forecasting

Statistics and Economics ◽

10.21686/2500-3925-2021-2-4-11 ◽

2021 ◽

Vol 18 (2) ◽

pp. 4-11

Author(s):

V. V. Nikitin ◽

D. V. Bobin

Keyword(s):

Principal Component Analysis ◽

Time Series ◽

Confidence Interval ◽

Time Lag ◽

Principal Component ◽

Component Analysis ◽

Data Set ◽

Modified Method ◽

Weighted Data ◽

Main Component

Purpose of the research. Let us assume that the dynamics of the state of some object are being investigated. Its state is described by a system of specified indicators. Among them, some may be a linear combination of other indicators. The aim of any forecasting procedure is to solve two problems: first, to estimate the expected forecast value, and second, to estimate the confidence interval for possible other forecast values. The prediction procedure is multidimensional. Since the indicators describe the same object, in addition to explicit dependencies, there may be hidden dependencies among them. Principal component analysis effectively takes into account the variation of data in the system of the studied indicators. Therefore, it is desirable to use this method in the forecasting procedure. The results of forecasting would be more adequate if it were possible to implement different forecasting strategies. But this requires a modification of traditional principal component analysis, which is the main aim of this study. A related aim is to investigate the possibility of solving the second forecasting problem, which is more complex than the first one.

Materials and research methods. When estimating the confidence interval, it is necessary to specify the procedure for estimating the expected forecast value. At the same time, it would be useful to use the methods of multidimensional time series. Usually, different time series models use the concept of time lag. The number of lags and their weights in the model may differ. In this study, we propose a time series model based on the exponential smoothing method. The prediction procedure is multidimensional. It relies on the rule of agreed-upon data change. Therefore, the algorithm for predictive evaluation of a particular indicator is presented in a form that will be convenient for building and practical use of this rule in the future. The principal component analysis should take into account the weights of the indicator values. This is necessary for the implementation of various strategies for estimating the boundaries of the forecast values interval. The proposed standardization of weighted data facilitates the implementation of the main theorem of factor analysis. This ensures the construction of an orthonormal basis in the factor space. At the same time, it was not necessary to build an iterative algorithm, which is typical for such studies.

Results. For the test data set, comparative calculations were performed using the traditional and weighted principal component analysis. They show that the main characteristics of the component analysis are preserved. One of the indicators under consideration clearly depends on the others. Therefore, both methods show that the number of factors is less than the number of indicators. All indicators have a good relationship with the factors. In the traditional method, the dependent indicator is included in the first principal component. In the modified method, this indicator is better related to the second component.

Conclusion. It was shown that the elements of the factor matrix corresponding to the forecast time can be expressed as weighted averages of the previous factor values. This will allow us to estimate the limits of the confidence interval for each individual indicator, as well as for the complex indicator of the entire system. This takes into account both the consistency of data changes and the forecasting strategy.
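The abstract builds on exponential smoothing and notes that a forecast can be written as a weighted average of previous values. The sketch below shows that equivalence for plain simple exponential smoothing, initialized at the first observation; this is a textbook variant chosen for illustration, and the paper's multidimensional, weighted procedure may differ.

```python
def exponential_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def smoothing_weights(n, alpha):
    """Weights the forecast places on n observations, oldest first."""
    weights = [alpha * (1 - alpha) ** (n - 1 - i) for i in range(n)]
    # The initial observation keeps all weight not yet assigned to later ones.
    weights[0] = (1 - alpha) ** (n - 1)
    return weights


# Illustrative values only.
observations = [10.0, 20.0, 30.0]
forecast = exponential_smoothing(observations, alpha=0.5)
weights = smoothing_weights(len(observations), alpha=0.5)
weighted_average = sum(w * v for w, v in zip(weights, observations))
print(forecast, weights, weighted_average)
```

The weights decay geometrically with the lag, so `alpha` controls how many past values effectively contribute to the forecast.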


A Comparison of Hurst Exponent Estimators in Long-range Dependent Curve Time Series

Journal of Time Series Econometrics ◽

10.1515/jtse-2019-0009 ◽

2020 ◽

Vol 12 (1) ◽

Author(s):

Han Lin Shang

Keyword(s):

Time Series ◽

Long Range ◽

Hurst Exponent ◽

Moving Average ◽

Estimation Method ◽

Principal Component ◽

Functional Principal Component Analysis ◽

Finite Sample ◽

Long Run ◽

Finite Sample Bias

The Hurst exponent is the simplest numerical summary of self-similar long-range dependent stochastic processes. We consider the estimation of the Hurst exponent in long-range dependent curve time series. Our estimation method begins by constructing an estimate of the long-run covariance function, which we use, via dynamic functional principal component analysis, in estimating the orthonormal functions spanning the dominant sub-space of functional time series. Within the context of functional autoregressive fractionally integrated moving average (ARFIMA) models, we compare finite-sample bias, variance and mean square error among some time- and frequency-domain Hurst exponent estimators and make our recommendations.


Automatic Identification of Autoregressive Integrated Moving Average Time Series

IIE Transactions ◽

10.1080/05695558208974599 ◽

1982 ◽

Vol 14 (3) ◽

pp. 156-166 ◽

Cited By ~ 3

Author(s):

Chin-Sheng Alan Kang ◽

David D. Bedworth ◽

Dwayne A. Rollier

Keyword(s):

Time Series ◽

Moving Average ◽

Automatic Identification ◽

Autoregressive Integrated Moving Average


Smoothing Facilitates the Detection of Coupled Responses in Psychophysiological Time Series

Journal of Psychophysiology ◽

10.1027//0269-8803.14.1.1 ◽

2000 ◽

Vol 14 (1) ◽

pp. 1-10 ◽

Cited By ~ 8

Author(s):

Joni Kettunen ◽

Niklas Ravaja ◽

Liisa Keltikangas-Järvinen

Keyword(s):

Time Series ◽

Type I Error ◽

Electrodermal Activity ◽

Moving Average ◽

Type I ◽

Facial Electromyography ◽

Response Characteristics ◽

Smoothing Methods ◽

Response Systems ◽

Different Response

We examined the use of smoothing to enhance the detection of response coupling from the activity of different response systems. Three different types of moving average smoothers were applied to both simulated interbeat interval (IBI) and electrodermal activity (EDA) time series and to empirical IBI, EDA, and facial electromyography time series. The results indicated that progressive smoothing increased the efficiency of the detection of response coupling but did not increase the probability of Type I error. The power of the smoothing methods depended on the response characteristics. The benefits and use of the smoothing methods to extract information from psychophysiological time series are discussed.


A COMPARATIVE STUDY BETWEEN UNIVARIATE AND BIVARIATE TIME SERIES MODELS FOR CRUDE PALM OIL INDUSTRY IN PENINSULAR MALAYSIA

MALAYSIAN JOURNAL OF COMPUTING ◽

10.24191/mjoc.v5i1.6760 ◽

2020 ◽

Vol 5 (1) ◽

pp. 374

Author(s):

Pauline Jin Wee Mah ◽

Nur Nadhirah Nanyan

Keyword(s):

Time Series ◽

Palm Oil ◽

Moving Average ◽

Forecast Accuracy ◽

Peninsular Malaysia ◽

Time Series Models ◽

Crude Palm Oil ◽

Univariate Time Series ◽

Import And Export ◽

Bivariate Time Series

The main purpose of this study is to compare the performances of univariate and bivariate models on four time series variables of the crude palm oil industry in Peninsular Malaysia. The monthly data for the four variables, which are the crude palm oil production, price, import and export, were obtained from the Malaysian Palm Oil Board (MPOB) and the Malaysian Palm Oil Council (MPOC). In the first part of this study, univariate time series models, namely the autoregressive integrated moving average (ARIMA), the fractionally integrated autoregressive moving average (ARFIMA) and the autoregressive autoregressive (ARAR) algorithm, were used for modelling and forecasting purposes. Subsequently, the dependence between any two of the four variables was checked using the residuals’ sample cross-correlation functions before modelling the bivariate time series. Transfer function models were used to model the bivariate time series and make predictions. The forecast accuracy criteria used to evaluate the performances of the models were the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE). The results of the univariate time series showed that the best model for predicting the production was ARIMA, while the ARAR algorithm was the best forecast model for predicting both the import and export of crude palm oil. However, ARIMA appeared to be the best forecast model for price based on the MAE and MAPE values, while ARFIMA emerged as the best model based on the RMSE value. When considering bivariate time series models, the production was dependent on import, while the export was dependent on either price or import. The results showed that the bivariate models had better performance compared to the univariate models for production and export of crude palm oil based on the forecast accuracy criteria used.
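The three accuracy criteria named above have standard definitions, sketched here with made-up numbers. The fact that price was ranked differently by MAE/MAPE and by RMSE is consistent with RMSE penalizing large individual errors more heavily.

```python
from math import sqrt


def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    _check(actual, predicted)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only, not data from the study.
observed = [100.0, 200.0, 400.0]
forecasted = [110.0, 180.0, 400.0]
print(mae(observed, forecasted), rmse(observed, forecasted), mape(observed, forecasted))
```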


Temperature time series prediction based on autoregressive integrated moving average model

Instrumentation Mesure Métrologie ◽

10.3166/i2m.17.443-453 ◽

2018 ◽

Vol 18 (3) ◽

pp. 443-453

Author(s):

Huanhuan ZHENG ◽

Yuxiu BAI ◽

Yaqiong ZHANG

Keyword(s):

Time Series ◽

Moving Average ◽

Time Series Prediction ◽

Temperature Time Series ◽

Average Model ◽

Autoregressive Integrated Moving Average ◽

Moving Average Model ◽

Temperature Time
