Using Sequence Mining to Predict Complex Systems: A Case Study in Influenza Epidemics

Complexity ◽

10.1155/2021/9929013 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Theyazn H. H. Aldhyani ◽

Manish R. Joshi ◽

Shahab A. AlMaaytah ◽

Ahmed Abdullah Alqarni ◽

Nizar Alsharif

Keyword(s):

Time Series ◽

Disease Control ◽

Activity Level ◽

Support Vector ◽

Search Queries ◽

Influenza Like Illness ◽

Proposed Model ◽

Control And Prevention ◽

Influenza Outbreaks ◽

Google Search

According to the World Health Organisation, three to five million individuals are infected by influenza, and around 250,000 to 500,000 people die of this infectious disease worldwide. Influenza epidemics pose a serious public health threat. Graver dangers arise from influenza subtypes against which there is little or no preexisting human immunity; such subtypes have the potential to cause devastating epidemics. Thus, enhancing surveillance systems to detect influenza epidemics at an early stage can quicken response times and save millions of lives. This paper presents three adaptive intelligence models for predicting influenza epidemics: support vector machine regression (SVMR), an artificial neural network using particle swarm optimisation (ANNPSO), and our intelligent time series (INTS) model. The novelty of the study lies in the INTS model, which combines clustering with a time series model to enhance the prediction of influenza outbreaks: it integrates the results obtained from the existing weighted exponential smoothing model with centroids obtained from clustering. We developed a surveillance system for influenza epidemics using Google search queries. The research is based on a weighted version of the influenza-like illness activity level obtained from Centers for Disease Control and Prevention data, as well as query data obtained from the Google search engine in the USA. The influenza-like illness data was collected from January 4, 2009 (week 1), to December 27, 2015 (week 52), stretching across a total time span of 312 weeks. Google Correlate was used to select search queries related to influenza epidemics. In total, 100 search queries were obtained from Google Correlate, of which the 10 most relevant were selected for this study.
The model was evaluated using online Google search queries collected from Google Correlate. The standard performance measures MSE, RMSE, and MAE were used to evaluate the proposed model. The empirical results of the INTS model showed MSE = 0.003, RMSE = 0.036, and MAE = 0.0185, indicating that the errors of the proposed model are small. A comparison of prediction results between the INTS model, alternative Google Flu Trend (GFT), and autoregression with Google search data is also presented. The proposed model outperformed the existing models.
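
The three error measures named in the abstract above can be sketched in a few lines. This is a minimal illustration only: the function name `error_metrics` and the sample values are hypothetical and are not taken from the paper or its data.

```python
import math


def error_metrics(actual, predicted):
    """Return (MSE, RMSE, MAE) for two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    rmse = math.sqrt(mse)                            # root mean squared error
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    return mse, rmse, mae


# Hypothetical weekly activity levels and forecasts (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.5, 2.0, 2.5, 4.0]
mse, rmse, mae = error_metrics(observed, forecast)
print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f}")
```

By definition RMSE is the square root of MSE, so the two always rank models identically; MAE is less sensitive to a few large errors.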

A hybrid linear–nonlinear approach to predict the monthly rainfall over the Urmia Lake watershed using wavelet-SARIMAX-LSSVM conjugated model

Journal of Hydroinformatics ◽

10.2166/hydro.2017.013 ◽

2017 ◽

Vol 20 (1) ◽

pp. 246-262 ◽

Cited By ~ 17

Author(s):

Jamileh Farajzadeh ◽

Farhad Alizadeh

Keyword(s):

Time Series ◽

Urmia Lake ◽

Support Vector ◽

Discrete Wavelet ◽

Rainfall Time Series ◽

Model Based ◽

Nonlinear Approach ◽

Proposed Model ◽

Model Time Series ◽

Predicted Values

The present study aimed to develop a hybrid model to predict the rainfall time series of the Urmia Lake watershed. For this purpose, a model based on the discrete wavelet transform, SARIMAX and least squares support vector machine (LSSVM), denoted W-S-LSSVM, was developed. The proposed model was designed to handle the linear, nonlinear and seasonal components of rainfall time series. In the proposed model, the time series were first decomposed into sub-series (approximation (a) and details (d)), and each sub-series was fed into SARIMAX to be predicted separately. The residual (error) of the predicted sub-series was then fed into LSSVM to predict the residual components. Finally, all predicted values were aggregated to rebuild the predicted time series. For comparison, classic modeling was first performed with LSSVM alone, and wavelet-based LSSVM was then used to capture the peak values of rainfall. Results revealed that Daubechies 4 with decomposition level 4 (db(4,4)) led to the best outcome, so it was selected for use in the proposed model. The results showed that W-S-LSSVM performed better than the other models.
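
The decomposition step described above, splitting a series into approximation and detail sub-series that can later be recombined, can be illustrated with a one-level Haar transform. This is a sketch under stated assumptions: the study uses Daubechies 4 at level 4, whereas Haar is swapped in here only because it fits in a few lines; the function names and rainfall values are hypothetical.

```python
import math


def haar_dwt(signal):
    """One-level Haar transform: return (approximation, detail) sub-series."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt: rebuild the original series from its sub-series."""
    scale = math.sqrt(2.0)
    rebuilt = []
    for a, d in zip(approx, detail):
        rebuilt.append((a + d) / scale)
        rebuilt.append((a - d) / scale)
    return rebuilt


# Hypothetical monthly rainfall values (illustrative only).
rainfall = [4.0, 6.0, 10.0, 2.0, 0.0, 8.0]
approx, detail = haar_dwt(rainfall)
rebuilt = haar_idwt(approx, detail)
print(approx, detail, rebuilt)
```

In a hybrid scheme of the kind the abstract describes, each sub-series would be forecast separately and the forecasts passed through the inverse transform, in the same way `haar_idwt` rebuilds the series here.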

Estimation of asthma symptom onset using Internet search queries: A lag-time series analysis (Preprint)

10.2196/preprints.18593 ◽

2020 ◽

Author(s):

Yulin Hswen ◽

Amanda Zhang ◽

Bruno Ventelou

Keyword(s):

Time Series ◽

Hospital Admission ◽

Symptom Onset ◽

Hospital Admissions ◽

Pearson Correlation ◽

Internet Search ◽

Search Queries ◽

Risk Of Death ◽

Search Volume ◽

Google Search

BACKGROUND Asthma affects over 330 million people worldwide. The timing of an asthma event is extremely important, and failure to identify asthma increases the risk of death. A major challenge for health systems is the length of time between symptom onset and care seeking, which can delay treatment initiation and worsen symptoms. OBJECTIVE This study evaluates the utility of Internet search query data for identifying the onset of asthma symptoms. METHODS Pearson correlation coefficients between the time series of hospital admissions and Google searches were computed at lag times from 4 weeks prior to hospital admission to 4 weeks after hospital admission. RESULTS Google search volume for asthma had the highest correlation at 2 weeks before hospital admission. CONCLUSIONS Our findings demonstrate that Internet search queries can predict asthma events earlier and may be a better means of measuring the timing of symptom onset.
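
The lag analysis in METHODS can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the sign convention (a positive lag means searches precede admissions) and the sample series are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlations(searches, admissions, max_lag=4):
    """Correlate searches in week t with admissions in week t + lag."""
    n = len(searches)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = searches[:n - lag], admissions[lag:]
        else:
            xs, ys = searches[-lag:], admissions[:n + lag]
        result[lag] = pearson(xs, ys)
    return result


# Hypothetical weekly series: admissions repeat the search pattern 2 weeks later.
searches = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12]
admissions = [2, 1] + searches[:-2]
correlations = lagged_correlations(searches, admissions)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

Each lag shortens the overlapping window by that many weeks, so correlations at large lags rest on fewer points and should be read with more caution.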

Forecasting Time Series Movement Direction with Hybrid Methodology

Journal of Probability and Statistics ◽

10.1155/2017/3174305 ◽

2017 ◽

Vol 2017 ◽

pp. 1-8 ◽

Cited By ~ 3

Author(s):

Salwa Waeto ◽

Khanchit Chuarkham ◽

Arthit Intarasit

Keyword(s):

Time Series ◽

Support Vector Regression ◽

Hybrid Model ◽

Moving Average ◽

Forecast Accuracy ◽

Movement Direction ◽

Support Vector ◽

Autoregressive Integrated Moving Average ◽

Proposed Model ◽

Hybrid Methodology

Forecasting the tendencies of time series is a challenging task that leads to a better understanding of the underlying process. This paper presents a hybrid model that combines support vector regression with the Autoregressive Integrated Moving Average model, formulated by a hybrid methodology. The proposed model is more convenient for practical usage. The research focuses on modeling the tendencies of time series for Thailand’s south insurgency. The empirical results, using time series of the monthly number of deaths, injuries, and incidents for Thailand’s south insurgency, indicate that the proposed hybrid model performs better than the classical time series model or support vector regression alone. The best forecast accuracy is determined using mean square error.

City Fire Forecasts and Analysis Based on Nonlinear Auto-Regressive Time-Series Model

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.241-244.1550 ◽

2012 ◽

Vol 241-244 ◽

pp. 1550-1555 ◽

Cited By ~ 1

Author(s):

Sheng Peng Liu ◽

Ye Zhang

Keyword(s):

Neural Network ◽

Time Series ◽

Prediction Model ◽

Bp Neural Network ◽

Support Vector ◽

Future Developments ◽

Proposed Model ◽

Network Method ◽

Auto Regressive ◽

The City

Forecasting future developments in city fire time series is a challenging task that many researchers have addressed because of its importance. In this paper, a Nonlinear Auto-Regressive (NAR) prediction model based on support vector regression is applied to forecast city fire data. The performance of the NAR prediction model in city fire forecasting is compared with that of the BP neural network method. The experimental results show that the proposed model performs best.

Comparative evaluation of time series models for predicting influenza outbreaks: application of influenza-like illness data from sentinel sites of healthcare centers in Iran

BMC Research Notes ◽

10.1186/s13104-019-4393-y ◽

2019 ◽

Vol 12 (1) ◽

Cited By ~ 12

Author(s):

Leili Tapak ◽

Omid Hamidi ◽

Mohsen Fathian ◽

Manoochehr Karami

Keyword(s):

Time Series ◽

Comparative Evaluation ◽

Time Series Models ◽

Influenza Like Illness ◽

Influenza Outbreaks

Forecasting influenza-like illness trends in Cameroon using Google Search Data

Scientific Reports ◽

10.1038/s41598-021-85987-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Elaine O. Nsoesie ◽

Olubusola Oladeji ◽

Aristide S. Abah Abah ◽

Martial L. Ndeffo-Mbah

Keyword(s):

Mean Squared Error ◽

Sub Saharan Africa ◽

Digital Data ◽

Support Vector ◽

Surveillance Systems ◽

African Countries ◽

Influenza Like Illness ◽

Search Data ◽

Sub Saharan ◽

Google Search

Although acute respiratory infections are a leading cause of mortality in sub-Saharan Africa, surveillance of diseases such as influenza is mostly neglected. Evaluating the usefulness of influenza-like illness (ILI) surveillance systems and developing approaches for forecasting future trends is important for pandemic preparedness. We applied and compared a range of robust statistical and machine learning models, including random forest (RF) regression, support vector machines (SVM) regression, multivariable linear regression and ARIMA models, to forecast 2012 to 2018 trends of reported ILI cases in Cameroon, using Google searches for influenza symptoms, treatments, and natural or traditional remedies, as well as infectious diseases with a high burden (i.e., AIDS, malaria, tuberculosis). The R² and RMSE (Root Mean Squared Error) were statistically similar across most of the methods; however, RF and SVM had the highest average R² (0.78 and 0.88, respectively) for predicting ILI per 100,000 persons at the country level. This study demonstrates the need for developing contextualized approaches when using digital data for disease surveillance and the usefulness of search data for monitoring ILI in sub-Saharan African countries.

Prediction of Energy-Efficient Production of Coalbed Methane Based on Chaotic Time Series and Bayes-Least Squares-Support Vector Machine

International Journal of Heat and Technology ◽

10.18280/ijht.380420 ◽

2020 ◽

Vol 38 (4) ◽

pp. 933-940

Author(s):

Yan Wang ◽

Zhongshui Man ◽

Meihua Lu

Keyword(s):

Neural Network ◽

Time Series ◽

Support Vector Machine ◽

Phase Space ◽

Energy Efficient ◽

Coalbed Methane ◽

Phase Space Reconstruction ◽

Support Vector ◽

Proposed Model ◽

Bayesian Evidence

The productivity of coalbed methane (CBM) depends heavily on the heat environment, and directly reflects the quality of the well. Following the theories of phase space reconstruction and Bayesian evidence framework, this paper puts forward a Bayes-least squares-support vector machine (Bayes-LS-SVM) model for the prediction of energy-efficient productivity of CBM under Bayesian evidence network based on chaotic time series. The energy-efficient productivity stands for the gas and water production of CBM wells at a low energy consumption, despite the disturbance from the heat environment. The proposed model avoids the local optimum trap of the backpropagation neural network (BPNN), and overcomes the main defects of the SVM: high time consumption of parameter determination, and proneness to overfitting. In our model, the model parameters are optimized through three-layer Bayesian evidence inference, and the input vector for prediction is selected adaptively. In this way, the model construction is not too empirical, and the constructed model is highly adaptive. Then, the theory of phase space reconstruction was applied to investigate the chaotic property of the time series on CBM production, and the Bayes-LS-SVM was adopted to predict the time series after phase space reconstruction, in comparison with prediction methods such as SVM and BPNN. Experimental results show that the proposed model boasts quick computing, accurate fitting, flexible structure, and strong generalization ability.
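
Phase space reconstruction, as mentioned above, is commonly done by time-delay embedding: each point of the series is replaced by a vector of lagged values that a predictor can then take as input. The sketch below assumes that standard construction; the abstract does not state the embedding dimension or delay used, so `dim`, `tau`, the function name and the sample series are hypothetical.

```python
def delay_embed(series, dim, tau):
    """Return delay vectors [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)]."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # distance between first and last coordinate
    if len(series) <= span:
        raise ValueError("series is too short for this dim and tau")
    return [
        [series[t + k * tau] for k in range(dim)]
        for t in range(len(series) - span)
    ]


# Hypothetical production series (illustrative only).
production = list(range(10))
vectors = delay_embed(production, dim=3, tau=2)
print(len(vectors), vectors[0], vectors[-1])
```

Each delay vector would serve as one input sample, with the next value of the series as its prediction target.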

Forecasting the Average Temperature Rise in Bangladesh: A Time Series Analysis

Journal of Engineering Science ◽

10.3329/jes.v11i1.49549 ◽

2020 ◽

Vol 11 (1) ◽

pp. 83-91

Author(s):

Sneha Paul ◽

Shuvendu Roy

Keyword(s):

Time Series ◽

Temperature Data ◽

Support Vector ◽

Regression Problem ◽

Average Temperature ◽

The Past ◽

Linear Polynomial ◽

Proposed Model ◽

The Future ◽

Significant Increment

Global warming has caused a significant increase in surface temperature around the world, including Bangladesh. In this study, the temperature data of Bangladesh over the past 100 years has been analyzed to identify the pattern of temperature increase. The average temperature has risen by 1 °C over the last century. Using daily average temperature data of Bangladesh, a machine learning-based time series forecasting model has been developed to predict the future temperature of Bangladesh. The model can predict the minimum, maximum, and average temperatures of any year in the future. This has been treated as a regression problem, and Linear, Polynomial, and Support Vector Regression have been proposed to build the prediction model. The proposed model has a mean square error of 0.0047 °C, which is a good margin for such a model. Using the model, the average temperature of Bangladesh is predicted over the next hundred years.

Relating calls to US poison centers for potential exposures to medications to Centers for Disease Control and Prevention reporting of influenza-like illness

Clinical Toxicology ◽

10.3109/15563650.2015.1135336 ◽

2016 ◽

Vol 54 (3) ◽

pp. 235-240

Author(s):

Gillian A. Beauchamp ◽

Nathanael J. McKeown ◽

Sergio Rodriguez ◽

Daniel A. Spyker

Keyword(s):

Disease Control ◽

Influenza Like Illness ◽

Control And Prevention ◽

Poison Centers ◽

Centers For Disease Control

Using web search queries to monitor influenza-like illness: an exploratory retrospective analysis, Netherlands, 2017/18 influenza season

Eurosurveillance ◽

10.2807/1560-7917.es.2020.25.21.1900221 ◽

2020 ◽

Vol 25 (21) ◽

Cited By ~ 1

Author(s):

Paul P Schneider ◽

Christel JAW van Gool ◽

Peter Spreeuwenberg ◽

Mariëtte Hooiveld ◽

Gé A Donker ◽

...

Keyword(s):

The Netherlands ◽

Prediction Model ◽

Retrospective Analysis ◽

Real Time ◽

Web Search ◽

Influenza Season ◽

Search Query ◽

Search Queries ◽

Influenza Like Illness ◽

Google Search

Background Despite the early development of Google Flu Trends in 2009, standards for digital epidemiology methods have not been established and research from European countries is scarce. Aim In this article, we study the use of web search queries to monitor influenza-like illness (ILI) rates in the Netherlands in real time. Methods In this retrospective analysis, we simulated the weekly use of a prediction model for estimating the then-current ILI incidence across the 2017/18 influenza season solely based on Google search query data. We used weekly ILI data as reported to The European Surveillance System (TESSY) each week, and we removed the then-last 4 weeks from our dataset. We then fitted a prediction model based on the then-most-recent search query data from Google Trends to fill the 4-week gap (‘Nowcasting’). Lasso regression, in combination with cross-validation, was applied to select predictors and to fit the 52 models, one for each week of the season. Results The models provided accurate predictions with a mean and maximum absolute error of 1.40 (95% confidence interval: 1.09–1.75) and 6.36 per 10,000 population. The onset, peak and end of the epidemic were predicted with an error of 1, 3 and 2 weeks, respectively. The number of search terms retained as predictors ranged from three to five, with one keyword, ‘griep’ (‘flu’), having the most weight in all models. Discussion This study demonstrates the feasibility of accurate, real-time ILI incidence predictions in the Netherlands using Google search query data.
