Using Sequence Mining to Predict Complex Systems: A Case Study in Influenza Epidemics

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Theyazn H. H. Aldhyani ◽  
Manish R. Joshi ◽  
Shahab A. AlMaaytah ◽  
Ahmed Abdullah Alqarni ◽  
Nizar Alsharif

According to the World Health Organisation, three to five million individuals are infected by influenza, and around 250,000 to 500,000 people die of this infectious disease worldwide. Influenza epidemics pose a serious public health threat. Moreover, graver dangers are encountered with influenza subtypes against which there is little or no preexisting human immunity. Such subtypes of influenza have the potential to cause devastating epidemics. Thus, enhancing surveillance systems to detect influenza epidemics at an early stage can quicken response times and save millions of lives. This paper presents three adaptive intelligence models for predicting influenza epidemics: support vector machine regression (SVMR), an artificial neural network using particle swarm optimisation (ANNPSO), and our intelligent time series (INTS) model. The novelty of the current study is that it proposes a new intelligent model to predict influenza outbreaks. The INTS model combines clustering with a time series model to enhance the prediction of influenza outbreaks. Its innovation is that it integrates the results obtained from the existing weighted exponential smoothing model with centroids obtained from clustering. We developed a surveillance system for influenza epidemics using Google search queries. The current research is based on a weighted version of the Centers for Disease Control and Prevention (CDC) influenza-like illness activity level obtained from CDC data, as well as query data obtained from the Google search engine in the USA. The influenza-like illness data was collected from January 4, 2009 (week 1), to December 27, 2015 (week 52), stretching across a total time span of 312 weeks. Google Correlate was used to select search queries related to influenza epidemics. In total, 100 search queries were obtained from Google Correlate, of which the 10 most relevant were selected for this study.
The model was evaluated using online Google search queries collected from Google Correlate. Standard performance measures (MSE, RMSE, and MAE) were employed to evaluate the results of the proposed model. The empirical results of the INTS model showed MSE = 0.003, RMSE = 0.036, and MAE = 0.0185, indicating that the errors of the proposed model are small. A comparison of prediction results among the INTS model, the alternative Google Flu Trends (GFT) model, and autoregression with Google search data is also presented. The proposed model outperformed the existing models.
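The three error measures named above can be sketched as follows. This is a minimal illustration using only the standard definitions of MSE, RMSE, and MAE; the function name `error_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def error_metrics(actual, predicted):
    """Return (MSE, RMSE, MAE) for two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    rmse = math.sqrt(mse)                            # root of the MSE
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    return mse, rmse, mae


# Hypothetical weekly activity levels (illustrative only, not study data).
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 2.0]
mse, rmse, mae = error_metrics(observed, forecast)
```

Because RMSE is the square root of MSE, the two always move together, while MAE is less sensitive to a single large miss such as the last week in this toy series.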

2017 ◽  
Vol 20 (1) ◽  
pp. 246-262 ◽  
Author(s):  
Jamileh Farajzadeh ◽  
Farhad Alizadeh

Abstract The present study aimed to develop a hybrid model to predict the rainfall time series of the Urmia Lake watershed. For this purpose, a model based on the discrete wavelet transform, SARIMAX and least squares support vector machine (LSSVM), denoted W-S-LSSVM, was developed. The proposed model was designed to handle the linear, nonlinear and seasonal components of rainfall time series. In the proposed model, the time series were decomposed into sub-series (approximation (a) and details (d)), and each sub-series was fed into SARIMAX to be predicted separately. The residual (error) of the predicted sub-series was then fed into LSSVM to predict the residual components. Finally, all predicted values were aggregated to rebuild the predicted time series. For comparison, classic modeling was first performed with LSSVM alone; wavelet-based LSSVM was then used to capture the peak values of rainfall. Results revealed that the Daubechies 4 wavelet at decomposition level 4 (db(4,4)) led to the best outcome, so it was selected for the proposed model. The results showed that W-S-LSSVM performed better than the other models.
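The decompose-then-rebuild step can be illustrated with a small sketch. Note the assumptions: it uses a single-level Haar wavelet, swapped in for the Daubechies 4 wavelet at level 4 reported in the abstract, because Haar fits in a few lines of plain Python. The function names and the sample series are hypothetical, and no SARIMAX or LSSVM modelling is shown.

```python
import math


def haar_dwt(signal):
    """One-level Haar transform: return (approximation, detail) sub-series."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / s for a, b in pairs]  # smooth (low-pass) part
    detail = [(a - b) / s for a, b in pairs]  # fluctuation (high-pass) part
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt: rebuild the series from its two sub-series."""
    s = math.sqrt(2.0)
    rebuilt = []
    for a, d in zip(approx, detail):
        rebuilt.append((a + d) / s)
        rebuilt.append((a - d) / s)
    return rebuilt


# Hypothetical monthly rainfall values (illustrative only).
rainfall = [12.0, 30.0, 45.0, 20.0, 5.0, 0.0, 8.0, 16.0]
approx, detail = haar_dwt(rainfall)
rebuilt = haar_idwt(approx, detail)
```

In a hybrid scheme of the kind described, each sub-series would be forecast separately and the forecasts passed through the inverse transform; here the unmodified sub-series are inverted only to show that the decomposition loses no information.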


2020 ◽  
Author(s):  
Yulin Hswen ◽  
Amanda Zhang ◽  
Bruno Ventelou

BACKGROUND Asthma affects over 330 million people worldwide. Timing of the asthma event is extremely important, and failure to identify asthma increases the risk of death. A major challenge for health systems is the length of time between symptom onset and care seeking, which can delay treatment initiation and worsen symptoms. OBJECTIVE This study evaluates the utility of Internet search query data for identifying the onset of asthma symptoms. METHODS Pearson correlation coefficients between the time series of hospital admissions and Google searches were computed at lag times from 4 weeks prior to hospital admission to 4 weeks after hospital admission. RESULTS Google search volume for asthma had the highest correlation at 2 weeks before hospital admission. CONCLUSIONS Our findings demonstrate that Internet search queries can predict asthma events earlier and may be better suited to measuring the timing of symptom onset.
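The lagged-correlation method can be sketched as below. This is an illustrative reconstruction, not the authors' code: the function names, the sign convention (a positive lag means searches precede admissions by that many weeks), and the synthetic series are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def lagged_correlations(searches, admissions, max_lag=4):
    """Correlate searches[t] with admissions[t + lag] for each lag.

    Assumed convention: lag > 0 means searches lead admissions by lag weeks.
    """
    n = len(searches)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = searches[:n - lag], admissions[lag:]
        else:
            x, y = searches[-lag:], admissions[:n + lag]
        result[lag] = pearson(x, y)
    return result


# Synthetic weekly series: admissions repeat the search signal 2 weeks later.
searches = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12]
admissions = [0, 2] + searches[:-2]
corrs = lagged_correlations(searches, admissions)
best_lag = max(corrs, key=corrs.get)
```

On this synthetic input the strongest correlation falls at a lag of 2 weeks by construction, which mirrors the shape of the reported finding without reproducing its data.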


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Salwa Waeto ◽  
Khanchit Chuarkham ◽  
Arthit Intarasit

Forecasting the tendencies of a time series is a challenging task that leads to a better understanding of the series. The purpose of this paper is to present a hybrid model that combines support vector regression with the Autoregressive Integrated Moving Average (ARIMA) model, formulated using a hybrid methodology. The proposed model is more convenient for practical use. This research article focuses on modeling the tendencies of time series for the insurgency in southern Thailand. The empirical results, using time series of the monthly number of deaths, injuries, and incidents for the insurgency, indicate that the proposed hybrid model is an effective way to construct an estimated model that is better than either the classical time series model or support vector regression alone. Forecast accuracy is assessed using mean square error.


2012 ◽  
Vol 241-244 ◽  
pp. 1550-1555 ◽  
Author(s):  
Sheng Peng Liu ◽  
Ye Zhang

Forecasting future developments of city fire time series is a challenging task that many researchers have addressed because of its importance. In this paper, a Nonlinear Auto-Regressive (NAR) prediction model based on support vector regression is applied to forecast city fire data. The performance of the NAR prediction model in city fire forecasting is compared with that of the BP neural network method. The experimental results show that the proposed model performs better.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Elaine O. Nsoesie ◽  
Olubusola Oladeji ◽  
Aristide S. Abah Abah ◽  
Martial L. Ndeffo-Mbah

Abstract Although acute respiratory infections are a leading cause of mortality in sub-Saharan Africa, surveillance of diseases such as influenza is mostly neglected. Evaluating the usefulness of influenza-like illness (ILI) surveillance systems and developing approaches for forecasting future trends is important for pandemic preparedness. We applied and compared a range of robust statistical and machine learning models, including random forest (RF) regression, support vector machine (SVM) regression, multivariable linear regression and ARIMA models, to forecast 2012 to 2018 trends of reported ILI cases in Cameroon, using Google searches for influenza symptoms, treatments, and natural or traditional remedies, as well as infectious diseases with a high burden (i.e., AIDS, malaria, tuberculosis). The R² and RMSE (Root Mean Squared Error) were statistically similar across most of the methods; however, RF and SVM had the highest average R² (0.78 and 0.88, respectively) for predicting ILI per 100,000 persons at the country level. This study demonstrates the need for developing contextualized approaches when using digital data for disease surveillance and the usefulness of search data for monitoring ILI in sub-Saharan African countries.


2020 ◽  
Vol 38 (4) ◽  
pp. 933-940
Author(s):  
Yan Wang ◽  
Zhongshui Man ◽  
Meihua Lu

The productivity of coalbed methane (CBM) depends heavily on the heat environment and directly reflects the quality of the well. Following the theories of phase space reconstruction and the Bayesian evidence framework, this paper puts forward a Bayes-least squares-support vector machine (Bayes-LS-SVM) model for predicting the energy-efficient productivity of CBM under a Bayesian evidence network based on chaotic time series. The energy-efficient productivity stands for the gas and water production of CBM wells at a low energy consumption, despite the disturbance from the heat environment. The proposed model avoids the local optimum trap of the backpropagation neural network (BPNN) and overcomes the main defects of the SVM: the high time cost of parameter determination and proneness to overfitting. In our model, the model parameters are optimized through three-layer Bayesian evidence inference, and the input vector for prediction is selected adaptively. In this way, the model construction is not too empirical, and the constructed model is highly adaptive. The theory of phase space reconstruction was then applied to investigate the chaotic property of the CBM production time series, and the Bayes-LS-SVM was adopted to predict the time series after phase space reconstruction, in comparison with prediction methods such as SVM and BPNN. Experimental results show that the proposed model boasts quick computing, accurate fitting, flexible structure, and strong generalization ability.
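Phase space reconstruction is commonly carried out by time-delay embedding, which the following sketch illustrates. The abstract does not state the embedding dimension or delay that were used, so `dim`, `tau`, the function name, and the sample series are assumptions chosen only for illustration.

```python
def delay_embed(series, dim, tau):
    """Return delay vectors [x[t], x[t + tau], ..., x[t + (dim - 1) * tau]]."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # distance between first and last coordinate
    if len(series) <= span:
        raise ValueError("series is too short for this dim and tau")
    return [
        [series[t + k * tau] for k in range(dim)]
        for t in range(len(series) - span)
    ]


# Hypothetical production series (illustrative only).
production = [0, 1, 2, 3, 4, 5, 6, 7]
vectors = delay_embed(production, dim=3, tau=2)
```

Each reconstructed vector can then serve as the input to a regressor that predicts a later value of the series, which is the role the Bayes-LS-SVM plays in the description above.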


2020 ◽  
Vol 11 (1) ◽  
pp. 83-91
Author(s):  
Sneha Paul ◽  
Shuvendu Roy

Global warming has caused a significant increase in surface temperature around the world, including in Bangladesh. In this study, the temperature data of Bangladesh over the past 100 years has been analyzed to identify the pattern of temperature increase. The average temperature has risen by 1 °C over the last century. Using daily average temperature data for Bangladesh, a machine learning-based time series forecasting model has been developed to predict the future temperature of Bangladesh. The model can predict the minimum, maximum, and average temperatures of any year in the future. This has been treated as a regression problem, and Linear, Polynomial, and Support Vector Regression have been proposed to build the prediction model. The proposed model has a mean square error of 0.0047 °C, which is a good margin for such a model. Using the model, the average temperature of Bangladesh is predicted over the next hundred years. Journal of Engineering Science 11(1), 2020, 83-91


2020 ◽  
Vol 25 (21) ◽  
Author(s):  
Paul P Schneider ◽  
Christel JAW van Gool ◽  
Peter Spreeuwenberg ◽  
Mariëtte Hooiveld ◽  
Gé A Donker ◽  
...  

Background Despite the early development of Google Flu Trends in 2009, standards for digital epidemiology methods have not been established and research from European countries is scarce. Aim In this article, we study the use of web search queries to monitor influenza-like illness (ILI) rates in the Netherlands in real time. Methods In this retrospective analysis, we simulated the weekly use of a prediction model for estimating the then-current ILI incidence across the 2017/18 influenza season solely based on Google search query data. We used weekly ILI data as reported to The European Surveillance System (TESSY) each week, and we removed the then-last 4 weeks from our dataset. We then fitted a prediction model based on the then-most-recent search query data from Google Trends to fill the 4-week gap (‘Nowcasting’). Lasso regression, in combination with cross-validation, was applied to select predictors and to fit the 52 models, one for each week of the season. Results The models provided accurate predictions with a mean and maximum absolute error of 1.40 (95% confidence interval: 1.09–1.75) and 6.36 per 10,000 population. The onset, peak and end of the epidemic were predicted with an error of 1, 3 and 2 weeks, respectively. The number of search terms retained as predictors ranged from three to five, with one keyword, ‘griep’ (‘flu’), having the most weight in all models. Discussion This study demonstrates the feasibility of accurate, real-time ILI incidence predictions in the Netherlands using Google search query data.
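Two of the reported accuracy summaries, the maximum absolute error and the timing error of the epidemic peak, can be sketched as follows. The function names and sample values are hypothetical, and defining the peak as the week with the highest value is an assumption; the abstract does not spell out how onset, peak and end were identified.

```python
def max_absolute_error(observed, predicted):
    """Largest absolute difference between paired weekly values."""
    return max(abs(o - p) for o, p in zip(observed, predicted))


def peak_week_error(observed, predicted):
    """Weeks between the observed and the predicted peak (highest value)."""
    obs_peak = max(range(len(observed)), key=observed.__getitem__)
    pred_peak = max(range(len(predicted)), key=predicted.__getitem__)
    return abs(obs_peak - pred_peak)


# Hypothetical ILI incidence per 10,000 population (illustrative only).
observed = [2.0, 5.0, 9.0, 6.0, 3.0]
predicted = [2.5, 7.0, 8.0, 8.5, 3.0]
worst_miss = max_absolute_error(observed, predicted)
timing_miss = peak_week_error(observed, predicted)
```

Reporting a timing error alongside a magnitude error is useful because a nowcast can track incidence levels closely while still placing the peak a week or more off.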

