Forecasting of Categorical Time Series Using a Regression Model

2003, Vol 18 (2)
Author(s): Helmut Pruscha, Axel Göttlein

2021, Vol 11 (14), pp. 6594
Author(s): Yu-Chia Hsu

The interdisciplinary nature of sports and the presence of various systemic and non-systemic factors make it challenging to predict sports match outcomes using a single disciplinary approach. In contrast to previous studies that use sports performance metrics and statistical models, this study is the first to apply a deep learning approach from financial time series modeling to predict sports match outcomes. The proposed approach has two main components: a convolutional neural network (CNN) classifier for implicit pattern recognition and a logistic regression model for match outcome judgment. First, the raw data used in the prediction are derived from the betting market odds and actual scores of each game, which are transformed into sports candlesticks. Second, a CNN is used to classify the candlestick time series on a graphical basis. To this end, the original 1D time series are encoded into 2D matrix images using a Gramian angular field and are then fed into the CNN classifier. In this way, the winning probability of each team in a matchup can be derived based on historically implied behavioral patterns. Third, to further account for the differences between strong and weak teams, the winning probability produced by the CNN classifier is adjusted by the logistic regression model, which then makes the final judgment regarding the match outcome. We empirically test this approach using data from 18,944 National Football League games spanning 32 years and find that using the individual historical data of each team in the CNN classifier for pattern recognition is better than using the data of all teams. The CNN in conjunction with the logistic regression judgment model outperforms the CNN in conjunction with SVM, Naïve Bayes, Adaboost, J48, and random forest, and its accuracy surpasses that of betting market prediction.
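
The encoding step mentioned in this abstract (1D series to 2D image) can be illustrated with a minimal sketch. The abstract does not say which Gramian angular field variant was used, so the summation variant is assumed here, and the function name `gramian_angular_field` is hypothetical. The series is rescaled to [-1, 1], each value is mapped to an angle, and entry (i, j) of the image is the cosine of the sum of angles i and j.

```python
import math


def gramian_angular_field(series):
    """Encode a 1D series as a 2D Gramian angular summation field.

    Assumption: the summation variant is used; the abstract does not
    specify which variant the study applied.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Rescale every value into [-1, 1] so that arccos is defined.
    scaled = [((x - hi) + (x - lo)) / (hi - lo) for x in series]
    # Clamp to guard against tiny floating-point overshoots, then map to angles.
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    # Entry (i, j) is cos(angle_i + angle_j); the matrix is symmetric.
    return [[math.cos(a + b) for b in angles] for a in angles]


if __name__ == "__main__":
    image = gramian_angular_field([0.0, 1.0, 2.0])
    for row in image:
        print([round(v, 3) for v in row])
```

A series of length n yields an n by n matrix, which is what allows an image classifier such as a CNN to consume it.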


Author(s): Yumei Liu, Ningguo Qiao, Congcong Zhao, Jiaojiao Zhuang, Guangdong Tian

Accurate vibration time series modeling can reveal the underlying regularities in the data and provide valuable references for reliability assessment. To improve prediction accuracy, this study proposes a hybrid model, called the AR-SVR-CPSO hybrid model, that combines the autoregressive (AR) and support vector regression (SVR) models, with the weights optimized by the chaotic particle swarm optimization (CPSO) algorithm. First, the AR model with the difference method is employed to model the vibration time series. Second, the SVR model with phase space reconstruction is constructed to predict the vibration time series a second time. Finally, the predictions of the AR and SVR models are weighted and summed, with the weights optimized by CPSO. In addition, data collected from the reliability test platform of high-speed train transmission systems and from the “NASA prognostics data repository” are used to validate the hybrid model. The experimental results demonstrate that the hybrid model proposed in this study outperforms the traditional AR and SVR models.
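
The final step of this abstract, a weighted sum of two models' predictions, can be sketched as follows. This is not the authors' method: a plain grid search stands in for CPSO, the weights are assumed to sum to one, and the names `combine`, `mse` and `best_weight` as well as the sample numbers are hypothetical.

```python
def combine(pred_a, pred_b, w):
    """Weighted sum of two prediction series; w applies to pred_a."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def mse(pred, actual):
    """Mean squared error between predictions and observations."""
    return sum((p - y) ** 2 for p, y in zip(pred, actual)) / len(actual)


def best_weight(pred_a, pred_b, actual, steps=100):
    """Grid search over w in [0, 1].

    Assumption: this simple search replaces the CPSO optimizer named in
    the abstract, and the two weights are constrained to sum to one.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mse(combine(pred_a, pred_b, w), actual))


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    ar_pred = [1.0, 2.0, 3.0]
    svr_pred = [3.0, 4.0, 5.0]
    observed = [2.0, 3.0, 4.0]
    w = best_weight(ar_pred, svr_pred, observed)
    print(w, combine(ar_pred, svr_pred, w))
```

In practice the weight would be selected on a validation window and then applied to later forecasts.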


2020, Vol 54 (2), pp. 597-614
Author(s): Shanoli Samui Pal, Samarjit Kar

In this paper, the fuzzified Choquet integral and a fuzzy-valued integrand with respect to separate measures, namely the fuzzy measure, signed fuzzy measure and intuitionistic fuzzy measure, are used to develop a regression model for forecasting. The fuzzified Choquet integral is used to build a regression model for forecasting time series with multiple attributes as predictor attributes. Linear-regression-based forecasting models suffer from low accuracy and cannot approximate the non-linearity in time series, whereas the Choquet integral can be used as a general non-linear regression model with respect to non-classical measures. In the Choquet-integral-based regression model, the parameters are optimized using a real-coded genetic algorithm (GA). In these forecasting models, fuzzified integrands denote the participation of an individual attribute or a group of attributes in predicting the current situation. Here, a more generalized Choquet integral, i.e., the fuzzified Choquet integral, is used for non-linear time series forecasting models. Three real stock exchange data sets are used to evaluate the time series forecasting models. It is observed that the accuracy of the prediction models depends strongly on the non-linearity of the time series.
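
To show how a Choquet integral aggregates attribute values non-linearly, here is a minimal sketch of the classical (crisp) discrete Choquet integral for non-negative inputs. It is a simplification: the paper's fuzzified integral with fuzzy-valued integrands is not reproduced, and the function name, attribute names and measure values are hypothetical.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of non-negative attribute values.

    values:  dict mapping attribute name -> non-negative number
    measure: dict mapping frozenset of attribute names -> measure value,
             defined at least for every "upper" coalition met below

    Assumption: crisp values and a crisp fuzzy measure; the fuzzified
    version described in the abstract is not implemented here.
    """
    order = sorted(values, key=values.get)  # attributes by ascending value
    total, previous = 0.0, 0.0
    for i, name in enumerate(order):
        # Coalition of all attributes whose value is at least values[name].
        coalition = frozenset(order[i:])
        total += (values[name] - previous) * measure[coalition]
        previous = values[name]
    return total


if __name__ == "__main__":
    # Illustrative, non-additive measure: the pair counts for more than
    # the sum of the single attributes (0.3 + 0.4 < 1.0).
    mu = {
        frozenset({"a", "b"}): 1.0,
        frozenset({"a"}): 0.3,
        frozenset({"b"}): 0.4,
    }
    print(choquet_integral({"a": 1.0, "b": 3.0}, mu))
```

When the measure is additive, the result reduces to an ordinary weighted sum, which is why the Choquet integral can be read as a generalization of linear regression terms.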


2017, Vol 47 (4)
Author(s): Liz Gonçalves Rodrigues, Maria Helena Cosendey de Aquino, Márcio Roberto Silva, Letícia Caldas Mendonça, Juliana França Monteiro de Mendonça, ...

ABSTRACT: Bulk tank somatic cell count (BTSCC) is widely used to monitor mammary gland health at the herd and regional levels. The BTSCC time series from specific regions or countries can be used to compare mammary gland health and estimate the trend of subclinical mastitis at the regional level. Three time series of BTSCC from dairy herds located in the USA and Southeastern Brazil were evaluated from 1995 to 2014. Descriptive statistics and a linear regression model were used to evaluate the data of the BTSCC time series. The mean of the annual geometric mean of BTSCC (AGM) and the percentage of dairy herds with a BTSCC greater than 400,000 cells mL⁻¹ (%>400) were significantly different (P<0.05) according to the countries and the time series. The linear regression model used for the USA time series was statistically significant for AGM and %>400 (P<0.05). The first and second USA time series presented an increasing and a decreasing trend for AGM and %>400, respectively. The linear regression model for the Brazil time series was not significant (P>0.05) for either dependent variable (AGM and %>400). The Brazil time series showed no increasing or decreasing trend for AGM and %>400. Consequently, approximately 40 to 50% of the dairy herds from Southeastern Brazil will not achieve the regulatory limits for BTSCC over the next years.


Author(s):  
Rati WONGSATHAN

The novel coronavirus 2019 (COVID-19) pandemic was declared a global health crisis. An accurate, real-time predictive model of the number of infected cases could help inform government decisions on medical assistance and public health. This work models the ongoing COVID-19 spread in Thailand during the 1st and 2nd phases of the pandemic using simple but powerful methods based on model-free and time series regression models. By means of curve fitting, the model-free method using the logistic function, hyperbolic tangent function, and Gaussian function was applied to predict the number of newly infected patients and the cumulative total number of cases, including the peak and viral cessation (ending) dates. Alternatively, with a significant time lag of historical data as input, the regression model predicts those parameters from 1 day ahead to 1 month ahead. To obtain optimal prediction models, the parameters of the model-free method are fine-tuned through a genetic algorithm, whereas generalized least squares updates the parameters of the regression model. Assuming the future trend continues to follow the past pattern, the expected total number of patients is approximately 2,689 to 3,000 cases. The estimated viral cessation dates are May 2, 2020 (using the Gaussian function), May 4, 2020 (using the hyperbolic tangent function), and June 5, 2020 (using the logistic function), whereas the peak occurred on April 5, 2020. Moreover, the model-free method performs well for long-term prediction, whereas the regression model is suitable for short-term prediction. Furthermore, the regression models yield highly accurate forecasts, with lower RMSE and higher R², up to 1 week ahead.
HIGHLIGHTS
COVID-19 model for Thailand during the first and second phases of the epidemic
The model-free method using the logistic function, hyperbolic tangent function, and Gaussian function is applied to predict the basic measures of the outbreak
The regression model predicts those measures from one day ahead to one month ahead
The parameters of the model-free method are fine-tuned through the genetic algorithm
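
As a rough illustration of the logistic curve named in this abstract, the sketch below evaluates a three-parameter logistic function for cumulative cases and derives daily new cases as day-to-day differences. The parameterization, the function names and the parameter values are assumptions for illustration; they are not the fitted values from the study, and no genetic-algorithm fitting is performed.

```python
import math


def logistic_cumulative(t, total, rate, midpoint):
    """Cumulative cases at day t under a logistic growth curve.

    Assumption: the common three-parameter form total / (1 + exp(-rate * (t - midpoint)));
    the abstract does not state the exact parameterization used.
    """
    return total / (1.0 + math.exp(-rate * (t - midpoint)))


def daily_new_cases(days, total, rate, midpoint):
    """New cases on each day, as the increase in the cumulative curve."""
    return [
        logistic_cumulative(t, total, rate, midpoint)
        - logistic_cumulative(t - 1, total, rate, midpoint)
        for t in days
    ]


if __name__ == "__main__":
    # Illustrative parameters only, not estimates for any real outbreak.
    days = list(range(1, 61))
    new_cases = daily_new_cases(days, total=1000.0, rate=0.2, midpoint=30.5)
    peak_day = days[new_cases.index(max(new_cases))]
    print("peak day:", peak_day)
    print("cumulative at day 60:", round(logistic_cumulative(60, 1000.0, 0.2, 30.5), 1))
```

The daily increments peak around the curve's midpoint and the cumulative curve levels off toward `total`, which is how a fitted curve of this kind yields a peak date and a final case count.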


Author(s): Liliya Andreevna Landman, Andrei Vladimirovich Faddeenkov

The concept of structure is used to describe a set of stable relations between the main parts of an object that determine its integrity and identity, i.e., the preservation of its basic properties under a wide range of internal and external changes. This concept is usually related to the concepts of system and organization. The structure expresses the stable part of a system, which changes only slightly during different reforms. Over the years, structural changes take place because of active economic policy or as a result of spontaneous, uncontrollable processes. Therefore, it is natural to find out whether there have been structural changes in the observation period and to reflect them in the specification of the model. The basic ideas of methods for detecting structural changes in time series dynamics are considered, namely the Chow test, the Gujarati test and the Poirier method. A power study was conducted for three possible cases of change in time series trends. The random error was modeled according to the standard normal distribution. A linear multiple regression model with three independent variables was used as the time series model. The vector of unknown model parameters was estimated using the least squares method. For each of the three criteria, the null hypothesis about time series instability was tested using the F-criterion, which involves finding the residual sum of squares of a regression model and analyzing the relation between its decline and the loss of degrees of freedom. It can be noted that the Gujarati and Poirier equations have a more complex structure than the equation of the Chow test; however, the Chow test requires estimating the parameters of three regression equations.
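
The three-regression structure of the Chow test mentioned in this abstract can be sketched as follows. For brevity the sketch assumes a simple regression with one regressor (so k = 2 parameters) rather than the three-variable model used in the study, and a known break point; the function names and the sample data are hypothetical. The statistic compares the residual sum of squares (RSS) of the pooled fit with the combined RSS of the two sub-period fits, and a larger value points more strongly toward a structural change.

```python
def rss_simple_ols(x, y):
    """Residual sum of squares of a least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, split, k=2):
    """Chow F statistic for a break at index `split`.

    Assumptions: one regressor plus intercept (k = 2) and a known break
    point; three regressions are fitted (pooled, before, after).
    """
    rss_pooled = rss_simple_ols(x, y)
    rss_split = rss_simple_ols(x[:split], y[:split]) + rss_simple_ols(x[split:], y[split:])
    n = len(x)
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))


if __name__ == "__main__":
    # Illustrative data: small alternating noise around two different trends.
    x = [float(i) for i in range(12)]
    noise = [0.1 if i % 2 == 0 else -0.1 for i in range(12)]
    y_break = [(1 + 0.5 * xi if xi < 6 else 10 - 0.5 * xi) + e for xi, e in zip(x, noise)]
    y_stable = [1 + 0.5 * xi + e for xi, e in zip(x, noise)]
    print("F with a break:   ", chow_f(x, y_break, 6))
    print("F without a break:", chow_f(x, y_stable, 6))
```

The statistic would then be compared with an F distribution with k and n - 2k degrees of freedom.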


2021, Vol 18 (32)
Author(s): Stanko Stanić, Bojan Baškot

A panel regression model may seem like an appealing solution when time series are short. It is often used as a shortcut to obtain a deeper data set by placing several individual cases on the same time dimension, where the cross-sectional units visually, but not really, multiply the time frame. Macroeconometric analysis of the Western Balkan region has to deal with short time series. Additionally, structural breaks are numerous. Panel regression may seem like a solution, but there are some limitations that should be considered.


2000, Vol 42 (3-4), pp. 403-408
Author(s): R.-F. Yu, S.-F. Kang, S.-L. Liaw, M.-c. Chen

Coagulant dosing is one of the major operating costs in a water treatment plant, and in most plants the dose is conventionally determined by the jar test. However, this method provides only periodic information and is difficult to apply to automatic control. This paper examines the feasibility of applying an artificial neural network (ANN) to automatically control coagulant dosing in a water treatment plant. Five on-line monitoring variables, namely turbidity (NTUin), pH (pHin) and conductivity (Conin) of the raw water, effluent turbidity (NTUout) of the settling tank, and alum dosage (Dos), were used to build the coagulant dosing prediction model. Three methods, namely a regression model, a time series model and ANN models, were used to predict alum dosage. According to the results of this study, the regression model predicted coagulant dosage poorly. Both the time series and ANN models predicted dosage precisely. The ANN model with ahead coagulant dosage gave the best prediction of alum dosage, with an R² of 0.97 (RMS = 0.016); a very low average prediction error of 0.75 mg/L of alum was also found for the ANN model. Consequently, applying the ANN model to control coagulant dosing is feasible in water treatment.

