evaluating forecasts
Recently Published Documents

TOTAL DOCUMENTS: 26 (FIVE YEARS: 4)
H-INDEX: 8 (FIVE YEARS: 0)

2021 ◽  
pp. 117-138
Author(s):  
Fabio Gobbi

Abstract The aim of the paper is to compare the forecasting performance of a class of state-dependent autoregressive (SDAR) models for univariate time series with two alternative families of nonlinear models, the SETAR and GARCH models. The study is conducted on the US GDP growth rate using quarterly data. Two methods of forecast comparison are employed. The first evaluates average performance using two measures, the root mean square error (RMSE) and the mean absolute error (MAE), over different forecast horizons, while the second makes use of one of the most widely used statistical tests for comparing the accuracy of two forecast methods, the Diebold-Mariano test. JEL classification numbers: C22, E37, F47. Keywords: Nonlinear models for time series, GDP growth rate, Forecasting accuracy.
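As a rough illustration of the comparison methods this abstract names, the sketch below computes RMSE, MAE, and a basic Diebold-Mariano statistic on two competing series of forecast errors. This is a minimal version with invented toy data, not the paper's code: in particular, the variance estimator assumes a one-step horizon and omits the autocorrelation (HAC) correction normally used for multi-step forecasts.

```python
import math

def rmse(errors):
    """Root mean square error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(errors):
    """Mean absolute error of a list of forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)

def diebold_mariano(e1, e2):
    """Diebold-Mariano statistic on squared-error loss differentials.
    Simplified: treats the loss differential as serially uncorrelated,
    which is only appropriate for one-step-ahead forecasts."""
    d = [a * a - b * b for a, b in zip(e1, e2)]  # loss differential series
    n = len(d)
    d_bar = sum(d) / n
    var = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    return d_bar / math.sqrt(var / n)

# Toy usage: method 1 has uniformly smaller errors, so DM is negative,
# favouring method 1 under squared-error loss.
dm = diebold_mariano([1, 2, 1, 2], [2, 3, 2, 3])
```

A strongly negative (or positive) statistic, compared against standard normal critical values, indicates a significant accuracy difference between the two methods.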


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Edward Wheatcroft

Abstract A scoring rule is a function of a probabilistic forecast and a corresponding outcome used to evaluate forecast performance. There is some debate as to which scoring rules are most appropriate for evaluating forecasts of sporting events. This paper focuses on forecasts of the outcomes of football matches. The ranked probability score (RPS) is often recommended since it is ‘sensitive to distance’, that is, it takes into account the ordering in the outcomes (a home win is ‘closer’ to a draw than it is to an away win). In this paper, this reasoning is disputed on the basis that it adds nothing in terms of the usual aims of using scoring rules. A local scoring rule is one that only takes the probability placed on the outcome into consideration. Two simulation experiments are carried out to compare the performance of the RPS, which is non-local and sensitive to distance, the Brier score, which is non-local and insensitive to distance, and the Ignorance score, which is local and insensitive to distance. The Ignorance score outperforms both the RPS and the Brier score, casting doubt on the value of non-locality and sensitivity to distance as properties of scoring rules in this context.
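The three scoring rules compared in this abstract can be sketched as follows for a three-outcome (home/draw/away) match forecast, where `p` is the forecast probability vector and `k` the index of the observed outcome. This is a minimal textbook illustration, not the paper's code; the RPS here is the unnormalized form (some authors divide by the number of outcomes minus one).

```python
import math

def brier(p, k):
    """Brier score: non-local, insensitive to distance.
    Sum of squared differences between probabilities and the outcome."""
    return sum((pi - (1.0 if i == k else 0.0)) ** 2 for i, pi in enumerate(p))

def ignorance(p, k):
    """Ignorance (log) score: local -- depends only on the probability
    placed on the outcome that actually occurred."""
    return -math.log2(p[k])

def rps(p, k):
    """Ranked probability score: non-local and sensitive to distance,
    scoring cumulative forecast probabilities against the cumulative outcome."""
    score, cum_p, cum_o = 0.0, 0.0, 0.0
    for i, pi in enumerate(p[:-1]):
        cum_p += pi
        cum_o += 1.0 if i == k else 0.0
        score += (cum_p - cum_o) ** 2
    return score

# Toy usage: a forecast of [home, draw, away] = [0.5, 0.3, 0.2]
# scored against an observed home win (k = 0); lower is better for all three.
p = [0.5, 0.3, 0.2]
scores = (brier(p, 0), ignorance(p, 0), rps(p, 0))
```

Note how `ignorance` ignores `p[1]` and `p[2]` entirely (locality), while `rps` penalizes probability placed on an away win more than on a draw when a home win occurs (sensitivity to distance).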


2021 ◽  
Author(s):  
Ole Wulff ◽  
Frédéric Vitart ◽  
Daniela Domeisen

Subseasonal-to-seasonal (S2S) predictions have numerous applications and improving forecast skill on this time scale has become a major effort. Since forecast uncertainty is high on S2S lead times, ensemble prediction systems are essential in order to provide probabilistic forecasts, informing about the range of possible outcomes. For evaluating their performance, these forecasts are routinely compared to a climatological reference forecast. The climatological distribution is commonly assumed to be stationary over the verification period. However, prominent deviations from this assumption exist, especially considering trends associated with climate change. Using synthetic forecast-verification pairs we show that estimates of the probabilistic skill of both continuous and categorical forecasts with a fixed actual level of skill increase as a function of the variance explained by the trend over the hindcast period. The skill of categorical forecasts can be inflated even further when evaluated over a longer forecast period. We also show that this skill enhancement can be observed in the ECMWF extended-range ensemble prediction system. We demonstrate that the effects on the skill in an operational forecast setting are currently strongest in the tropics and mainly relevant for categorical forecasts. This highlights that care needs to be taken when evaluating forecasts that are subject to non-stationarity on time scales much longer than the forecast verification window, especially for categorical forecasts. The results presented in this study are by no means limited to the S2S time scale but have similar implications for the verification of seasonal to decadal predictions, where the existence of trends can further inflate forecast skill.
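The inflation mechanism the abstract describes can be shown with a toy numerical experiment. The sketch below uses a simple MSE-based skill score against a stationary climatological-mean reference (not the authors' probabilistic setup): the forecast's own error variance is held fixed, yet apparent skill rises with the slope of a linear trend in the verifying "truth", because the trend inflates the reference forecast's error. All parameter values are invented for illustration.

```python
import random

def apparent_skill(trend_slope, n=2000, noise=1.0, fcst_err=0.8, seed=0):
    """MSE skill score of a forecast with a FIXED error variance,
    measured against a stationary climatological-mean reference,
    when the truth contains a linear trend of the given slope."""
    rng = random.Random(seed)
    # Synthetic truth: linear trend plus Gaussian noise.
    truth = [trend_slope * t + rng.gauss(0.0, noise) for t in range(n)]
    # Stationary climatology: one mean over the whole hindcast period.
    clim = sum(truth) / n
    # Forecast errors are drawn with constant variance -> fixed actual skill.
    mse_fcst = sum(rng.gauss(0.0, fcst_err) ** 2 for _ in range(n)) / n
    mse_clim = sum((x - clim) ** 2 for x in truth) / n
    return 1.0 - mse_fcst / mse_clim

# The forecast error distribution is identical in both calls; only the
# trend in the verifying data differs, yet the skill estimate increases.
no_trend = apparent_skill(0.0)
with_trend = apparent_skill(0.02)
```

The larger the variance explained by the trend over the hindcast period, the larger `mse_clim`, and hence the higher the apparent skill, exactly the dependence the abstract reports.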


2019 ◽  
Vol 17 (4) ◽  
pp. 56
Author(s):  
Jaime Enrique Lincovil ◽  
Chang Chiann

Evaluating forecasts of risk measures, such as value-at-risk (VaR) and expected shortfall (ES), is an important process for financial institutions. Backtesting procedures were introduced to assess the efficiency of these forecasts. In this paper, we compare the empirical power of new classes of backtesting, for VaR and ES, from the statistical literature. Further, we employ these procedures to evaluate the efficiency of the forecasts generated by both the Historical Simulation method and two methods based on the Generalized Pareto Distribution. To evaluate VaR forecasts, the empirical power of the Geometric-VaR class of backtesting was, in general, higher than that of other tests in the simulated scenarios. This supports the advantages of using defined time periods and covariates in the test procedures. On the other hand, to evaluate ES forecasts, backtesting methods based on the conditional distribution of returns to the VaR performed well with large sample sizes. Additionally, we show that the method based on the generalized Pareto distribution using durations and covariates has optimal performance in forecasts of VaR and ES, according to backtesting.
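For readers unfamiliar with the setup, the sketch below shows the two basic ingredients the abstract builds on: a Historical Simulation VaR estimate and a VaR backtest. The backtest shown is the classic Kupiec proportion-of-failures (POF) test, a simpler baseline than the Geometric-VaR and duration-based classes the paper actually compares; it is included only to make the idea of backtesting concrete.

```python
import math

def historical_var(returns, alpha=0.05):
    """Historical-simulation VaR at level alpha: the loss threshold
    exceeded with probability alpha under the empirical distribution."""
    srt = sorted(returns)                # ascending returns
    k = int(alpha * len(srt))            # index of the alpha-quantile return
    return -srt[k]                       # VaR is quoted as a positive loss

def kupiec_pof(violations, n, alpha=0.05):
    """Kupiec proportion-of-failures likelihood-ratio statistic
    (asymptotically chi-square with 1 df) for unconditional coverage:
    does the observed VaR violation rate match the nominal alpha?"""
    x, p = violations, alpha
    phat = x / n
    if phat in (0.0, 1.0):
        ll_alt = 0.0                     # degenerate case, log-likelihood limit
    else:
        ll_alt = x * math.log(phat) + (n - x) * math.log(1 - phat)
    ll_null = x * math.log(p) + (n - x) * math.log(1 - p)
    return -2.0 * (ll_null - ll_alt)

# Toy usage: 15 violations in 100 days at a 5% VaR is far above nominal
# coverage, so the LR statistic exceeds the 3.84 chi-square critical value.
lr = kupiec_pof(15, 100, alpha=0.05)
```

The newer test classes studied in the paper refine this idea by also examining the timing (durations) of violations and conditioning on covariates, which is what gives them higher empirical power.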


2019 ◽  
Vol 16 (4) ◽  
pp. 239-260 ◽  
Author(s):  
Robert L. Winkler ◽  
Yael Grushka-Cockayne ◽  
Kenneth C. Lichtendahl ◽  
Victor Richmond R. Jose

We explore some recent, and not so recent, developments concerning the use of probability forecasts and their combination in decision making. Despite these advances, challenges still exist. We expand on some important challenges influencing the “goodness” of combined probability forecasts such as miscalibration, dependence among forecasters, and selection of an appropriate evaluation measure while connecting the processes of aggregating and evaluating forecasts to decision making. Through three important applications from the domains of meteorology, economics, and political science, we illustrate state-of-the-art usage of probability forecasts: how they are combined, evaluated, and communicated to stakeholders. We expect to see greater use and aggregation of probability forecasts, especially given developments in statistical modeling, machine learning, and expert forecasting; the popularity of forecasting competitions; and the increased reporting of probabilities in the media. Our vision is that increased exposure to and improved visualizations of probability forecasts will enhance the public’s understanding of probabilities and how they can contribute to better decisions.
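The simplest combination scheme discussed in this literature is the linear opinion pool, a weighted average of the individual probability forecasts, which can then be evaluated with a scoring rule such as the Brier score. The numbers and equal weights below are invented for illustration; the paper surveys far richer combination and recalibration methods.

```python
def linear_pool(forecasts, weights):
    """Linear opinion pool: component-wise weighted average of
    probability vectors (weights assumed to sum to 1)."""
    return [sum(w * f[i] for f, w in zip(forecasts, weights))
            for i in range(len(forecasts[0]))]

def brier(p, k):
    """Brier score of probability vector p against observed outcome index k."""
    return sum((pi - (1.0 if i == k else 0.0)) ** 2 for i, pi in enumerate(p))

# Toy usage: two forecasters give [rain, no rain] probabilities;
# an equal-weight pool yields [0.7, 0.3], scored against observed rain (k=0).
combined = linear_pool([[0.8, 0.2], [0.6, 0.4]], [0.5, 0.5])
score = brier(combined, 0)
```

One well-known subtlety the article discusses: averaging individually calibrated forecasts tends to produce an underconfident combination, so recalibrating the pooled forecast can improve its score further.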


2019 ◽  
Vol 34 (3) ◽  
pp. 286-299 ◽  
Author(s):  
Carole Turley Voulgaris

As a discipline that concerns itself with the future, planning relies on forecasts to inform and guide action. With this reliance comes a concern that the best possible forecasts be produced. This review identifies three distinct ways in which forecasts may be evaluated (methodology, accuracy, and usefulness) and describes challenges associated with evaluating forecasts along any of these three dimensions. By way of example, this general discussion of forecasting is applied to the specific case of demand forecasts for transportation infrastructure, with an emphasis on transit infrastructure. There is a continuing need for planners to engage with interdisciplinary forecasting literature.


Author(s):  
Richard A. Mucci ◽  
Gregory D. Erhardt

Transit direct ridership models (DRMs) are commonly used both for descriptive analysis and for forecasting, but are rarely evaluated for their ability to predict beyond the estimation data set. This research does so, using two DRMs estimated for rail and bus ridership in San Francisco. The models are estimated from 2009 data, applied to predict 2016 conditions, and compared with actual 2016 ridership. Over this period in San Francisco, observed rail ridership increased by 9% whereas observed bus ridership decreased by 13%. The results show that the models predict 2016 ridership about as well as they predict 2009 ridership. The rail model correctly predicts the direction of change, but underestimates the magnitude of change. The bus model predicts the direction of change incorrectly, with a predicted 2% increase. A series of sensitivity tests are conducted to better understand the factors driving the ridership changes. These tests produce reasonable rail sensitivities, but reveal that the bus model is too sensitive to frequency, potentially because of the difficulty of estimating the coefficient from cross-sectional data when high-frequency transit also occurs in high-density locations. As the travel forecasting community increases its focus on empirically evaluating forecasts beyond a base year, DRMs must be a part of that.


2018 ◽  
Vol 3 ◽  
pp. 3 ◽  
Author(s):  
Baran Yildiz ◽  
Jose I. Bilbao ◽  
Jonathon Dore ◽  
Alistair B. Sproul

Smart grid components such as smart home and battery energy management systems, high penetration of renewable energy systems, and demand response activities require accurate electricity demand forecasts for the successful operation of electricity distribution networks. For example, in order to optimize residential PV generation and electricity consumption and to plan battery charge-discharge regimes by scheduling household appliances, forecasts need to target and be tailored to individual household electricity loads. The recent uptake of smart meters allows easier access to electricity readings at very fine resolutions; hence, it is possible to utilize this source of available data to create forecast models. In this paper, models which predominantly use smart meter data alongside weather variables, or smart-meter-based models (SMBM), are implemented to forecast individual household loads. Well-known machine learning models such as artificial neural networks (ANN), support vector machines (SVM) and least-squares SVM are implemented within the SMBM framework and their performance is compared. The analysed household stock consists of 14 households from the state of New South Wales, Australia, with at least a year's worth of 5 min resolution data. In order for the results to be comparable between different households, our study first investigates household load profiles according to their volatility and reveals the relationship between load standard deviation and forecast performance. The analysis extends previous research by evaluating forecasts over four different data resolutions (5, 15, 30 and 60 min), each analysed for four different horizons (1, 6, 12 and 24 h ahead). Both data resolution and forecast horizon proved to have a significant impact on forecast performance, and the obtained results provide important insights for the operation of various smart grid applications.
Finally, it is shown that the load profiles of some households vary significantly across different days; as a result, a single model for the entire period may give limited performance. By the use of a pre-clustering step, similar daily load profiles are grouped together according to their standard deviation, and instead of applying one SMBM to the entire data set of a particular household, separate SMBMs are applied to each of the clusters. This preliminary clustering step increases the complexity of the analysis; however, it results in significant improvements in forecast performance.
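The pre-clustering step can be sketched as follows: rank each day's load profile by its standard deviation and split the ranking into groups, so that a separate forecast model can then be fitted per group. This is a deliberately minimal sketch of the idea (equal-size bins over the std ranking, invented example data), not the paper's clustering procedure.

```python
import statistics

def cluster_days_by_std(daily_profiles, n_clusters=3):
    """Group daily load profiles (lists of meter readings) into clusters
    by their standard deviation; returns lists of day indices, one list
    per cluster, ordered from least to most volatile days."""
    # Rank days by the population std of their profile.
    ranked = sorted((statistics.pstdev(day), i)
                    for i, day in enumerate(daily_profiles))
    size = -(-len(ranked) // n_clusters)   # ceiling division -> bin size
    clusters = [[] for _ in range(n_clusters)]
    for rank, (_, day_index) in enumerate(ranked):
        clusters[rank // size].append(day_index)
    return clusters

# Toy usage: six synthetic daily profiles; flat days land in the first
# cluster, the most volatile days in the last. A separate SMBM would
# then be trained on the days in each cluster.
profiles = [[1, 1, 1, 1], [0, 2, 0, 2], [0, 4, 0, 4],
            [1, 1, 1, 2], [0, 3, 0, 3], [2, 2, 2, 2]]
groups = cluster_days_by_std(profiles)
```

In practice each cluster's model only ever sees days with similar volatility, which is what the paper reports as the source of the forecast improvement.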

