On the number of bins in a rank histogram

Author(s):  
Claudio Heinrich
Author(s):  
Jeffrey L. Anderson

An extension to standard ensemble Kalman filter algorithms that can improve performance for non-Gaussian prior distributions, non-Gaussian likelihoods, and bounded state variables is described. The algorithm exploits the capability of the rank histogram filter (RHF) to represent arbitrary prior distributions for observed variables. The rank histogram algorithm can be applied directly to state variables to produce posterior marginal ensembles without the need for the regression that is part of standard ensemble filters. These marginals are used to adjust the marginals obtained from a standard ensemble filter that uses regression to update state variables. The final posterior ensemble is obtained by doing an ordered replacement of the posterior marginal ensemble values from a standard ensemble filter with the values obtained from the rank histogram method applied directly to state variables; the algorithm is referred to as the Marginal Adjustment Rank Histogram Filter (MARHF). Applications to idealized bivariate problems and low-order dynamical systems show that the MARHF can produce better results than standard ensemble methods for priors that are non-Gaussian. Like the original RHF, the MARHF can also make use of arbitrary non-Gaussian observation likelihoods. The MARHF also has advantages for problems with bounded state variables, for instance, the concentration of an atmospheric tracer. Bounds can be automatically respected in the posterior ensembles. With an efficient implementation of the MARHF, the additional cost scales better than that of the standard RHF.
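The ordered-replacement step described above can be sketched in a few lines: each ensemble member keeps the rank it holds in the regression-based posterior but takes its value from the sorted marginal produced by the direct rank histogram update. A minimal sketch, with invented function names and toy values:

```python
import numpy as np

def marginal_adjustment(standard_post, rhf_post):
    """Ordered replacement: each member keeps its rank from the
    standard-filter posterior but takes its value from the sorted
    RHF posterior marginal (a sketch of the MARHF final step)."""
    standard_post = np.asarray(standard_post, dtype=float)
    ranks = np.argsort(np.argsort(standard_post))  # rank of each member
    return np.sort(rhf_post)[ranks]

# Example: regression-based posterior vs. directly updated marginal
std = np.array([1.3, 0.2, 2.1, 0.7])
rhf = np.array([0.0, 0.5, 0.9, 1.5])   # respects a lower bound of 0
adjusted = marginal_adjustment(std, rhf)
print(adjusted)  # values from rhf, ordered like std -> [0.9 0.  1.5 0.5]
```

Because the final values come entirely from the directly updated marginal, any bound that marginal respects (here, nonnegativity) automatically carries over to the final ensemble.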


2011, Vol 29 (7), pp. 1295-1303
Author(s):
I. Soltanzadeh, M. Azadi, G. A. Vakili

Abstract. Using Bayesian Model Averaging (BMA), an attempt was made to obtain calibrated probabilistic numerical forecasts of 2-m temperature over Iran. The ensemble employs three limited-area models (WRF, MM5, and HRM), with WRF used in five different configurations. Initial and boundary conditions for MM5 and WRF are obtained from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS); for HRM, the initial and boundary conditions come from the analysis of the Global Model Europe (GME) of the German Weather Service. The resulting seven-member ensemble was run for a period of 6 months (December 2008 to May 2009) over Iran. The 48-h raw ensemble outputs were calibrated using the BMA technique for 120 days, with a 40-day training sample of forecasts and the corresponding verification data. The calibrated probabilistic forecasts were assessed using rank histograms and attribute diagrams. Results showed that applying BMA improved the reliability of the raw ensemble. Using the weighted ensemble mean as a deterministic forecast, it was found that the deterministic-style BMA forecasts usually performed better than the best member's deterministic forecast.
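The deterministic-style BMA forecast referred to above is the weighted mean of the member forecasts, with the same weights that define the BMA predictive mixture. A schematic sketch with invented weights and a single shared Gaussian kernel width (the study fits weights and variances over the training window, which this sketch does not reproduce):

```python
import numpy as np

def bma_forecast(members, weights, sigma):
    """BMA predictive density as a Gaussian mixture (sketch):
    p(y) = sum_k w_k * N(y; f_k, sigma^2). The deterministic
    BMA forecast is the weighted ensemble mean."""
    members = np.asarray(members, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.sum(weights * members)
    def pdf(y):
        z = (y - members) / sigma
        return np.sum(weights * np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi)))
    return mean, pdf

# Seven members (as in the study: WRF x5, MM5, HRM); values are illustrative
f = np.array([21.0, 22.5, 20.8, 21.7, 22.0, 19.9, 21.2])
w = np.array([0.25, 0.20, 0.15, 0.12, 0.10, 0.10, 0.08])  # sum to 1
mean, pdf = bma_forecast(f, w, sigma=1.2)
print(round(mean, 3))  # weighted ensemble mean -> 21.36
```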


2008, Vol 15 (4), pp. 661-673
Author(s):
J. Bröcker

Abstract. Reliability analysis of probabilistic forecasts, in particular through the rank histogram or Talagrand diagram, is revisited. Two shortcomings are pointed out: firstly, a uniform rank histogram is but a necessary condition for reliability; secondly, if the forecast is assumed to be reliable, an indication is needed of how far a histogram is expected to deviate from uniformity merely due to randomness. Concerning the first shortcoming, it is suggested that forecasts be grouped or stratified along suitable criteria, and that reliability be analyzed individually for each forecast stratum. A reliable forecast should have uniform histograms for all individual forecast strata, not only for all forecasts as a whole. As to the second shortcoming, instead of the observed frequencies, the probability of the observed frequency is plotted, providing an indication of the likelihood of the result under the hypothesis that the forecast is reliable. Furthermore, a goodness-of-fit statistic is discussed which is essentially the reliability term of the Ignorance score. The discussed tools are applied to medium-range forecasts of 2-m temperature anomalies at several locations and lead times. The forecasts are stratified along the expected ranked probability score. Those forecasts which feature a high expected score turn out to be particularly unreliable.
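The suggested plot of the probability of the observed frequency follows from the fact that, under the reliability hypothesis, each bin count is binomially distributed. A minimal sketch (the sample sizes are illustrative):

```python
from math import comb

def bin_count_probability(count, n_forecasts, n_bins):
    """Probability of seeing `count` observations in a given rank-histogram
    bin if the forecast is reliable, i.e. Binomial(n_forecasts, 1/n_bins)."""
    p = 1.0 / n_bins
    return comb(n_forecasts, count) * p**count * (1 - p)**(n_forecasts - count)

# 100 forecasts, an 11-bin histogram (10-member ensemble): the expected
# count per bin is about 9; compare a typical and an extreme bin count
print(bin_count_probability(9, 100, 11))   # near the mode, relatively likely
print(bin_count_probability(25, 100, 11))  # far in the tail, very unlikely
```

Plotting these probabilities, rather than the raw counts, tells the reader directly how surprising each bin is under the reliability hypothesis.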


2017, Vol 145 (9), pp. 3529-3544
Author(s):
Joseph Bellier, Isabella Zin, Guillaume Bontron

In the verification field, stratification is the process of dividing the sample of forecast–observation pairs into quasi-homogeneous subsets, in order to learn more on how forecasts behave under specific conditions. A general framework for stratification is presented for the case of ensemble forecasts of continuous scalar variables. Distinction is made between forecast-based, observation-based, and external-based stratification, depending on the criterion on which the sample is stratified. The formalism is applied to two widely used verification measures: the continuous ranked probability score (CRPS) and the rank histogram. For both, new graphical representations that synthesize the added information are proposed. Based on the definition of calibration, it is shown that the rank histogram should be used within a forecast-based stratification, while an observation-based stratification leads to significantly nonflat histograms for calibrated forecasts. Nevertheless, as previous studies have warned, statistical artifacts created by a forecast-based stratification may still occur, thus a graphical test to detect them is suggested. To illustrate potential insights about forecast behavior that can be gained from stratification, a numerical example with two different datasets of mean areal precipitation forecasts is presented.
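A forecast-based stratification of the rank histogram can be sketched as follows: the stratification criterion (here the ensemble mean, split at its terciles) depends only on the forecast, so for calibrated forecasts each stratum's histogram should remain roughly flat. All data below are synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def obs_rank(ensemble, obs):
    """Rank of the observation within the ensemble (0 .. n_members)."""
    return int(np.sum(ensemble < obs))

# Synthetic sample of 3000 forecast-observation pairs, 9 members each,
# drawn so the forecasts are calibrated (obs exchangeable with members)
n_pairs, n_members = 3000, 9
mu = rng.normal(0.0, 2.0, n_pairs)               # varying forecast situations
ens = mu[:, None] + rng.normal(0.0, 1.0, (n_pairs, n_members))
obs = mu + rng.normal(0.0, 1.0, n_pairs)

# Forecast-based stratification: split on the ensemble mean's terciles
crit = ens.mean(axis=1)
edges = np.quantile(crit, [1 / 3, 2 / 3])
strata = np.digitize(crit, edges)                # stratum index 0, 1, 2

hists = []
for s in range(3):
    ranks = [obs_rank(e, o) for e, o in zip(ens[strata == s], obs[strata == s])]
    hists.append(np.bincount(ranks, minlength=n_members + 1))
for h in hists:
    print(h)  # each stratum's histogram is roughly flat for calibrated input
```

An observation-based stratification (splitting on `obs` instead of `crit`) would, as the paper shows, produce nonflat histograms even for this perfectly calibrated example.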


2014, Vol 21 (4), pp. 869-885
Author(s):
S. Metref, E. Cosme, C. Snyder, P. Brasseur

Abstract. One challenge of geophysical data assimilation is to address the issue of non-Gaussianities in the distributions of the physical variables ensuing, in many cases, from nonlinear dynamical models. Non-Gaussian ensemble analysis methods fall into two categories: those remapping the ensemble particles by approximating the best linear unbiased estimate, for example, the ensemble Kalman filter (EnKF), and those resampling the particles by directly applying Bayes' rule, like particle filters. In this article, it is suggested that the most common remapping methods can only handle weakly non-Gaussian distributions, while the others suffer from sampling issues. In between those two categories, a new remapping method directly applying Bayes' rule, the multivariate rank histogram filter (MRHF), is introduced as an extension of the rank histogram filter (RHF) first introduced by Anderson (2010). Its performance is evaluated and compared with several data assimilation methods at different levels of non-Gaussianity using the Lorenz 63 model. The method's behavior is then illustrated on a simple density estimation problem using ensemble simulations from a coupled physical–biogeochemical model of the North Atlantic Ocean. The MRHF performs well with low-dimensional systems in strongly non-Gaussian regimes.
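The Lorenz 63 model used as a testbed here is easy to reproduce: even a short free integration of a tight initial ensemble yields visibly non-Gaussian marginals, which is what motivates methods like the MRHF. A minimal sketch using a simple forward-Euler scheme (the step size, ensemble size, and initial condition are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def lorenz63_step(x, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz 63 system."""
    dx = sigma * (x[1] - x[0])
    dy = x[0] * (rho - x[2]) - x[1]
    dz = x[0] * x[1] - beta * x[2]
    return x + dt * np.array([dx, dy, dz])

rng = np.random.default_rng(1)
ens = np.array([8.0, 8.0, 27.0]) + 0.5 * rng.normal(size=(200, 3))
for _ in range(300):                       # 3 model time units
    ens = np.array([lorenz63_step(m) for m in ens])

# Shape diagnostics of the x-marginal; both would be near 0 for a Gaussian
x = ens[:, 0]
skew = np.mean((x - x.mean())**3) / x.std()**3
exkurt = np.mean((x - x.mean())**4) / x.std()**4 - 3.0
print(round(skew, 2), round(exkurt, 2))
```

Chaotic error growth spreads the initially tight ensemble over the attractor, so the marginal quickly stops looking Gaussian, and a linear (EnKF-style) update becomes a poor approximation.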


2019, Vol 147 (2), pp. 763-769
Author(s):
D. S. Wilks

Abstract Quantitative evaluation of the flatness of the verification rank histogram can be approached through formal hypothesis testing. Traditionally, the familiar χ2 test has been used for this purpose. Recently, two alternatives—the reliability index (RI) and an entropy statistic (Ω)—have been suggested in the literature. This paper presents approximations to the sampling distributions of these latter two rank histogram flatness metrics, and compares the statistical power of tests based on the three statistics, in a controlled setting. The χ2 test is generally most powerful (i.e., most sensitive to violations of the null hypothesis of rank uniformity), although for overdispersed ensembles and small sample sizes, the test based on the entropy statistic Ω is more powerful. The RI-based test is preferred only for unbiased forecasts with small ensembles and very small sample sizes.
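All three statistics can be computed directly from the rank-histogram bin counts. The sketch below uses one common convention for the reliability index and a normalized entropy; definitions of Ω in the literature vary, so the exact scaling here is an assumption:

```python
import numpy as np

def flatness_statistics(counts):
    """Chi-square, reliability index (RI), and a normalized entropy (Omega)
    for a rank histogram. RI and Omega follow one common convention;
    treat the exact scaling as illustrative."""
    counts = np.asarray(counts, dtype=float)
    n, m = counts.sum(), len(counts)
    e = n / m                                   # expected count per bin
    chi2 = np.sum((counts - e)**2 / e)
    ri = np.sum(np.abs(counts / n - 1.0 / m))
    p = counts / n
    omega = -np.sum(p[p > 0] * np.log(p[p > 0])) / np.log(m)
    return chi2, ri, omega

flat = [10, 10, 10, 10, 10]      # perfectly flat: chi2 = 0, RI = 0, Omega = 1
u_shape = [20, 5, 0, 5, 20]      # underdispersive U-shape
print(flatness_statistics(flat))
print(flatness_statistics(u_shape))
```

For the flat histogram the statistics take their ideal values (0, 0, 1); the U-shaped histogram is flagged by all three.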


2017, Vol 145 (5), pp. 1679-1690
Author(s):
Mahsa Mirzargar, Jeffrey L. Anderson

Abstract Various generalizations of the univariate rank histogram have been proposed to inspect the reliability of an ensemble forecast or analysis in multidimensional spaces. Multivariate rank histograms provide insightful information about the misspecification of genuinely multivariate features such as the correlation between various variables in a multivariate ensemble. However, the interpretation of patterns in a multivariate rank histogram should be handled with care. The purpose of this paper is to focus on multivariate rank histograms based on the concept of data depth and outline some important considerations that should be accounted for when using such multivariate rank histograms. To generate correct multivariate rank histograms using the concept of data depth, the datatype of the ensemble should be taken into account to define a proper preranking function. This paper demonstrates how and why some preranking functions might not be suitable for multivariate or vector-valued ensembles and proposes preranking functions based on the concept of simplicial depth that are applicable to both multivariate points and vector-valued ensembles. In addition, there exists an inherent identifiability issue associated with center-outward preranking functions used to generate multivariate rank histograms. This problem can be alleviated by complementing the multivariate rank histogram with other well-known multivariate statistical inference tools based on rank statistics such as the depth-versus-depth (DD) plot. Using a synthetic example, it is shown that the DD plot is less sensitive to sample size than multivariate rank histograms.
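In two dimensions, simplicial depth can be evaluated by brute force, which is enough to illustrate the idea of a depth-based preranking function. The pooling of the observation with the members, the strict-inequality tie handling, and the counting of a point's own-vertex triangles below are simplifying assumptions, not the paper's exact recipe:

```python
import numpy as np
from itertools import combinations

def in_triangle(p, a, b, c):
    """True if point p lies in triangle abc (sign-of-cross-product test)."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
    pos = (d1 > 0) or (d2 > 0) or (d3 > 0)
    return not (neg and pos)

def simplicial_depth(point, sample):
    """Fraction of triangles with vertices in `sample` containing `point`."""
    tris = list(combinations(range(len(sample)), 3))
    hits = sum(in_triangle(point, sample[i], sample[j], sample[k])
               for i, j, k in tris)
    return hits / len(tris)

def depth_prerank(ensemble, obs):
    """Pre-rank of the observation: number of pooled points (members plus
    observation) with strictly smaller simplicial depth."""
    pooled = np.vstack([ensemble, obs])
    depths = np.array([simplicial_depth(p, pooled) for p in pooled])
    return int(np.sum(depths < depths[-1]))

rng = np.random.default_rng(2)
ens = rng.normal(size=(10, 2))
obs = np.array([0.05, -0.1])       # near the ensemble centre -> high depth
print(depth_prerank(ens, obs))
```

A center-outward prerank like this illustrates the identifiability issue the paper discusses: an observation far outside the ensemble and one trapped in a central hole can both receive extreme depth ranks, which is why complementing the histogram with a DD plot is useful.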


2011, Vol 139 (1), pp. 295-310
Author(s):
Caren Marzban, Ranran Wang, Fanyou Kong, Stephen Leyton

Abstract The rank histogram (RH) is a visual tool for assessing the reliability of ensemble forecasts (i.e., the degree to which the forecasts and the observations have the same distribution). But it is already known that in certain situations it conveys misleading information. Here, it is shown that a temporal correlation can lead to a misleading RH, but such a correlation contributes only to the sampling variability of the RH, and so it is accounted for by producing an RH that explicitly displays sampling variability. A simulation is employed to show that the variance within each ensemble member (i.e., climatological variance), the correlation between ensemble members, and the correlation between the observations and the forecasts all have a confounding effect on the RH, making it difficult to use the RH for assessing the climatological component of forecast reliability. It is proposed that a "residual" quantile–quantile plot (denoted R-Q-Q plot) is better suited than the RH for assessing the climatological component of forecast reliability. Then, the RH and R-Q-Q plots for temperature and wind speed forecasts at 90 stations across the continental United States are computed. A wide range of forecast reliability is noted. For some stations, the nonreliability of the forecasts can be attributed to bias and/or under- or overdispersion relative to climatology. For others, the difference between the distributions can be traced to lighter or heavier tails in the distributions, while for other stations the distributions of the forecasts and the observations appear to be completely different. A spatial signature is also noted and discussed briefly.
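The climatological comparison behind an R-Q-Q plot can be sketched by pooling all ensemble members, comparing their quantiles with the observations' quantiles, and plotting the residuals from the diagonal. The data below are synthetic and the quantile grid is an assumption; a bias shows up as a roughly constant offset:

```python
import numpy as np

def rqq(forecast_pool, obs, n_q=19):
    """Residual Q-Q values: pooled-forecast quantiles minus the matching
    observation quantiles (a sketch of the R-Q-Q idea; values near zero
    at every quantile level indicate matching climatological distributions)."""
    qs = np.linspace(0.05, 0.95, n_q)
    return np.quantile(forecast_pool, qs) - np.quantile(obs, qs)

rng = np.random.default_rng(3)
obs = rng.normal(15.0, 3.0, 500)                     # "observed" temperatures
good = rng.normal(15.0, 3.0, (500, 20)).ravel()      # matching climatology
biased = rng.normal(17.0, 3.0, (500, 20)).ravel()    # +2-degree bias

print(np.abs(rqq(good, obs)).max())    # small residuals
print(np.abs(rqq(biased, obs)).max())  # residuals near the 2-degree bias
```

Heavier or lighter forecast tails would instead appear as residuals that grow toward the extreme quantile levels, which is the kind of diagnosis the RH cannot separate from sampling noise.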


2011, Vol 139 (1), pp. 311-316
Author(s):
D. S. Wilks

Abstract Ensemble consistency is a name for the condition that an observation being forecast by a dynamical ensemble is statistically indistinguishable from the ensemble members. This statistical indistinguishability condition is meaningful only in a multivariate sense. That is, it pertains to the joint distribution of the ensemble members and the observation. The rank histogram has been designed to assess overall ensemble consistency, but mistakenly employing it to assess only restricted aspects of this joint distribution (e.g., the climatological distribution) leads to the incorrect conclusion that the verification rank histogram is not a useful diagnostic for good behavior of ensemble forecasts. The potential confusion is analyzed in the context of an idealized multivariate Gaussian model of forecast ensembles and their corresponding observations, and it is shown that the rank histogram does correctly assess the consistency of forecast ensembles.


2005, Vol 20 (5), pp. 789-795
Author(s):
Kimberly L. Elmore

Abstract Rank histograms are a commonly used tool for evaluating an ensemble forecasting system’s performance. Because the sample size is finite, the rank histogram is subject to statistical fluctuations, so a goodness-of-fit (GOF) test is employed to determine if the rank histogram is uniform to within some statistical certainty. Most often, the χ2 test is used to test whether the rank histogram is indistinguishable from a discrete uniform distribution. However, the χ2 test is insensitive to order and so suffers from troubling deficiencies that may render it unsuitable for rank histogram evaluation. As shown by examples in this paper, more powerful tests, suitable for small sample sizes, and very sensitive to the particular deficiencies that appear in rank histograms are available from the order-dependent Cramér–von Mises family of statistics, in particular, the Watson and Anderson–Darling statistics.
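The order-sensitive alternatives can be sketched by mapping the ranks to the unit interval and applying the classical EDF statistics; Watson's U² is the rotation-invariant Cramér–von Mises variant. The uniform jitter used below to de-discretize the ranks is one convention and an assumption here, not necessarily the paper's:

```python
import numpy as np

def cramer_von_mises(u):
    """Cramer-von Mises W^2 statistic of a sample against Uniform(0,1)."""
    u = np.sort(u)
    n = len(u)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n))**2)

def watson(u):
    """Watson's U^2: rotation-invariant variant of W^2, sensitive to the
    U- and dome-shaped deviations typical of rank histograms."""
    u = np.asarray(u, dtype=float)
    return cramer_von_mises(u) - len(u) * (u.mean() - 0.5)**2

rng = np.random.default_rng(4)
m = 10                                        # 10-member ensemble, ranks 0..10
to_unit = lambda r: (r + rng.random(len(r))) / (m + 1)

ranks_flat = rng.integers(0, m + 1, 1000)                  # calibrated
ranks_under = np.concatenate([rng.integers(0, 2, 500),     # underdispersive:
                              rng.integers(m - 1, m + 1, 500)])  # extreme ranks

w_flat = watson(to_unit(ranks_flat))
w_under = watson(to_unit(ranks_under))
print(round(w_flat, 3), round(w_under, 1))  # U-shaped case is far larger
```

Unlike the χ² statistic, which is blind to bin order, the U-shaped histogram produces a Watson statistic orders of magnitude larger than the flat one.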

