Small-Sample Accuracy of Approximate Distributions of Functions of Observed Probabilities From t Tests

1991 ◽  
Vol 16 (4) ◽  
pp. 345-369
Author(s):  
Betsy Jane Becker

The observed probability p is the social scientist’s primary tool for evaluating the outcomes of statistical hypothesis tests. Functions of p values are used in tests of “combined significance,” meta-analytic summaries based on sample probability values. This study examines the nonnull asymptotic distributions of several functions of one-tailed sample probability values (from t tests). Normal approximations were based on the asymptotic distributions of z(p), the standard normal deviate associated with the one-sided p value; of ln(p), the natural logarithm of the probability value; and of several modifications of ln(p). Two additional approximations, based on variance-stabilizing transformations of ln(p) and z(p), were derived. Approximate cumulative distribution functions (cdfs) were compared to the computed exact cdf of the p associated with the one-sample t test. Approximations to the distribution of z(p) appeared quite accurate even for very small samples, while other approximations were inaccurate unless sample sizes or effect sizes were very large. Approximations based on variance-stabilizing transformations were not much more accurate than those based on ln(p) and z(p). Generalizations of the results are discussed, and implications for use of the approximations conclude the article.
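The two transformations named in the abstract, z(p) and ln(p), can be illustrated with a minimal sketch. The abstract does not say which combined-significance tests were studied, so the sum-of-z (Stouffer-type) and sum-of-logs (Fisher-type) combinations below are assumptions chosen because they are the standard uses of these transformations. The function names are hypothetical, and the reference distributions used here hold only under the null hypothesis, not in the nonnull setting the article examines.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_of_p(p):
    """Standard normal deviate for a one-sided p value: z(p) = Phi^{-1}(1 - p)."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def combine_sum_of_z(p_values):
    """Sum-of-z combination (assumed example): sum z(p_i) / sqrt(k) is N(0, 1) under the null."""
    k = len(p_values)
    z = sum(z_of_p(p) for p in p_values) / math.sqrt(k)
    return 1.0 - _STD_NORMAL.cdf(z)


def combine_sum_of_logs(p_values):
    """Sum-of-logs combination (assumed example): -2 * sum ln(p_i) is chi-square(2k) under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # half of -2 * sum ln(p_i)
    # For even degrees of freedom 2k the chi-square upper tail has a closed form.
    return math.exp(-half_stat) * sum(
        half_stat ** j / math.factorial(j) for j in range(k)
    )


if __name__ == "__main__":
    ps = [0.04, 0.10, 0.20]  # illustrative one-sided p values, not data from the article
    print(combine_sum_of_z(ps), combine_sum_of_logs(ps))
```

With a single p value both combinations return that p value unchanged, which is a convenient sanity check.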

2020 ◽  
Vol 67 (2) ◽  
pp. 114-151
Author(s):  
Daniel Kaszyński ◽  
Bogumił Kamiński ◽  
Bartosz Pankratz

The market risk management process includes the quantification of the risk connected with defined portfolios of assets and the diagnostics of the risk model. Value at Risk (VaR) is one of the most common market risk measures. Since the distributions of the daily P&L of financial instruments are unobservable, literature presents a broad range of backtests for VaR diagnostics. In this paper, we propose a new methodological approach to the assessment of the size of VaR backtests, and use it to evaluate the size of the most distinctive and popular backtests. The focus of the paper is directed towards the evaluation of the size of the backtests for small-sample cases – a typical situation faced during VaR backtesting in banking practice. The results indicate significant differences between tests in terms of the p-value distribution. In particular, frequency-based tests exhibit significantly greater discretisation effects than duration-based tests. This difference is especially apparent in the case of small samples. Our findings prove that from among the considered tests, the Kupiec TUFF and the Haas Discrete Weibull have the best properties. On the other hand, backtests which are very popular in banking practice, that is the Kupiec POF and Christoffersen’s Conditional Coverage, show significant discretisation, hence deviations from the theoretical size.
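The discretisation effect described for frequency-based tests can be sketched with the Kupiec POF likelihood-ratio statistic. This is not the assessment method the paper proposes, which the abstract does not detail. It only shows why a test built on a count of exceedances can take finitely many p-values, so its exact size under a binomial null generally differs from the nominal level. The function names, the 250-day sample, and the 1% exceedance probability are illustrative assumptions.

```python
import math


def kupiec_pof(n_obs, n_exceed, exceed_prob):
    """Kupiec POF likelihood-ratio statistic and its asymptotic chi-square(1) p-value."""

    def loglik(q):
        # Binomial log-likelihood without the combinatorial constant;
        # terms with a zero count are skipped to avoid log(0).
        ll = 0.0
        if n_exceed > 0:
            ll += n_exceed * math.log(q)
        if n_obs - n_exceed > 0:
            ll += (n_obs - n_exceed) * math.log(1.0 - q)
        return ll

    q_hat = n_exceed / n_obs  # observed exceedance frequency
    lr = max(-2.0 * (loglik(exceed_prob) - loglik(q_hat)), 0.0)
    p_value = math.erfc(math.sqrt(lr / 2.0))  # upper tail of chi-square with 1 df
    return lr, p_value


def binom_pmf(n, x, q):
    """Binomial probability computed on the log scale for numerical stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
        + x * math.log(q) + (n - x) * math.log(1.0 - q)
    )
    return math.exp(log_pmf)


def exact_size(n_obs, exceed_prob, alpha=0.05):
    """Exact rejection probability under the null, summed over all possible counts."""
    return sum(
        binom_pmf(n_obs, x, exceed_prob)
        for x in range(n_obs + 1)
        if kupiec_pof(n_obs, x, exceed_prob)[1] < alpha
    )


if __name__ == "__main__":
    # 250 trading days and a 1% exceedance probability (assumed values).
    print(exact_size(250, 0.01, alpha=0.05))
```

Because the number of exceedances is an integer, only `n_obs + 1` distinct p-values are possible, and in small samples most of the probability mass sits on a handful of them.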


1984 ◽  
Vol 16 (4) ◽  
pp. 819-842 ◽  
Author(s):  
K. F. Turkman ◽  
A. M. Walker

Let {ε_t, t = 1, 2, ···, n} be a sequence of mutually independent standard normal random variables. Let X_n(λ) and Y_n(λ) be respectively the real and imaginary parts of exp iλt, and let . It is shown that as n tends to ∞, the distribution functions of the normalized maxima of the processes {X_n(λ)}, {Y_n(λ)}, {I_n(λ)} over the interval λ ∈ [0, π] each converge to the extremal distribution function exp[−e^(−x)], −∞ < x < ∞. It is also shown that these results can be extended to the case where {ε_t} is a stationary Gaussian sequence with a moving-average representation.


1993 ◽  
Vol 9 (2) ◽  
pp. 263-282 ◽  
Author(s):  
In Choi

Using the asymptotic normality of the least-squares estimates for the autoregressive (AR) process with real, positive unit roots and at least one stable root, we consider the asymptotic distributions of the Wald and t ratio tests on AR coefficients. In addition, we propose a method of constructing confidence intervals for the sum of AR coefficients possibly in the presence of a unit root. Using simulation methods, we compare the finite-sample cumulative distributions of the t ratios for individual autoregressive coefficients with those of standard normal distributions, and investigate the finite-sample performance of our confidence intervals and t ratios. Our simulation results show that the t ratios for nonstationary processes converge to a standard normal distribution more slowly than those for stationary processes. Further, the confidence intervals are shown to work reasonably well in moderately large samples, but they display unsatisfactory performance at small sample sizes.


2017 ◽  
Vol 34 (5) ◽  
pp. 1065-1100 ◽  
Author(s):  
Offer Lieberman ◽  
Peter C.B. Phillips

Lieberman and Phillips (2017; LP) introduced a multivariate stochastic unit root (STUR) model, which allows for random, time-varying local departures from a unit root (UR) model, where nonlinear least squares (NLLS) may be used for estimation and inference on the STUR coefficient. In a structural version of this model where the driver variables of the STUR coefficient are endogenous, the NLLS estimate of the STUR parameter is inconsistent, as are the corresponding estimates of the associated covariance parameters. This paper develops nonlinear instrumental variable (NLIV) and GMM estimators of the STUR parameter that conveniently address endogeneity. We derive the asymptotic distributions of the NLIV and GMM estimators and establish consistency under orthogonality and relevance conditions similar to those used in the linear model. An overidentification test and its asymptotic distribution are also developed. The results enable inference about structural STUR models and provide a mechanism for testing the local STUR model against a simple UR null, which complements usual UR tests. Simulations reveal that the asymptotic distributions of the NLIV and GMM estimators of the STUR parameter, as well as the test for overidentifying restrictions, perform well in small samples and that the distribution of the NLIV estimator is heavily leptokurtic with a limit theory which has Cauchy-like tails. Comparisons of STUR coefficient and standard UR coefficient tests show that the one-sided UR test performs poorly against the one-sided STUR coefficient test both as the sample size and departures from the null rise. The results are applied to study the relationships between stock returns and bond spread changes.


2012 ◽  
Vol 49 (2) ◽  
pp. 159-175 ◽  
Author(s):  
Zofia Hanusz ◽  
Joanna Tarasińska ◽  
Zbigniew Osypiuk

The kurtosis-based tests of Mardia and Srivastava for assessing multivariate normality (MVN) are considered. The asymptotic standard normal distribution of their test statistics, under normality, is often misused for samples that are too small. The purpose of this paper is to suggest mean-and-variance corrected versions of the Mardia and Srivastava test statistics. Simulation studies evaluating both the true sizes and the powers of original and corrected tests against selected alternatives are presented and compared to the size and the power of the Henze-Zirkler test. The proposed corrected statistics have empirical sizes closer to a nominal significance level than the original ones. It is also shown that the corrected versions of the tests can be more powerful than the original ones.


1984 ◽  
Vol 1 (19) ◽  
pp. 28 ◽  
Author(s):  
Christopher T. Carlson

Field measurements of narrow-band incident wind waves and the resulting run-up were made photographically at two different natural sand beaches along San Francisco Bay. The run-up spectra derived from the field-measured time series show some energy at the incident-wave peak frequency, with the predominant run-up spectral energy concentrated in frequency bands below the incident-wave peak frequency. Observations of the swash time series recorded at both beaches indicate that the low-frequency run-up is generated on the beach face by the interaction between the run-up and backwash during the swash cycle. Coherence analyses indicate that the offshore incident waves and run-up on the beach are not linearly correlated but that the run-up is correlated in the alongshore direction. The slopes of the log-log run-up spectra computed over the frequency band of the incident waves are all approximately -3. Statistical hypothesis tests were used to compare the empirical run-up cumulative distribution functions with both normal and Rayleigh distribution functions.


2010 ◽  
Vol 2010 ◽  
pp. 1-9 ◽  
Author(s):  
A. Wong

In introductory statistics texts, the power of the test of a one-sample mean when the variance is known is widely discussed. However, when the variance is unknown, the power of the Student's t-test is seldom mentioned. In this note, a general methodology for obtaining inference concerning a scalar parameter of interest of any exponential family model is proposed. The method is then applied to the one-sample mean problem with unknown variance to obtain a 100% confidence interval for the power of the Student's t-test that detects the difference . The calculations require only the density and the cumulative distribution functions of the standard normal distribution. In addition, the methodology presented can also be applied to determine the required sample size when the effect size and the power of a size test of mean are given.
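For orientation, the known-variance case that the abstract calls widely discussed can be sketched as follows. This is the textbook one-sided z test, not the note's method for the unknown-variance Student's t-test, which the abstract does not spell out. The function names and the example values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of the one-sided, size-alpha z test to detect a mean shift delta > 0
    when the standard deviation sigma is known."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - _STD_NORMAL.cdf(z_crit - delta * math.sqrt(n) / sigma)


def required_sample_size(delta, sigma, power, alpha=0.05):
    """Smallest n at which the one-sided z test reaches the requested power."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Assumed illustration: shift of half a standard deviation, target power 0.80.
    n = required_sample_size(delta=0.5, sigma=1.0, power=0.80)
    print(n, z_test_power(delta=0.5, sigma=1.0, n=n))
```

Both functions use only the standard normal cdf and its inverse. When the variance is unknown, the power depends on the noncentral t distribution instead, which is the harder case the note addresses.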


1994 ◽  
Vol 10 (3-4) ◽  
pp. 720-746 ◽  
Author(s):  
In Choi

This paper proposes residual-based tests for the null of level- and trend-stationarity, which are analogs of the LM test for an MA unit root. Asymptotic distributions of the tests are nonstandard, but they are expressed in a unified manner using stochastic integrals. In addition, the tests are shown to be consistent. By expressing the distributions as a function of a chi-square variable with one degree of freedom, the exact limiting probability density and cumulative distribution functions are obtained, and the exact limiting cumulative distribution functions are tabulated. Finite sample performance of the proposed tests is studied by simulation. The tests display stable size when the lag truncation number for the long-run variance estimation is chosen appropriately, but the power of the tests is generally not high at selected sample sizes. The test for the null of trend-stationarity is applied to the U.S. macroeconomic time series along with the Phillips-Perron Z(⋯) test. For some monthly and annual series, the two tests provide consistent inferential results. But for most series, the two contradictory nulls of trend-stationarity and a unit root cannot be rejected at the conventional significance levels.


2003 ◽  
Vol 27 (1) ◽  
pp. 27-51 ◽  
Author(s):  
Hariharan Swaminathan ◽  
Ronald K. Hambleton ◽  
Stephen G. Sireci ◽  
Dehui Xing ◽  
Saba M. Rizavi

Large item banks with properly calibrated test items are essential for ensuring the validity of computer-based tests. At the same time, item calibrations with small samples are desirable to minimize the amount of pretesting and limit item exposure. Bayesian estimation procedures show considerable promise with small examinee samples. The purposes of the study were (a) to examine how prior information for Bayesian item parameter estimation can be specified and (b) to investigate the relationship between sample size and the specification of prior information on the accuracy of item parameter estimates. The results of the simulation study were clear: Estimation of IRT model item parameters can be improved considerably. Improvements in the one-parameter model were modest; considerable improvements with the two- and three-parameter models were observed. Both the study of different forms of priors and ways to improve the judgmental data used in forming the priors appear to be promising directions for future research.

