Probabilities for the Kolmogorov-Smirnov one-sample test statistic

A bootstrap-based hypothesis test of the goodness-of-fit for the marginal distribution of a time series is presented. Two metrics, the empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2), are compared on four data sets—three stablecoin time series and a Bitcoin time series. We demonstrate that, after applying first-order differencing, all the data sets fit heavy-tailed α-stable distributions with 1<α<2 at the 95% confidence level. Moreover, ESJS is more powerful than KS2 on these data sets, since the widths of the derived confidence intervals for KS2 are, proportionately, much larger than those of ESJS.
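
As a rough illustration of the kind of comparison described in this abstract, the sketch below computes the two-sample KS statistic (KS2) and a percentile bootstrap confidence interval for it. The function names, the i.i.d. resampling scheme, and the percentile interval are assumptions made for illustration only. The paper's own bootstrap procedure and the ESJS metric are not described in the abstract and are not reproduced here.

```python
import random


def ks_two_sample_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance past every observation equal to v in both samples (handles ties).
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def bootstrap_ks_interval(a, b, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for KS2 (hypothetical helper).

    Assumption: observations are resampled independently with replacement,
    which ignores any serial dependence left in a time series.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        stats.append(ks_two_sample_statistic(resampled_a, resampled_b))
    stats.sort()
    lo_idx = int((1 - level) / 2 * (n_boot - 1))
    hi_idx = int((1 + level) / 2 * (n_boot - 1))
    return stats[lo_idx], stats[hi_idx]
```

The width of the returned interval, relative to the size of the statistic, is the kind of quantity the abstract uses to compare the two metrics.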

A sketch for the KS test for Big Data

10.5753/kdmile.2021.17455 ◽

2021 ◽

Author(s):

Thalis D. Galeno ◽

João Gama ◽

Douglas O. Cardoso

Keyword(s):

Big Data ◽

Goodness Of Fit ◽

Absolute Error ◽

Test Statistic ◽

Input Stream ◽

Reference Distribution ◽

Sample Test ◽

Kolmogorov Smirnov ◽

The One ◽

Smirnov Test

Motivated by the challenges of Big Data, this paper presents an approximation algorithm for evaluating the Kolmogorov-Smirnov test. This goodness-of-fit test is widely used because it is non-parametric. This work focuses on the one-sample test, which considers the hypothesis that a given univariate sample follows some reference distribution. The method evaluates the departure of an input stream from such a distribution while remaining space and time efficient. We show the accuracy of our algorithm through experiments in several scenarios: varying the reference distribution and its parameters, the sample size, and the available memory. Its performance was compared with that of rival methods, some of which are considered the state of the art. Our algorithm is superior in most cases, as measured by the absolute error of the test statistic.
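
For reference, the quantity being approximated can be sketched as follows. This is the exact one-sample KS statistic computed from a fully stored and sorted sample, not the streaming sketch proposed in the paper. The function names and the choice of a normal reference distribution are assumptions for illustration.

```python
import math


def ks_one_sample_statistic(sample, cdf):
    """Exact one-sample KS statistic D_n = sup_x |F_n(x) - F(x)|.

    `cdf` is the reference distribution function F, passed as a callable.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, used here as an example reference."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
```

Because this version stores and sorts the whole sample, it needs O(n) memory. That cost is what motivates an approximate method for input streams, and the absolute error reported in the abstract is measured against a statistic of this kind.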

An adaptive algorithm for clustering cumulative probability distribution functions using the Kolmogorov–Smirnov two-sample test

Expert Systems with Applications ◽

10.1016/j.eswa.2014.12.027 ◽

2015 ◽

Vol 42 (8) ◽

pp. 4016-4021 ◽

Cited By ~ 12

Author(s):

Llanos Mora-López ◽

Juan Mora

Keyword(s):

Probability Distribution ◽

Adaptive Algorithm ◽

Distribution Functions ◽

Cumulative Probability ◽

Probability Distribution Functions ◽

Cumulative Probability Distribution ◽

Sample Test ◽

Kolmogorov Smirnov

Muon radiography to visualise individual fuel rods in sealed casks

EPJ Nuclear Sciences & Technologies ◽

10.1051/epjn/2021010 ◽

2021 ◽

Vol 7 ◽

pp. 12

Author(s):

Thomas Braunroth ◽

Nadine Berner ◽

Florian Rowold ◽

Marc Péridis ◽

Maik Stuke

Keyword(s):

Spent Nuclear Fuel ◽

Cosmic Ray ◽

Scattering Data ◽

Pressurized Water Reactor ◽

Test Statistic ◽

Pressurized Water ◽

Fuel Rods ◽

Acceptance Criterion ◽

Muon Radiography ◽

Kolmogorov Smirnov

Cosmic-ray muons can be used for the non-destructive imaging of spent nuclear fuel in sealed dry storage casks. The scattering data of the muons after they traverse the cask provide information on the materials they penetrated. Based on these properties, we investigate and discuss, for the first time, the theoretical feasibility of detecting single missing fuel rods in a sealed cask. We perform simulations of a vertically standing generic cask model loaded with fuel assemblies from a pressurized water reactor, with muon detectors placed above and below the cask. By analysing the scattering angles and applying a significance ratio based on the Kolmogorov-Smirnov test statistic, we conclude that missing rods can be reliably identified within a reasonable measuring time, depending on their position in the assembly and cask and on the angular acceptance criterion for the primary, incoming muons.

Detecting Nonlinear Associations, Plus Comments on Testing Hypotheses About the Correlation Coefficient

Journal of Educational and Behavioral Statistics ◽

10.3102/10769986026001073 ◽

2001 ◽

Vol 26 (1) ◽

pp. 73-83 ◽

Cited By ~ 7

Author(s):

Rand R. Wilcox

Keyword(s):

T Test ◽

Test Statistic ◽

The Social ◽

Zero Correlation ◽

Common Strategy ◽

Kolmogorov Smirnov ◽

Nonlinear Associations ◽

Student’s T ◽

Von Mises ◽

Student’s T Test

Let (Yi, Xi), i = 1, ..., n, be a random sample from some (p + 1)-variate distribution, where Xi is a vector of length p. In the social sciences, the most common strategy for detecting an association between Y and the marginal distributions is to test the hypothesis that the corresponding correlations are zero using a standard Student’s t test. There are two practical problems with this strategy. First, for reasons described in the article, there are situations where the correlation between two random variables is zero, but Student’s t test is not even asymptotically correct. In fact, the probability of rejecting can approach one as the sample size gets large, even though the hypothesis of a zero correlation is true. Of course, one can also apply standard methods based on a linear regression model and the least squares estimator, but the same practical problems arise. Second, Student’s t test can miss nonlinear associations. This latter problem is the main motivation for this article. Results of a former study suggest an approach that avoids both of the difficulties just described. Based on simulations, it is found that the Cramér-von Mises form of the test statistic is generally better than the Kolmogorov-Smirnov form. Situations arise where this method has less power than Student’s t test, but this is due in part to the t test’s use of an incorrect estimate of the standard error.

Weighted Kolmogorov Smirnov testing: an alternative for Gene Set Enrichment Analysis

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2014-0077 ◽

2015 ◽

Vol 14 (3) ◽

Cited By ~ 13

Author(s):

Konstantina Charmpi ◽

Bernard Ycart

Keyword(s):

Weight Function ◽

Null Hypothesis ◽

Computing Time ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Test Statistic ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Kolmogorov Smirnov

Gene Set Enrichment Analysis (GSEA) is a basic tool for genomic data treatment. Its test statistic is based on a cumulated weight function, and its distribution under the null hypothesis is evaluated by Monte-Carlo simulation. Here, it is proposed to subtract from the cumulated weight function its asymptotic expectation, then scale it. Under the null hypothesis, the convergence in distribution of the new test statistic is proved, using the theory of empirical processes. The limiting distribution needs to be computed only once, and can then be used for many different gene sets. This results in large savings in computing time. The test defined in this way has been called the Weighted Kolmogorov Smirnov (WKS) test. Using expression data from the GEO repository, tested against the MSig Database C2, a comparison between the classical GSEA test and the new procedure has been conducted. Our conclusion is that, beyond its mathematical and algorithmic advantages, the WKS test could be more informative in many cases than the classical GSEA test.

Comparison of Splitting Methods on Survival Tree

The International Journal of Biostatistics ◽

10.1515/ijb-2014-0029 ◽

2015 ◽

Vol 11 (1) ◽

Cited By ~ 4

Author(s):

Asanao Shimokawa ◽

Yohei Kawasaki ◽

Etsuo Miyaoka

Keyword(s):

Regression Tree ◽

Splitting Methods ◽

Classification And Regression Tree ◽

Rank Test ◽

Test Statistic ◽

Survival Trees ◽

Martingale Residual ◽

Log Likelihood ◽

Sample Test ◽

Deviance Residual

We compare splitting methods for constructing survival trees, which are used to model survival time based on covariates. A number of splitting criteria for the classification and regression tree (CART) have been proposed by various authors, and we compare nine criteria through simulations. Comparative studies have been restricted to criteria that assume a non-parametric survival model for each terminal node in the final tree. As the main results, the criteria using the exponential log-likelihood loss, log-rank test statistics, the deviance residual under the proportional hazard model, or the squared error of the martingale residual are recommended when the data appear to have a constant hazard over time. On the other hand, when the data are thought to have a decreasing hazard over time, the criterion using the two-sample test statistic or the squared error of the deviance residual would be optimal. Moreover, when the data are thought to have an increasing hazard over time, the criterion using the exponential log-likelihood loss, or an impurity measure that combines observed times and the proportion of censored observations, would be the best. We also present the results of an actual medical study to show the utility of survival trees.
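
To make one of the named criteria concrete, the sketch below computes the standard two-sample log-rank chi-square statistic for a candidate split, where group 1 and group 0 stand for the two child nodes. The function name and argument layout are hypothetical, and this is the textbook form of the statistic rather than the authors' implementation.

```python
def log_rank_statistic(times, events, groups):
    """Two-sample log-rank chi-square statistic.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the observation is censored
    groups : 0 or 1, the child node each observation would fall into
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for ti, _, _ in data if ti >= t)
        at_risk_1 = sum(1 for ti, _, g in data if ti >= t and g == 1)
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        deaths_1 = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        # Expected events in group 1 under the null of equal hazards.
        observed_minus_expected += deaths_1 - deaths * at_risk_1 / at_risk
        if at_risk > 1:
            p = at_risk_1 / at_risk
            # Hypergeometric variance of the group-1 event count at time t.
            variance += deaths * p * (1 - p) * (at_risk - deaths) / (at_risk - 1)
    if variance == 0.0:
        return 0.0
    return observed_minus_expected ** 2 / variance
```

In a tree-growing loop, a statistic like this would be evaluated for every candidate split and the split with the largest value kept.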
