An Exhaustive Power Comparison of Normality Tests

Jurgita Arnastauskaitė; Tomas Ruzgas; Mindaugas Bražėnas

doi:10.3390/math9070788

An Exhaustive Power Comparison of Normality Tests

Mathematics ◽

10.3390/math9070788 ◽

2021 ◽

Vol 9 (7) ◽

pp. 788

Author(s):

Jurgita Arnastauskaitė ◽

Tomas Ruzgas ◽

Mindaugas Bražėnas

Keyword(s):

Sample Size ◽

Kernel Function ◽

Comparative Research ◽

Goodness Of Fit ◽

Statistical Tests ◽

Power Comparison ◽

Goodness Of Fit Test ◽

Data Set ◽

Research Results ◽

Goodness Of Fit Hypothesis

A goodness-of-fit test is a frequently used modern statistics tool. However, it is still unclear what the most reliable approach is to check assumptions about data set normality. A particular data set (especially with a small number of observations) only partly describes the process, which leaves many options for the interpretation of its true distribution. As a consequence, many goodness-of-fit statistical tests have been developed, the power of which depends on particular circumstances (e.g., sample size, outliers). With the aim of developing a more universal goodness-of-fit test, we propose an approach based on an N-metric with our chosen kernel function. To compare the power of 40 normality tests, the goodness-of-fit hypothesis was tested for 15 data distributions with 6 different sample sizes. Based on exhaustive comparative research results, we recommend the use of our test for samples of size .
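A power comparison of this kind is usually run as a Monte Carlo experiment: draw many samples from a known alternative distribution and count how often the test rejects normality. The sketch below illustrates the idea only. It uses the Jarque–Bera statistic with its asymptotic chi-square critical value as a stand-in, not the N-metric test proposed in the abstract, and the sample size, trial count, and exponential alternative are assumptions chosen for illustration.

```python
import random
import statistics

# 95th percentile of the chi-square distribution with 2 degrees of freedom.
JB_CRITICAL_5PCT = 5.991


def jarque_bera(sample):
    """Jarque-Bera statistic: n/6 * (skewness^2 + (kurtosis - 3)^2 / 4)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def rejection_rate(draw, n, trials, rng):
    """Fraction of simulated samples for which normality is rejected at 5%."""
    rejections = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        if jarque_bera(sample) > JB_CRITICAL_5PCT:
            rejections += 1
    return rejections / trials


rng = random.Random(1)  # fixed seed so the run is reproducible
# Under normal data the rejection rate estimates the test's size.
size_normal = rejection_rate(lambda r: r.gauss(0.0, 1.0), 100, 2000, rng)
# Under a non-normal alternative it estimates the test's power.
power_exponential = rejection_rate(lambda r: r.expovariate(1.0), 100, 2000, rng)
print(f"rejection rate under normal data:      {size_normal:.3f}")
print(f"rejection rate under exponential data: {power_exponential:.3f}")
```

Repeating the second call over several alternative distributions and sample sizes would give a power table of the kind the abstract describes.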

A Simple Note on the Determination of the Optimal Sample Size in χ²-goodness of Fit-test (Comparison of Observed Frequencies with Theoretically Expected Frequencies)

Biometrical Journal ◽

10.1002/bimj.4710200611 ◽

1978 ◽

Vol 20 (6) ◽

pp. 623-624 ◽

Cited By ~ 1

Author(s):

M. H. Hühn

Keyword(s):

Sample Size ◽

Goodness Of Fit ◽

Goodness Of Fit Test ◽

Optimal Sample Size ◽

Optimal Sample

The Effect of Sample Size on Bivariate Rainfall Frequency Analysis of Extreme Precipitation

Proceedings ◽

10.3390/ecws-3-05815 ◽

2018 ◽

Vol 7 (1) ◽

pp. 19 ◽

Cited By ~ 1

Author(s):

Nikoletta Stamatatou ◽

Lampros Vasiliades ◽

Athanasios Loukas

Keyword(s):

Sample Size ◽

Frequency Analysis ◽

Extreme Precipitation ◽

Goodness Of Fit ◽

Return Periods ◽

Goodness Of Fit Test ◽

Maximum Rainfall ◽

Rainfall Frequency ◽

Rainfall Frequency Analysis ◽

Pseudo Likelihood Estimation

The objective of this study is to compare univariate and joint bivariate return periods of extreme precipitation that all rely on different probability concepts in selected meteorological stations in Cyprus. Pairs of maximum rainfall depths with corresponding durations are estimated and compared using annual maximum series (AMS) for the complete period of the analysis and 30-year subsets for selected data periods. Marginal distributions of extreme precipitation are examined and used for the estimation of typical design periods. The dependence between extreme rainfall and duration is then assessed by an exploratory data analysis using K-plots and Chi-plots and the consistency of their relationship is quantified by Kendall’s correlation coefficient. Copulas from Archimedean, Elliptical, and Extreme Value families are fitted using a pseudo-likelihood estimation method, evaluated according to the corrected Akaike Information Criterion and verified using both graphical approaches and a goodness-of-fit test based on the Cramér-von Mises statistic. The selected copula functions and the corresponding conditional and joint return periods are calculated and the results are compared with the marginal univariate estimations of each variable. Results highlight the effect of sample size on univariate and bivariate rainfall frequency analysis for hydraulic engineering design practices.
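Kendall's correlation coefficient, used above to quantify the dependence between rainfall depth and duration, can be computed by counting concordant and discordant pairs. The following is a minimal sketch of the tau-a variant, which assumes no tied values. The function name and the depth and duration values are hypothetical and are not taken from the study.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Made-up illustrative pairs: storm duration (hours) and rainfall depth (mm).
durations = [1, 2, 3, 6, 12]
depths = [12.0, 10.5, 20.1, 18.4, 30.2]
tau = kendall_tau(durations, depths)
print(f"Kendall's tau = {tau:.2f}")
```

Real rainfall records often contain ties, in which case a tie-corrected variant such as tau-b would be the more appropriate choice.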

A Cautionary Note on the Use of the Kolmogorov–Smirnov Test for Normality

Monthly Weather Review ◽

10.1175/mwr3326.1 ◽

2007 ◽

Vol 135 (3) ◽

pp. 1151-1157 ◽

Cited By ~ 75

Author(s):

Dag J. Steinskog ◽

Dag B. Tjøstheim ◽

Nils G. Kvamstø

Keyword(s):

Goodness Of Fit ◽

Small Sample ◽

Power Comparison ◽

Goodness Of Fit Test ◽

Kolmogorov Smirnov ◽

The Mean ◽

Small Sample Problem ◽

Tests For Normality ◽

Test For Normality ◽

Smirnov Test

The Kolmogorov–Smirnov goodness-of-fit test is used in many applications for testing normality in climate research. This note shows that the test usually leads to systematic and drastic errors. When the mean and the standard deviation are estimated, it is much too conservative in the sense that its p values are strongly biased upward. One may think that this is a small sample problem, but it is not. There is a correction of the Kolmogorov–Smirnov test by Lilliefors, which is in fact sometimes confused with the original Kolmogorov–Smirnov test. Both the Jarque–Bera and the Shapiro–Wilk tests for normality are good alternatives to the Kolmogorov–Smirnov test. A power comparison of eight different tests has been undertaken, favoring the Jarque–Bera and the Shapiro–Wilk tests. The Jarque–Bera and the Kolmogorov–Smirnov tests are also applied to a monthly mean dataset of geopotential height at 500 hPa. The two tests give very different results and illustrate the danger of using the Kolmogorov–Smirnov test.
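The conservativeness described above can be reproduced with a small simulation. The sketch below is not the authors' code. It draws normal samples, computes the Kolmogorov–Smirnov statistic once against the true parameters and once against the mean and standard deviation estimated from the same sample, and compares both with the standard asymptotic 5% critical value 1.358/√n. The sample size, trial count, and function names are assumptions.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, mu, sigma):
    """Largest gap between the empirical CDF and the normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def rejection_rate(n, trials, estimate_params, rng):
    """Share of truly normal samples rejected at the nominal 5% level."""
    critical = 1.358 / math.sqrt(n)  # asymptotic value for fully specified H0
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if estimate_params:
            mu, sigma = statistics.fmean(sample), statistics.stdev(sample)
        else:
            mu, sigma = 0.0, 1.0
        if ks_statistic(sample, mu, sigma) > critical:
            rejections += 1
    return rejections / trials


rng = random.Random(0)  # fixed seed so the run is reproducible
rate_known = rejection_rate(50, 2000, False, rng)
rate_estimated = rejection_rate(50, 2000, True, rng)
print(f"known parameters:     {rate_known:.4f}")
print(f"estimated parameters: {rate_estimated:.4f}")
```

With known parameters the rejection rate should sit near the nominal 5%, whereas with estimated parameters it should fall far below it. The Lilliefors correction addresses this by using critical values derived for the estimated-parameter case.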

Modelling Sparse Interaction Matrices: Interward Migration in Hereford and Worcester, and the Underdispersion Problem

Environment and Planning A Economy and Space ◽

10.1068/a251201 ◽

1993 ◽

Vol 25 (8) ◽

pp. 1201-1209 ◽

Cited By ~ 12

Author(s):

P J Boyle ◽

R Flowerdew

Keyword(s):

Degrees Of Freedom ◽

Goodness Of Fit ◽

Ordinary Least Squares ◽

Variable Number ◽

Data Matrix ◽

Least Squares Regression ◽

Goodness Of Fit Test ◽

Data Set ◽

Number Of Zeros ◽

Ward Level

Recently, it has been argued that migration models in which ordinary least squares regression is used are inappropriate, because the dependent variable (number of migrants) is a count. Instead, a Poisson regression approach can be adopted. Goodness of fit can be evaluated by using a deviance (log-likelihood) or X² statistic, whose significance can be compared with a χ² distribution with appropriate degrees of freedom. In this paper, such an approach is used to model ward-level migration flows within the county of Hereford and Worcester. However, it is shown that for this exceedingly sparse data set the deviance figures attained are very low, suggesting that there may be a problem of underdispersion. This is in contrast to the overdispersion which has been identified as a common problem in Poisson models. The low deviance figures arise from the large number of zeros and small flows in the data matrix, which invalidate the usual χ² goodness-of-fit test. A simulation approach to the assessment of model goodness of fit is suggested, and the results from applying it to the Hereford and Worcester data set are described.
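The effect of sparsity on the deviance can be seen in a short simulation. The sketch below is an illustration under simplifying assumptions, not the authors' procedure: every cell has the same Poisson mean, and that mean is treated as known rather than fitted by regression. Under the χ² approximation each cell would contribute about 1 to the deviance on average, which holds for large means but fails when most cells are zero. The function names and the means 0.05 and 10 are hypothetical choices.

```python
import math
import random


def poisson_draw(mu, rng):
    """Poisson variate by Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def unit_deviance(y, mu):
    """Poisson deviance contribution 2*[y*log(y/mu) - (y - mu)], 0*log(0) = 0."""
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (log_term - (y - mu))


def mean_deviance_per_cell(mu, cells, rng):
    """Average deviance contribution over simulated cells with mean mu."""
    total = sum(unit_deviance(poisson_draw(mu, rng), mu) for _ in range(cells))
    return total / cells


rng = random.Random(2)  # fixed seed so the run is reproducible
sparse = mean_deviance_per_cell(0.05, 5000, rng)   # mostly zero flows
dense = mean_deviance_per_cell(10.0, 5000, rng)    # comfortably large flows
print(f"mean deviance per cell, mu = 0.05: {sparse:.3f}")
print(f"mean deviance per cell, mu = 10:   {dense:.3f}")
```

Because a correctly specified sparse model already yields a deviance well below its nominal degrees of freedom, a low deviance alone need not indicate underdispersion. A simulated reference distribution built in this way gives a more suitable benchmark than the χ² table.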

Clinical Application of Mathematical Long Bone Ratios to Calculate Appropriate Donor Limb Lengths in Bilateral Upper Limb Transplantation

Hand ◽

10.1177/1558944717753672 ◽

2018 ◽

Vol 14 (4) ◽

pp. 523-529

Author(s):

Justyn Lutfy ◽

Alexis Pietak ◽

Shaun D. Mendenhall ◽

Michael W. Neumeister

Keyword(s):

Clinical Application ◽

Upper Limb ◽

Goodness Of Fit ◽

Linear Regression Analysis ◽

Limb Length ◽

Long Bone ◽

Multivariate Linear Regression Analysis ◽

Goodness Of Fit Test ◽

Chi Square ◽

Data Set

Background: Limited methods exist to aid in deciding the appropriate donor limb lengths in bilateral upper limb amputees qualifying for vascularized composite allotransplantation. We hypothesized mathematical equations could be created using long bone length ratios, and applied to radiographs, to approximate the patient’s limb length prior to amputation. Methods: A data set of 30 skeletons’ unilateral upper limb long bones measured using osteometric board and calipers was used. Anatomic segment ratios were calculated based on humerus length after multivariate linear regression analysis. For clinical application testing, 5 cadavers’ upper limbs were radiographed. Radiographic bone lengths were then measured along the long axis of each long bone. These measured radiographic lengths were then compared with the predicted bone lengths, generated from the skeleton data set ratios, for each cadaver. Results: The chi-square goodness-of-fit test showed excellent fit (P < .01) between the predicted and radiographically measured lengths for the 5 cadavers, and interobserver measurements showed no statistical difference. Depending on the cadaver, percent error in total limb length predicted to measure ranged from 0.9% to 2.7%. The variables to multiply an individual humerus length to calculate a given anatomic segment thus proved to be effective. Conclusions: If a bilateral upper limb amputee has 1 intact humerus, ratios to the humerus length can be reliably applied to calculate the preamputation limb length based on the patient’s radiographic humerus length. These formulas are indicated for finding the appropriate limb lengths, and smaller anatomic segments, for donor-recipient matching in upper limb transplantation.

On HÜHN's Notion of Optimal Sample Size in the χ² Goodness of Fit Test

Biometrical Journal ◽

10.1002/bimj.4710230710 ◽

1981 ◽

Vol 23 (7) ◽

pp. 705-707

Author(s):

J. C. W. Rayner ◽

D. J. Best

Keyword(s):

Sample Size ◽

Goodness Of Fit ◽

Goodness Of Fit Test ◽

Optimal Sample Size ◽

Optimal Sample

Visual Assessment vs. Statistical Goodness of Fit Tests for Identifying Parent Population

Proceedings of the Human Factors Society Annual Meeting ◽

10.1177/154193128803200701 ◽

1988 ◽

Vol 32 (7) ◽

pp. 460-464

Author(s):

Mari Berry ◽

Brian Peacock ◽

Bobbie Foote ◽

Lawrence Leemis

Keyword(s):

Goodness Of Fit ◽

Statistical Tests ◽

Visual Assessment ◽

Discrete Distributions ◽

Human Observer ◽

Chi Square ◽

Data Set ◽

Goodness Of Fit Tests ◽

Parent Distribution ◽

Chi Square Test

Statistical tests are used to identify the parent distribution corresponding to a data set. A human observer looking at a histogram can also identify a probability distribution that models the parent distribution. The accuracy of a human observer was compared to the chi-square test for discrete data and the Kolmogorov-Smirnov and chi-square tests for continuous data. The human observer proved more accurate in identifying continuous distributions and the chi-square test proved to be superior in identifying discrete distributions. The effect of sample size and number of intervals in the histogram was included in the experimental design.

An Information Matrix Test for the Collapsing of Categories Under the Partial Credit Model

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998618787478 ◽

2018 ◽

Vol 43 (6) ◽

pp. 721-750

Author(s):

Daphna Harel ◽

Russell J. Steele

Keyword(s):

Goodness Of Fit ◽

Model Misspecification ◽

Information Matrix ◽

Partial Credit Model ◽

Partial Credit ◽

Goodness Of Fit Test ◽

Data Set ◽

Data Reduction Technique ◽

And Performance ◽

Information Matrix Test

Collapsing categories is a commonly used data reduction technique; however, to date there do not exist principled methods to determine whether collapsing categories is appropriate in practice. With ordinal responses under the partial credit model, when collapsing categories, the true model for the collapsed data is no longer a partial credit model, and therefore refitting a partial credit model may result in model misspecification. This article details the implementation and performance of an information matrix test (IMT) to assess the implications of collapsing categories for a given data set under the partial credit model and compares its performance to the application of a nominal response model (NRM) and the S-X² goodness-of-fit statistic. The IMT and NRM-based test are able to correctly determine the true number of categories for an item, given reasonable power through this goodness-of-fit test. We conclude by applying the test to a well-studied data set from the literature.

Generalised block bootstrap and its use in meteorology

Advances in Statistical Climatology Meteorology and Oceanography ◽

10.5194/ascmo-3-55-2017 ◽

2017 ◽

Vol 3 (1) ◽

pp. 55-66 ◽

Cited By ~ 1

Author(s):

László Varga ◽

András Zempléni

Keyword(s):

Sample Size ◽

Statistical Tests ◽

Block Size ◽

Temperature Data ◽

Effective Sample Size ◽

Block Bootstrap ◽

Test Statistic ◽

Positive Real ◽

Data Set ◽

Bootstrap Methodology

In an earlier paper, Rakonczai et al. (2014) emphasised the importance of investigating the effective sample size in the case of autocorrelated data. The simulations were based on the block bootstrap methodology. However, the discreteness of the usual block size did not allow for exact calculations. In this paper we propose a new generalisation of the block bootstrap methodology, which allows for any positive real number as expected block size. We relate it to the existing optimisation procedures and apply it to a temperature data set. Our other focus is on statistical tests, where quite often the actual sample size plays an important role, even in the case of relatively large samples. This is especially the case for copulas. These are used for investigating the dependencies among data sets. As in quite a few real applications the time dependence cannot be neglected, we investigated the effect of this phenomenon on the used test statistic. The critical value can be computed by the proposed new block bootstrap simulation, where the block size is determined by fitting a VAR model to the observations. The results are illustrated for models of the used temperature data.

Norms for Automatic Estimation of Hippocampal Atrophy and a Step Forward for Applicability to the Italian Population

Frontiers in Neuroscience ◽

10.3389/fnins.2021.656808 ◽

2021 ◽

Vol 15 ◽

Author(s):

Silvia De Francesco ◽

Samantha Galluzzi ◽

Nicola Vanacore ◽

Cristina Festari ◽

Paolo Maria Rossini ◽

...

Keyword(s):

Alzheimer's Disease ◽

Goodness Of Fit ◽

Automatic Segmentation ◽

Hippocampal Volume ◽

Hippocampal Atrophy ◽

Italian Population ◽

Healthy Controls ◽

Goodness Of Fit Test ◽

Data Set

Introduction: Hippocampal volume is one of the main biomarkers of Alzheimer’s Dementia (AD). Over the years, advanced tools that performed automatic segmentation of Magnetic Resonance Imaging (MRI) T13D scans have been developed, such as FreeSurfer (FS) and ACM-Adaboost (AA). Hippocampal volume is considered abnormal when it is below the 5th percentile of the normative population. The aim of this study was to set norms, established from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) population, for hippocampal volume measured with FS v.6.0 and AA tools in the neuGRID platform (www.neugrid2.eu) and demonstrate their applicability for the Italian population. Methods: Norms were set from a large group of 545 healthy controls belonging to ADNI. For each pipeline, subjects with segmentation errors were discarded, resulting in 532 valid segmentations for FS and 421 for AA (age range 56–90 years). The comparability of ADNI and the Italian Brain Normative Archive (IBNA), representative of the Italian general population, was assessed testing clinical variables, neuropsychological scores and normalized hippocampal volumes. Finally, percentiles were validated using the Italian Alzheimer’s disease Repository Without Borders (ARWiBo) as an external independent data set to evaluate FS and AA generalizability. Results: Hippocampal percentiles were checked with the chi-square goodness of fit test. P-values were not significant, showing that FS and AA algorithm distributions fitted the data well. Clinical, neuropsychological and volumetric features were similar in ADNI and IBNA (p > 0.01). Hippocampal volumes measured with both FS and AA were associated with age (p < 0.001). The 5th percentile thresholds, indicating left/right hippocampal atrophy were respectively: (i) below 3,223/3,456 mm³ at 56 years and 2,506/2,415 mm³ at 90 years for FS; (ii) below 4,583/4,873 mm³ at 56 years and 3,831/3,870 mm³ at 90 years for AA. The average volumes computed on 100 cognitively intact healthy controls (CN) selected from ARWiBo were close to the 50th percentiles, while those for 100 AD patients were close to the abnormal percentiles. Discussion: Norms generated from ADNI through the automatic FS and AA segmentation tools may be used as normative references for Italian patients with suspected AD.
