Handling Skewed Data: A Comparison of Two Popular Methods

2020 ◽  
Vol 10 (18) ◽  
pp. 6247
Author(s):  
Hanan M. Hammouri ◽  
Roy T. Sabo ◽  
Rasha Alsaadawi ◽  
Khalid A. Kheirallah

Scientists in biomedical and psychosocial research routinely encounter skewed data. When comparing means from two groups, the log transformation is commonly used as a traditional technique to normalize skewed data before applying the two-group t-test. An alternative method that does not assume normality is the generalized linear model (GLM) combined with an appropriate link function. In this work, the two techniques are compared using Monte Carlo simulations, each consisting of many iterations that simulate two groups of skewed data from three sampling distributions: gamma, exponential, and beta. The two methods are then compared on Type I error rates, power rates, and estimates of the mean differences. We conclude that the t-test with log transformation outperformed the GLM method for non-normal data following beta or gamma distributions; for exponentially distributed data, by contrast, the GLM method outperformed the t-test with log transformation.
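The log-transform arm of such a comparison can be sketched as a short Monte Carlo (an illustrative sketch, not the authors' code; the gamma parameters, sample size, and the function name `log_t_type1_rate` are assumptions for the example, and the GLM arm, which would typically fit a gamma GLM with a log link via a package such as statsmodels, is omitted to keep the sketch self-contained):

```python
import numpy as np
from scipy import stats

def log_t_type1_rate(shape=2.0, scale=1.0, n=25, iters=2000, alpha=0.05, seed=1):
    """Estimate the Type I error rate of the two-sample t-test applied to
    log-transformed gamma data; both groups come from the same gamma
    distribution, so the null hypothesis of equal means is true."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(iters):
        g1 = rng.gamma(shape, scale, size=n)
        g2 = rng.gamma(shape, scale, size=n)
        _, p = stats.ttest_ind(np.log(g1), np.log(g2))
        rejections += p < alpha
    return rejections / iters

rate = log_t_type1_rate()  # under the null, should sit near the nominal 0.05
```

Power would be estimated the same way, after shifting one group's distribution to realize a target effect size.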

2015 ◽  
Vol 46 (3) ◽  
pp. 586-603 ◽  
Author(s):  
Ma Dolores Hidalgo ◽  
Isabel Benítez ◽  
Jose-Luis Padilla ◽  
Juana Gómez-Benito

The growing use of scales in survey questionnaires warrants attention to how polytomous differential item functioning (DIF) affects observed scale score comparisons. The aim of this study is to investigate the impact of DIF on the Type I error and effect size of the independent samples t-test on observed total scale scores. A simulation study was conducted, focusing on variables potentially related to DIF in polytomous items, such as DIF pattern, sample size, magnitude, and percentage of DIF items. The results showed that the DIF pattern and the number of DIF items affected the Type I error rates and effect sizes of t-test values. The results highlight the need to analyze DIF before making comparative group interpretations.
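The mechanism can be illustrated with a deliberately simplified simulation (hypothetical code, with continuous item scores standing in for polytomous responses; `dif_rejection_rate`, the shift size, and all parameters are assumptions): two groups have identical latent ability, yet uniform DIF in a few items inflates the t-test rejection rate on total scores far above the nominal level.

```python
import numpy as np
from scipy import stats

def dif_rejection_rate(n=50, n_items=10, n_dif=2, shift=1.0,
                       iters=1000, alpha=0.05, seed=2):
    """Rejection rate of the t-test on total scores when the only group
    difference is item-level DIF (continuous scores as a stand-in for
    polytomous responses; all parameters are illustrative)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(iters):
        ref = rng.normal(0.0, 1.0, size=(n, n_items))
        foc = rng.normal(0.0, 1.0, size=(n, n_items))
        foc[:, :n_dif] += shift  # uniform DIF, not a true trait difference
        _, p = stats.ttest_ind(ref.sum(axis=1), foc.sum(axis=1))
        rejections += p < alpha
    return rejections / iters

no_dif = dif_rejection_rate(n_dif=0)  # close to the nominal 0.05
with_dif = dif_rejection_rate()       # inflated well above 0.05
```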


1994 ◽  
Vol 19 (3) ◽  
pp. 275-291 ◽  
Author(s):  
James Algina ◽  
T. C. Oshima ◽  
Wen-Ying Lin

Type I error rates were estimated for three tests that compare means by using data from two independent samples: the independent samples t test, Welch’s approximate degrees of freedom test, and James’s second-order test. Type I error rates were estimated for skewed distributions, equal and unequal variances, equal and unequal sample sizes, and a range of total sample sizes. Welch’s test and James’s test have very similar Type I error rates and tend to control the Type I error rate as well as or better than the independent samples t test does. The results provide guidance about the total sample sizes required for controlling Type I error rates.
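The contrast between the pooled-variance t test and Welch's test is easy to reproduce with a short simulation (an illustrative sketch under assumed parameters, not the authors' setup; James's second-order test lacks a common off-the-shelf implementation and is omitted here):

```python
import numpy as np
from scipy import stats

def type1_rates(n1=10, sd1=4.0, n2=40, sd2=1.0, iters=2000, alpha=0.05, seed=3):
    """Empirical Type I error of the pooled t test and Welch's test when
    both means are 0 (null true) but the smaller sample has the larger
    variance, the classic worst case for the pooled-variance t test."""
    rng = np.random.default_rng(seed)
    rej_pooled = rej_welch = 0
    for _ in range(iters):
        a = rng.normal(0.0, sd1, size=n1)
        b = rng.normal(0.0, sd2, size=n2)
        rej_pooled += stats.ttest_ind(a, b, equal_var=True).pvalue < alpha
        rej_welch += stats.ttest_ind(a, b, equal_var=False).pvalue < alpha
    return rej_pooled / iters, rej_welch / iters

pooled_rate, welch_rate = type1_rates()  # pooled t badly inflated, Welch near 0.05
```

When the variances are unequal and the smaller group has the larger variance, the pooled estimate understates the standard error of the mean difference, which is exactly why the pooled t test's rejection rate climbs.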


2019 ◽  
Vol 3 (Supplement_1) ◽  
Author(s):  
Keisuke Ejima ◽  
Andrew Brown ◽  
Daniel Smith ◽  
Ufuk Beyaztas ◽  
David Allison

Abstract Objectives Rigor, reproducibility and transparency (RRT) awareness has expanded over the last decade. Although RRT can be improved from various aspects, we focused on the Type I error rates and power of commonly used statistical analyses testing mean differences of two groups, using small (n ≤ 5) to moderate sample sizes. Methods We compared data from five distinct, homozygous, monogenic, murine models of obesity with non-mutant controls of both sexes. Baseline weight (7–11 weeks old) was the outcome. To examine whether the Type I error rate could be affected by the choice of statistical test, we adjusted the empirical distributions of weights to ensure the null hypothesis (i.e., no mean difference) in two ways: Case 1) center both weight distributions on the same mean weight; Case 2) combine data from control and mutant groups into one distribution. From these cases, 3 to 20 mice were resampled to create a ‘plasmode’ dataset. We performed five common tests (Student's t-test, Welch's t-test, Wilcoxon test, permutation test and bootstrap test) on the plasmodes and computed Type I error rates. Power was assessed using plasmodes, where the distribution of the control group was shifted by adding a constant value as in Case 1, but to realize nominal effect sizes. Results Type I error rates were inflated well above the nominal significance level for Student's t-test, Welch's t-test and the permutation test in Case 1, especially when sample sizes were small, whereas inflation was observed only for the permutation test in Case 2. Deflation was noted for the bootstrap test with small samples. Increasing the sample size mitigated both inflation and deflation, except for the Wilcoxon test in Case 1, because heterogeneity of the weight distributions between groups violated its assumptions for the purpose of testing mean differences. For power, a departure from the reference value was observed with small samples. Compared with the other tests, the bootstrap test was underpowered with small samples, as a tradeoff for maintaining Type I error rates. Conclusions With small samples (n ≤ 5), the bootstrap test avoided Type I error rate inflation, but often at the cost of lower power. To avoid Type I error rate inflation for the other tests, sample size should be increased. The Wilcoxon test should be avoided because of the heterogeneity of weight distributions between mutant and control mice. Funding Sources This study was supported in part by NIH and a Japan Society for the Promotion of Science (JSPS) KAKENHI grant.
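For context, a basic two-sample bootstrap test of equal means can be sketched as below (a hypothetical implementation in the style of the Efron and Tibshirani shift method, not the plasmode code used in the study; the function name and defaults are assumptions):

```python
import numpy as np

def bootstrap_mean_test(x, y, n_boot=5000, seed=4):
    """Two-sided bootstrap test of equal means: shift both samples onto
    the pooled mean so the null holds, resample with replacement, and
    compare the observed mean difference with its bootstrap null
    distribution."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y]).mean()
    x0 = x - x.mean() + pooled  # enforce H0: equal means
    y0 = y - y.mean() + pooled
    obs = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_boot):
        xb = rng.choice(x0, size=len(x), replace=True)
        yb = rng.choice(y0, size=len(y), replace=True)
        count += abs(xb.mean() - yb.mean()) >= obs
    return (count + 1) / (n_boot + 1)  # add-one correction keeps p > 0

p_far = bootstrap_mean_test(np.array([1., 2., 3., 4., 5.]),
                            np.array([11., 12., 13., 14., 15.]))
p_near = bootstrap_mean_test(np.array([1., 2., 3., 4., 5.]),
                             np.array([1.5, 2.5, 3.5, 4.5, 5.5]))
```

With tiny samples the resampled datasets are coarse, which is the intuition behind the deflated Type I error rates and reduced power the abstract reports.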


Author(s):  
Steven T. Garren ◽  
Kate McGann Osborne

Coverage probabilities of the two-sided one-sample t-test are simulated for some symmetric and right-skewed distributions. The symmetric distributions analyzed are Normal, Uniform, Laplace, and Student's t with 5, 7, and 10 degrees of freedom. The right-skewed distributions analyzed are Exponential and Chi-square with 1, 2, and 3 degrees of freedom. Left-skewed distributions were omitted without loss of generality, since they mirror the right-skewed cases. The coverage probabilities for the symmetric distributions tend to achieve or just barely exceed the nominal values. The coverage probabilities for the skewed distributions tend to be too low, indicating high Type I error rates. Percentiles for the skewness and kurtosis statistics are simulated using Normal data. For sample sizes of 5, 10, 15, and 20 the skewness statistic does an excellent job of detecting non-Normal data, except for Uniform data. The kurtosis statistic also does an excellent job of detecting non-Normal data, including Uniform data. Examined herein are Type I error rates, but not power calculations. We find that sample skewness is unhelpful when determining whether or not the t-test should be used, but low sample kurtosis is reason to avoid using the t-test.
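A coverage simulation of this kind is compact to write down (an illustrative sketch, not the authors' code; the sample size, iteration count, and function name are assumptions). Comparing Normal with Exponential data at n = 10 shows the under-coverage the abstract describes:

```python
import numpy as np
from scipy import stats

def t_coverage(sampler, true_mean, n=10, iters=3000, conf=0.95, seed=5):
    """Fraction of two-sided t intervals that cover the true mean."""
    rng = np.random.default_rng(seed)
    tcrit = stats.t.ppf(0.5 + conf / 2.0, df=n - 1)
    hits = 0
    for _ in range(iters):
        x = sampler(rng, n)
        half_width = tcrit * x.std(ddof=1) / np.sqrt(n)
        hits += abs(x.mean() - true_mean) <= half_width
    return hits / iters

cov_normal = t_coverage(lambda rng, n: rng.normal(0.0, 1.0, n), 0.0)   # near 0.95
cov_expon = t_coverage(lambda rng, n: rng.exponential(1.0, n), 1.0)    # under-covers
```

Since a two-sided level-α t-test rejects exactly when the (1 − α) interval misses the hypothesized mean, under-coverage for the Exponential translates directly into an inflated Type I error rate.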


1980 ◽  
Vol 5 (4) ◽  
pp. 337-349 ◽  
Author(s):  
Philip H. Ramsey

It is noted that disagreements have arisen in the literature about the robustness of the t test in normal populations with unequal variances. Hsu's procedure is applied to determine exact Type I error rates for t. Employing fairly liberal but objective standards for assessing robustness, it is shown that the t test is not always robust to the assumption of equal population variances even when sample sizes are equal. Several guidelines are suggested including the point that to apply t at α = .05 without regard for unequal variances would require equal sample sizes of at least 15 by one of the standards considered. In many cases, especially those with unequal N's, an alternative such as Welch's procedure is recommended.


2000 ◽  
Vol 55 (2-3) ◽  
pp. 253-281 ◽  
Author(s):  
András Vargha

In this paper, six statistical tests of stochastic equality are compared by Monte Carlo simulation with respect to Type I error and power. Two populations are said to be stochastically equal with respect to a variable X if, for any two independently and randomly drawn observations X1 and X2 from the two populations, P(X1 > X2) = P(X1 < X2). In the simulation, the skewness and kurtosis levels as well as the extent of variance heterogeneity of the two parent distributions were varied across a wide range. The sample sizes applied were either small or moderate, and equal or unequal. The tests of stochastic equality compared were the rank t test, the rank Welch test, the Fligner-Policello test, Cliff's modified Fligner-Policello test, and two modifications of the last two tests, denoted FPW and FPCW, that utilize adjusted degrees of freedom. An interesting result is that the two newly introduced test variants, FPW and FPCW, proved substantially more accurate with regard to their Type I error rates than the others, while keeping a similar power level. Specifically, the estimated Type I error of FPW at the .05 nominal level always fell in the range .043–.063, even when the variance ratio of the two distributions was as large as 1:16. The corresponding ranges were .049–.068 for FPCW, but .029–.160 for the rank t test, .049–.096 for the rank Welch test, .035–.075 for the Fligner-Policello test, and .040–.078 for Cliff's test.
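As a point of reference, the rank Welch test is simple to express (an illustrative sketch, not the paper's implementation): pool the data, rank it with average ranks for ties, then apply Welch's t-test to the group ranks.

```python
import numpy as np
from scipy import stats

def rank_welch_test(x, y):
    """Rank Welch test: rank the pooled sample (ties get average ranks),
    then run Welch's unequal-variance t-test on the two groups' ranks."""
    ranks = stats.rankdata(np.concatenate([x, y]))
    rx, ry = ranks[:len(x)], ranks[len(x):]
    return stats.ttest_ind(rx, ry, equal_var=False).pvalue

p_separated = rank_welch_test(np.arange(10.0), np.arange(10.0) + 20.0)  # small p
p_identical = rank_welch_test(np.arange(10.0), np.arange(10.0))         # p = 1
```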


2014 ◽  
Vol 53 (05) ◽  
pp. 343-343

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 in Table 4, Table 5, and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138–143). In a small number of cases, the floating-point representation of numeric values in SAS introduced a representation error into computed differences, which led to wrong categorization at the cut-offs. We corrected the simulation by using the round function of SAS in the calculation process, with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. All changes are smaller than 0.03 and do not affect the interpretation of the results or our recommendations.
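The underlying pitfall is easy to reproduce in any binary floating-point system (a hypothetical Python illustration with made-up values; the published correction used SAS's round function): a quantity that is mathematically equal to a cut-off can be stored a hair above or below it, flipping its category, and rounding before the comparison removes the representation error.

```python
# Hypothetical values for illustration only.
diff = 0.1 + 0.2   # mathematically 0.3, but stored as 0.30000000000000004
cutoff = 0.3

naive = diff > cutoff             # True: spuriously lands above the cut-off
fixed = round(diff, 10) > cutoff  # False: rounding removes the representation error
```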


2021 ◽  
pp. 001316442199489
Author(s):  
Luyao Peng ◽  
Sandip Sinharay

Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of Wollack and Eckerly (2017) and Sinharay (2018) and suggests a new aggregate-level EDI by incorporating the empirical best linear unbiased predictor from the literature of linear mixed-effects models (e.g., McCulloch et al., 2008). A simulation study shows that the new EDI has larger power than the indices of Wollack and Eckerly (2017) and Sinharay (2018). In addition, the new index has satisfactory Type I error rates. A real data example is also included.

