Violating the normality assumption may be the lesser of two evils

2018 ◽  
Author(s):  
Ulrich Knief ◽  
Wolfgang Forstmeier

Abstract

When data are not normally distributed (e.g. skewed, zero-inflated, binomial, or count data), researchers are often uncertain whether it may be legitimate to use tests that assume Gaussian errors (e.g. regression, t-test, ANOVA, Gaussian mixed models), or whether one has to either model a more specific error structure or use randomization techniques. Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation. We find that Gaussian models are remarkably robust to non-normality over a wide range of conditions, meaning that P-values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also perform well in terms of power and they can be useful for parameter estimation but usually not for extrapolation. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise for instance when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data. Overall, we argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and difficult to check during peer review. Hence, as long as scientists and reviewers are not fully aware of the risks, science might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data in a transparent way.

Tweetable abstract

Gaussian models are remarkably robust to even dramatic violations of the normality assumption.
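The core claim — that P-values from Gaussian tests stay close to nominal even under strongly skewed errors — is easy to check in miniature. This is an illustrative sketch, not the authors' simulation code; the sample size, number of replicates, and choice of a lognormal distribution are assumptions for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, n_sim, alpha = 50, 2000, 0.05

false_positives = 0
for _ in range(n_sim):
    # Two groups drawn from the SAME strongly skewed (lognormal) distribution,
    # so the null hypothesis is true and any rejection is a type I error.
    a = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    b = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    _, p = stats.ttest_ind(a, b)
    false_positives += p < alpha

rate = false_positives / n_sim
print(f"empirical type I error under lognormal data: {rate:.3f}")
```

With moderate samples the empirical rejection rate stays close to the nominal 5%, illustrating the robustness the abstract describes.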



2017 ◽  
Vol 94 ◽  
pp. 305-315 ◽  
Author(s):  
Hannes Matuschek ◽  
Reinhold Kliegl ◽  
Shravan Vasishth ◽  
Harald Baayen ◽  
Douglas Bates

Methodology ◽  
2010 ◽  
Vol 6 (4) ◽  
pp. 147-151 ◽  
Author(s):  
Emanuel Schmider ◽  
Matthias Ziegler ◽  
Erik Danay ◽  
Luzi Beyer ◽  
Markus Bühner

Empirical evidence for the robustness of the analysis of variance (ANOVA) to violation of the normality assumption is presented by means of Monte Carlo methods. High-quality samples from normally, rectangularly, and exponentially distributed basic populations are created by drawing random numbers from the respective generators, checking their goodness of fit, and allowing only the best 10% to take part in the investigation. A one-way fixed-effect design with three groups of 25 values each is chosen. Effect sizes are implemented in the samples and varied over a broad range. Comparing the outcomes of the ANOVA calculations for the different types of distributions gives reason to regard the ANOVA as robust. Both the empirical type I error α and the empirical type II error β remain constant under violation. Moreover, regression analysis identifies the factor “type of distribution” as not significant in explaining the ANOVA results.
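The design described here — a one-way ANOVA with three groups of 25 drawn from an exponential population under a true null — can be reproduced directly (without the authors' goodness-of-fit pre-screening step, which is omitted for brevity):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, n, n_sim, alpha = 3, 25, 2000, 0.05

rejections = 0
for _ in range(n_sim):
    # Three groups of 25 values, all from the same exponential population:
    # the null hypothesis of equal means is true by construction.
    groups = [rng.exponential(scale=1.0, size=n) for _ in range(k)]
    _, p = stats.f_oneway(*groups)
    rejections += p < alpha

print(f"empirical alpha for ANOVA on exponential data: {rejections / n_sim:.3f}")
```

The rejection rate lands near the nominal 5%, consistent with the abstract's conclusion that empirical α is stable under this violation.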


2013 ◽  
Vol 52 (04) ◽  
pp. 351-359 ◽  
Author(s):  
M. O. Scheinhardt ◽  
A. Ziegler

Summary Background: Gene, protein, or metabolite expression levels are often non-normally distributed, heavy-tailed and contain outliers. Standard statistical approaches may fail as location tests in this situation. Objectives: In three Monte Carlo simulation studies, we aimed at comparing the type I error levels and empirical power of standard location tests and three adaptive tests [O’Gorman, Can J Stat 1997; 25: 269–279; Keselman et al., Brit J Math Stat Psychol 2007; 60: 267–293; Szymczak et al., Stat Med 2013; 32: 524–537] for a wide range of distributions. Methods: We simulated two-sample scenarios using the g-and-k distribution family to systematically vary tail length and skewness with identical and varying variability between groups. Results: All tests kept the type I error level when groups did not vary in their variability. The standard non-parametric U-test performed well in all simulated scenarios. It was outperformed by the two non-parametric adaptive methods in case of heavy tails or large skewness. Most tests did not keep the type I error level for skewed data in the case of heterogeneous variances. Conclusions: The standard U-test was a powerful and robust location test for most of the simulated scenarios, and it is thus to be recommended except for very heavy-tailed or heavily skewed data. The non-parametric adaptive tests were powerful for both normal and non-normal distributions under sample variance homogeneity. But when sample variances differed, they did not keep the type I error level. The parametric adaptive test lacks power for skewed and heavy-tailed distributions.
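The g-and-k family used here has no closed-form density but is easy to sample from via its quantile function, where g controls skewness and k tail heaviness. The sketch below draws two identical g-and-k samples (null true) and checks the U-test's type I error; parameter values and sample sizes are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from scipy import stats

def gk_sample(rng, size, a=0.0, b=1.0, g=0.5, k=0.5, c=0.8):
    """Sample from the g-and-k distribution via its quantile function
    applied to standard normal draws (standard parameterisation, c = 0.8)."""
    z = rng.standard_normal(size)
    return a + b * (1 + c * np.tanh(g * z / 2)) * z * (1 + z**2) ** k

rng = np.random.default_rng(7)
n, n_sim, alpha = 40, 2000, 0.05

rejections = 0
for _ in range(n_sim):
    # Identical skewed, heavy-tailed populations -> the null is true.
    x = gk_sample(rng, n)
    y = gk_sample(rng, n)
    _, p = stats.mannwhitneyu(x, y, alternative="two-sided")
    rejections += p < alpha

print(f"U-test empirical alpha on g-and-k data: {rejections / n_sim:.3f}")
```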


2009 ◽  
Vol 18 (1) ◽  
pp. 17-18 ◽  
Author(s):  
Eleonora Esposito ◽  
Andrea Cipriani ◽  
Corrado Barbui

Randomised controlled trials (RCTs) are designed and powered to measure one single outcome, called the primary outcome (Sibbald & Roland, 1998; Barbui et al., 2007). The primary outcome is the pre-specified outcome of greatest clinical importance and is usually the one used in the sample size calculation (Accordini, 2007). In addition to the primary outcome, RCTs may have several other outcomes, called secondary outcomes. In contrast with the analysis of the primary outcome, the analysis of secondary outcomes and its interpretation may be complicated by at least two factors: 1) the trial may not have enough statistical power to detect differences (so it is possible to commit a type II error, that is, failing to see a difference that is present); 2) increasing the number of secondary outcomes generates the problem of multiplicity of analyses, that is, the proliferation of possible comparisons in a trial (and increasing the number of comparisons increases the chance of a type I error, that is, detecting significant differences by chance). For all these reasons, the results of the analysis of primary outcomes are considered less susceptible to bias than the analysis of secondary outcomes.
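The multiplicity problem has a simple closed form: with m independent outcomes each tested at α, the family-wise error rate is 1 − (1 − α)^m. A short calculation makes the inflation concrete, together with the standard Bonferroni correction:

```python
# Family-wise false-positive probability with m independent secondary
# outcomes, each tested at alpha = 0.05, with and without Bonferroni.
alpha = 0.05
for m in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m           # uncorrected: grows rapidly with m
    fwer_bonf = 1 - (1 - alpha / m) ** m  # Bonferroni: capped near alpha
    print(f"m={m:2d}: uncorrected FWER={fwer:.3f}, Bonferroni={fwer_bonf:.3f}")
```

With ten secondary outcomes the uncorrected chance of at least one spurious "significant" difference is already about 40%.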


Author(s):  
Patrick J. Rosopa ◽  
Alice M. Brawley ◽  
Theresa P. Atkinson ◽  
Stephen A. Robertson

Preliminary tests for homoscedasticity may be unnecessary in general linear models. Based on Monte Carlo simulations, results suggest that when testing for differences between independent slopes, the unconditional use of weighted least squares regression and HC4 regression performed the best across a wide range of conditions.
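The HC4 estimator referenced here (Cribari-Neto, 2004) rescales squared residuals by a leverage-dependent exponent. The sketch below is a minimal numpy implementation on simulated heteroscedastic data — an illustration of the estimator, not the authors' simulation code; the data-generating process is an assumption.

```python
import numpy as np

def hc4_se(X, y):
    """OLS point estimates with HC4 heteroscedasticity-consistent standard
    errors: squared residuals are inflated by (1 - h_i)^(-delta_i), where
    delta_i = min(4, n * h_i / p) depends on the leverage h_i."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # hat-matrix diagonal
    delta = np.minimum(4.0, n * h / p)
    omega = resid**2 / (1 - h) ** delta
    cov = XtX_inv @ (X.T * omega) @ X @ XtX_inv  # sandwich covariance
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(-1, 1, n)
X = np.column_stack([np.ones(n), x])
# Error variance grows with |x|: a classic heteroscedastic violation.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + np.abs(x), size=n)

beta, se = hc4_se(X, y)
print("slope:", beta[1], "HC4 SE:", se[1])
```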


1993 ◽  
Vol 30 (2) ◽  
pp. 246-255 ◽  
Author(s):  
Murali Chandrashekaran ◽  
Beth A. Walker

To enhance the utility of meta-analysis as an integrative tool for marketing research, heteroscedastic MLE (HMLE), a maximum-likelihood-based estimation procedure, is proposed as a method that overcomes heteroscedasticity, a problem known to impair OLS estimates and threaten the validity of meta-analytic findings. The results of a Monte Carlo simulation experiment reveal that, under a wide range of heteroscedastic conditions, HMLE is more efficient and powerful than OLS and achieves these performance advantages without inflating type I error. Further, the relative performance of HMLE increases as heteroscedasticity becomes more severe. An empirical analysis of a meta-analytic dataset in marketing confirmed and extended these findings by illustrating how the enhanced efficiency and power of HMLE improve the ability to detect moderator variables and by demonstrating how the theoretical generalizations emerging from a meta-analysis are affected by the choice of the analytic procedure.
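The intuition behind heteroscedasticity-aware ML estimation in meta-analysis can be sketched in a deliberately simplified setting: effect sizes with known, study-specific sampling variances. Maximising the heteroscedastic Gaussian likelihood then coincides with the inverse-variance-weighted estimate. This is a stylised illustration of the principle, not the HMLE procedure of the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(11)
# Simulated meta-analytic dataset: one true effect, but study-specific
# sampling variances (heteroscedasticity across studies).
k = 30
true_effect = 0.4
v = rng.uniform(0.01, 0.5, size=k)       # known per-study sampling variances
d = rng.normal(true_effect, np.sqrt(v))  # observed effect sizes

def neg_loglik(theta):
    """Negative Gaussian log-likelihood with per-study variances v."""
    mu = theta[0]
    return 0.5 * np.sum(np.log(2 * np.pi * v) + (d - mu) ** 2 / v)

mle = minimize(neg_loglik, x0=[0.0]).x[0]
wls = np.sum(d / v) / np.sum(1 / v)  # closed-form inverse-variance estimate
print(f"MLE: {mle:.4f}, inverse-variance WLS: {wls:.4f}")
```

The two estimates agree, and both down-weight noisy studies — exactly what plain OLS fails to do under heteroscedasticity.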


2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Emily A. Blood ◽  
Debbie M. Cheng

Linear mixed models (LMMs) are frequently used to analyze longitudinal data. Although these models can be used to evaluate mediation, they do not directly model causal pathways. Structural equation models (SEMs) are an alternative technique that allows explicit modeling of mediation. The goal of this paper is to evaluate the performance of LMMs relative to SEMs in the analysis of mediated longitudinal data with time-dependent predictors and mediators. We simulated mediated longitudinal data from an SEM and specified delayed effects of the predictor. A variety of model specifications were assessed, and the LMMs and SEMs were evaluated with respect to bias, coverage probability, power, and Type I error. Models evaluated in the simulation were also applied to data from an observational cohort of HIV-infected individuals. We found that when carefully constructed, the LMM adequately models mediated exposure effects that change over time in the presence of mediation, even when the data arise from an SEM.
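The mediated structure studied here (predictor → mediator → outcome) can be illustrated with the standard product-of-coefficients decomposition in a single-timepoint sketch. This is a heavily simplified, cross-sectional stand-in for the longitudinal LMM/SEM comparison in the paper; all path values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
# Simulated mediation: X -> M (a = 0.5), M -> Y (b = 0.7), direct effect 0.2,
# so the true indirect effect is a * b = 0.35.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(scale=0.5, size=n)
y = 0.2 * x + 0.7 * m + rng.normal(scale=0.5, size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
a = ols(np.column_stack([ones, x]), m)[1]     # X -> M path
b = ols(np.column_stack([ones, x, m]), y)[2]  # M -> Y path, adjusting for X
indirect = a * b                              # product-of-coefficients estimate
print(f"estimated indirect effect: {indirect:.3f} (true value 0.35)")
```

An SEM models these two equations jointly; the paper's finding is that a carefully constructed LMM can recover the same time-varying mediated effects.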


2021 ◽  
Author(s):  
Sebastian Sosa ◽  
Cristian Pasquaretta ◽  
Ivan Puga-Gonzalez ◽  
F Stephen Dobson ◽  
Vincent A Viblanc ◽  
...  

Animal social network analyses (ASNA) have led to a foundational shift in our understanding of animal sociality that transcends the disciplinary boundaries of genetics, spatial movements, epidemiology, information transmission, evolution, species assemblages and conservation. However, some analytical protocols (i.e., permutation tests) used in ASNA have recently been called into question due to the unacceptable rates of false positives (type I errors) and false negatives (type II errors) they generate in statistical hypothesis testing. Here, we show that these rates are related to the way in which observation heterogeneity is accounted for in association indices. To solve this issue, we propose a method termed the "global index" (GI) that consists of computing the average of individual association indices per unit of time. In addition, we developed an "index of interactions" (II) that allows the use of the GI approach for directed behaviours. Our simulations show that GI: 1) returns more reasonable rates of false negatives and positives, with or without observational biases in the collected data, 2) can be applied to both directed and undirected behaviours, 3) can be applied to focal sampling, scan sampling or "gambit of the group" data collection protocols, and 4) can be applied to first- and second-order social network measures. Finally, we provide a method to control for non-social biological confounding factors using linear regression residuals. By providing a reliable approach for a wide range of scenarios, we propose a novel methodology in ASNA with the aim of better understanding social interactions from a mechanistic, ecological and evolutionary perspective.
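The averaging idea behind the "global index" — an association index computed per unit of time and then averaged, rather than pooled over the whole study — can be sketched for one dyad under heterogeneous sampling effort. This is a loose illustration of the averaging principle as stated in the abstract; the authors' exact index definitions may differ.

```python
import numpy as np

rng = np.random.default_rng(9)
T = 100
# Heterogeneous observation effort: sightings of the dyad per day vary widely.
effort = rng.integers(1, 20, size=T)   # observations per day (1..19)
p_together = 0.3                       # true association probability
together = rng.binomial(effort, p_together)

# Pooled ratio index over the whole study: implicitly weights days by effort.
pooled = together.sum() / effort.sum()

# "Global index" style estimate: one index per unit of time, then averaged,
# so each day contributes equally regardless of sampling effort.
gi = np.mean(together / effort)

print(f"pooled index: {pooled:.3f}, per-day averaged index: {gi:.3f}")
```

Both recover the true association rate here, but they diverge when observation effort correlates with behaviour — the heterogeneity the paper identifies as a driver of error rates.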


2019 ◽  
Author(s):  
Michael Seedorff ◽  
Jacob Oleson ◽  
Bob McMurray

Mixed effects models have become a critical tool in all areas of psychology and allied fields. This is due to their ability to account for multiple random factors, and their ability to handle proportional data in repeated measures designs. While substantial research has addressed how to structure fixed effects in such models, there is less understanding of appropriate random effects structures. Recent work with linear models suggests the choice of random effects structures affects Type I error in such models (Barr, Levy, Scheepers, & Tily, 2013; Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017). This has not been examined for between subject effects, which are crucial for many areas of psychology, nor has this been examined in logistic models. Moreover, mixed models expose a number of researcher degrees of freedom: the decision to aggregate data or not, the manner in which degrees of freedom are computed, and what to do when models do not converge. However, the implications of these choices for power and Type I error are not well known. To address these issues, we conducted a series of Monte Carlo simulations which examined linear and logistic models in a mixed design with crossed random effects. These suggest that a consideration of the entire space of possible models using simple information criteria such as AIC leads to optimal power while holding Type I error constant. They also suggest data aggregation and the d.f. computation have minimal effects on Type I error and power, and they suggest appropriate approaches for dealing with non-convergence.
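The AIC-based model-space search recommended here can be illustrated in a stripped-down form with ordinary regression (the paper itself compares random-effects structures in mixed models; plain OLS is used below only to keep the sketch self-contained):

```python
import numpy as np

rng = np.random.default_rng(13)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + rng.normal(size=n)  # x2 carries no signal

def aic(X, y):
    """Gaussian AIC for least squares: n*log(RSS/n) + 2*(k + 1),
    counting the regression coefficients plus the error variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + 2 * (X.shape[1] + 1)

ones = np.ones(n)
candidates = {
    "intercept only": np.column_stack([ones]),
    "x1": np.column_stack([ones, x1]),
    "x1 + x2": np.column_stack([ones, x1, x2]),
}
aics = {name: aic(X, y) for name, X in candidates.items()}
best = min(aics, key=aics.get)
print("AIC-preferred model:", best)
```

Scanning the whole candidate set and keeping the lowest-AIC model is the same logic the simulations apply to random-effects structures.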

