Monte Carlo simulation of OLS and linear mixed model inference of phenotypic effects on gene expression

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2575
Author(s):  
Jeffrey A. Walker

Background: Self-contained tests estimate and test the association between a phenotype and mean expression level in a gene set defined a priori. Many self-contained gene set analysis methods have been developed, but the performance of these methods for phenotypes that are continuous rather than discrete and with multiple nuisance covariates has not been well studied. Here, I use Monte Carlo simulation to evaluate the performance of both novel and previously published (and readily available via R) methods for inferring effects of a continuous predictor on mean expression in the presence of nuisance covariates. The motivating data are a high-profile dataset which was used to show opposing effects of hedonic and eudaimonic well-being (or happiness) on the mean expression level of a set of genes that has been correlated with social adversity (the CTRA gene set). The original analysis of these data used a linear model (GLS) of fixed effects with correlated error to infer effects of Hedonia and Eudaimonia on mean CTRA expression. Methods: The standardized effects of Hedonia and Eudaimonia on CTRA gene set expression estimated by GLS were compared to estimates using multivariate (OLS) linear models and generalized estimating equation (GEE) models. The OLS estimates were tested using O’Brien’s OLS test, Anderson’s permutation $r_F^2$-test, two permutation F-tests (including GlobalAncova), and a rotation z-test (Roast). The GEE estimates were tested using a Wald test with robust standard errors. The performance (Type I, II, S, and M errors) of all tests was investigated using a Monte Carlo simulation of data explicitly modeled on the re-analyzed dataset. Results: GLS estimates are inconsistent between data sets, and, in each dataset, at least one coefficient is large and highly statistically significant. By contrast, effects estimated by OLS or GEE are very small, especially relative to the standard errors. Bootstrap and permutation GLS distributions suggest that the GLS results in downward-biased standard errors and inflated coefficients. The Monte Carlo simulation of error rates shows highly inflated Type I error from the GLS test and slightly inflated Type I error from the GEE test. By contrast, Type I error for all OLS tests is at the nominal level. The permutation F-tests have ∼1.9× the power of the other OLS tests. This increased power comes at a cost of high sign error (∼10%) if tested on small effects. Discussion: The apparently replicated pattern of well-being effects on gene expression is most parsimoniously explained as “correlated noise” due to the geometry of multiple regression. The GLS for fixed effects with correlated error, or any linear mixed model for estimating fixed effects in designs with many repeated measures or outcomes, should be used cautiously because of the inflated Type I and M error. By contrast, all OLS tests perform well, and the permutation F-tests have superior performance, including moderate power for very small effects.
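
The Type S (sign) and Type M (magnitude) errors named in this abstract can be illustrated with a very small Monte Carlo sketch. This is not the paper's simulation: it assumes a single estimate that is normally distributed around a small true effect, tested with a two-sided z-test, and the function name `type_s_m` and all parameter values are hypothetical choices made for illustration.

```python
import random


def type_s_m(true_effect, se, n_sims=20000, z_crit=1.96, seed=1):
    """Estimate power, Type S and Type M error for a two-sided z-test.

    Assumption (not from the abstract): each estimate is drawn from
    Normal(true_effect, se), i.e. an unbiased estimator with known SE.
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate / se) > z_crit:  # "statistically significant"
            significant.append(estimate)

    power = len(significant) / n_sims
    # Type S: share of significant estimates whose sign is wrong.
    wrong_sign = sum(1 for e in significant if e * true_effect < 0)
    type_s = wrong_sign / len(significant)
    # Type M: average exaggeration of |estimate| among significant results.
    mean_abs = sum(abs(e) for e in significant) / len(significant)
    type_m = mean_abs / abs(true_effect)
    return power, type_s, type_m


# Hypothetical setting: a true effect half the size of its standard error.
power, type_s, type_m = type_s_m(true_effect=0.5, se=1.0)
print(f"power={power:.3f}  type S={type_s:.3f}  type M={type_m:.2f}")
```

With a true effect this small relative to its standard error, the significant estimates are few, sometimes have the wrong sign, and overstate the effect several-fold, which is the pattern the abstract warns about for tests applied to very small effects.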

1992 ◽  
Vol 17 (4) ◽  
pp. 315-339 ◽  
Author(s):  
Michael R. Harwell ◽  
Elaine N. Rubinstein ◽  
William S. Hayes ◽  
Corley C. Olds

Meta-analytic methods were used to integrate the findings of a sample of Monte Carlo studies of the robustness of the F test in the one- and two-factor fixed effects ANOVA models. Monte Carlo results for the Welch (1947) and Kruskal-Wallis (Kruskal & Wallis, 1952) tests were also analyzed. The meta-analytic results provided strong support for the robustness of the Type I error rate of the F test when certain assumptions were violated. The F test also showed excellent power properties. However, the Type I error rate of the F test was sensitive to unequal variances, even when sample sizes were equal. The error rate of the Welch test was insensitive to unequal variances when the population distribution was normal, but nonnormal distributions tended to inflate its error rate and to depress its power. Meta-analytic and exact statistical theory results were used to summarize the effects of assumption violations for the tests.
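
The sensitivity of the F test to unequal variances can be seen in a short simulation. The sketch below is an illustration under stated assumptions, not a reproduction of any study in the meta-analysis: it uses two groups (where the ANOVA F statistic equals the square of the pooled-variance t statistic), unequal group sizes of 10 and 30 (the setting in which the distortion is typically largest), normal data with no true mean difference, and a hard-coded critical value of 2.024 for Student's t with 38 degrees of freedom. The function names are hypothetical.

```python
import random
import statistics

# Two-sided 5% critical value of Student's t with 10 + 30 - 2 = 38 df.
T_CRIT_DF38 = 2.024


def pooled_t(x, y):
    """Pooled-variance t statistic; its square is the two-group ANOVA F."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    diff = statistics.fmean(x) - statistics.fmean(y)
    return diff / (sp2 * (1 / nx + 1 / ny)) ** 0.5


def rejection_rate(sd_small_group, sd_large_group, n_sims=4000, seed=7):
    """Empirical Type I error rate: both population means are zero."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, sd_small_group) for _ in range(10)]
        y = [rng.gauss(0, sd_large_group) for _ in range(30)]
        if abs(pooled_t(x, y)) > T_CRIT_DF38:
            rejections += 1
    return rejections / n_sims


rate_equal = rejection_rate(1.0, 1.0)    # equal variances
rate_unequal = rejection_rate(3.0, 1.0)  # smaller group is more variable
print(f"equal variances:   {rate_equal:.3f}")
print(f"unequal variances: {rate_unequal:.3f}")
```

When the smaller group has the larger variance, the pooled variance underestimates the sampling variance of the mean difference, so the empirical rejection rate rises well above the nominal 5%.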


2020 ◽  
Author(s):  
Brandon LeBeau

The linear mixed model is a commonly used model for longitudinal or nested data due to its ability to account for the dependency of nested data. Researchers typically rely on the random effects to adequately account for the dependency due to correlated data; however, serial correlation can also be used. If the random effect structure is misspecified (perhaps due to convergence problems), can the addition of serial correlation overcome this misspecification and allow for unbiased estimation and accurate inferences? This study explored this question with a simulation. Simulation results show that the fixed effects are unbiased; however, inflation of the empirical type I error rate occurs when a random effect is missing from the model. Implications for applied researchers are discussed.


2016 ◽  
Author(s):  
Jeffrey A. Walker

Abstract. Background: This paper presents a re-analysis of the gene set data from Fredrickson et al. 2013 and Fredrickson et al. 2015, which purportedly showed opposing effects of hedonic and eudaimonic happiness on the expression levels of a set of genes that have been correlated with social adversity. Fredrickson et al. 2015 used a linear model of fixed effects with correlated error (using GLS) to estimate the partial regression coefficients. Methods: The standardized effects of hedonic and eudaimonic happiness on CTRA gene set expression estimated by GLS were compared to estimates using multivariate (OLS) linear models and generalized estimating equation (GEE) models. The OLS estimates were tested using a bootstrap t-test, O’Brien’s OLS test, a permutation t-test, and the rotation z-test. The GEE estimates were tested using a Wald test with robust standard errors. The performance (type I, type II, and type M error) of all tests was investigated using a Monte Carlo simulation of data modeled after the 2015 dataset. Results: Standardized OLS effects (mean partial regression coefficients) of Hedonia and Eudaimonia on gene expression levels are very small in both the 2013 and 2015 data, as well as the combined data. The p-values from all tests fail to reject any of the null models. The GEE estimates and tests are nearly identical to the OLS estimates and tests. By contrast, the GLS estimates are inconsistent between data sets, but in each dataset, at least one coefficient is large and highly statistically significant. The Monte Carlo simulation of error rates shows inflated type I error from the GLS test on data with a similar correlation structure to that in the 2015 dataset, and this error rate increases as the number of outcomes increases relative to the number of subjects. Bootstrap and permutation GLS distributions suggest that the GLS model not only results in downward-biased standard errors but also inflated coefficients. Both distributions also show the expected, strong, negative correlation between the coefficients for Hedonia and Eudaimonia. Discussion: The results fail to support opposing effects, or any detectable effect, of hedonic and eudaimonic well-being on the pattern of gene expression. The apparently replicated pattern of hedonic and eudaimonic effects on gene expression is most parsimoniously explained as "correlated noise" due to the geometry of multiple regression. A linear mixed model for estimating fixed effects in designs with many repeated measures or outcomes should be used cautiously because of the potentially inflated type I and type M error.


2005 ◽  
Vol 32 (3) ◽  
pp. 193-195 ◽  
Author(s):  
Holly Raffle ◽  
Gordon P. Brooks

Violations of assumptions, inflated Type I error rates, and robustness are important concepts for students to learn in an introductory statistics course. However, these abstract ideas can be difficult for students to understand. Monte Carlo simulation methods can provide a concrete way for students to learn abstract statistical concepts. This article describes the MC4G computer software (Brooks, 2004) and the accompanying instructor's manual (Raffle, 2004). It also provides a case study that includes both assessment and course evaluation data supporting the effectiveness of Monte Carlo simulation exercises in a graduate-level statistics course.


2017 ◽  
Author(s):  
Jesse E D Miller ◽  
Anthony Ives ◽  
Ellen Damschen

1. Plant functional traits are increasingly being used to infer mechanisms about community assembly and predict global change impacts. Of the several approaches that are used to analyze trait-environment relationships, one of the most popular is community-weighted means (CWM), in which species trait values are averaged at the site level. Other approaches that do not require averaging are being developed, including multilevel models (MLM, also called generalized linear mixed models). However, relative strengths and weaknesses of these methods have not been extensively compared. 2. We investigated three statistical models for trait-environment associations: CWM, an MLM in which traits were not included as fixed effects (MLM1), and an MLM with traits as fixed effects (MLM2). We analyzed a real plant community dataset to investigate associations between two traits and one environmental variable. We then analyzed permutations of the dataset to investigate sources of type I errors, and performed a simulation study to compare the statistical power of the methods. 3. In the analysis of real data, CWM gave highly significant associations for both traits, while MLM1 and MLM2 did not. Using P-values derived by simulating the data using the fitted MLM2, none of the models gave significant associations, showing that CWM had inflated type I errors (false positives). In the permutation tests, MLM2 performed the best of the three approaches. MLM2 still had inflated type I error rates in some situations, but this could be corrected using bootstrapping. The simulation study showed that MLM2 always had as good or better power than CWM. These simulations also confirmed the causes of type I errors from the permutation study. 4. The MLM that includes main effects of traits (MLM2) is the best method for identifying trait-environment associations in community assembly, with better type I error control and greater power. Analyses that regress CWMs on continuous environmental variables are not reliable because they are likely to produce type I errors.


Computation ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 126
Author(s):  
Timothy Opheim ◽  
Anuradha Roy

This review is about verifying and generalizing the supremum test statistic developed by Balakrishnan et al. Exhaustive simulation studies are conducted for various dimensions to determine the effect, in terms of empirical size, of the supremum test statistic developed by Balakrishnan et al. to test multivariate skew-normality. Monte Carlo simulation studies indicate that the Type-I error of the supremum test can be controlled reasonably well for various dimensions for given nominal significance levels 0.05 and 0.01. Cut-off values are provided for the number of samples required to attain the nominal significance levels 0.05 and 0.01. Some new and relevant information about the supremum test statistic is reported here.


2021 ◽  
pp. 096228022110082
Author(s):  
Yang Li ◽  
Wei Ma ◽  
Yichen Qin ◽  
Feifang Hu

Concerns have been expressed over the validity of statistical inference under covariate-adaptive randomization despite its extensive use in clinical trials. In the literature, the inferential properties under covariate-adaptive randomization have been mainly studied for continuous responses; in particular, it is well known that the usual two-sample t-test for treatment effect is typically conservative. This phenomenon of invalid tests has also been found for generalized linear models without adjusting for the covariates and is sometimes more worrisome due to inflated Type I error. The purpose of this study is to examine the unadjusted test for treatment effect under generalized linear models and covariate-adaptive randomization. For a large class of covariate-adaptive randomization methods, we obtain the asymptotic distribution of the test statistic under the null hypothesis and derive the conditions under which the test is conservative, valid, or anti-conservative. Several commonly used generalized linear models, such as logistic regression and Poisson regression, are discussed in detail. An adjustment method is also proposed to achieve a valid size based on the asymptotic results. Numerical studies confirm the theoretical findings and demonstrate the effectiveness of the proposed adjustment method.


2020 ◽  
Author(s):  
Jeff Miller

Contrary to the warning of Miller (1988), Rousselet and Wilcox (2020) argued that it is better to summarize each participant’s single-trial reaction times (RTs) in a given condition with the median than with the mean when comparing the central tendencies of RT distributions across experimental conditions. They acknowledged that median RTs can produce inflated Type I error rates when conditions differ in the number of trials tested, consistent with Miller’s warning, but they showed that the bias responsible for this error rate inflation could be eliminated with a bootstrap bias correction technique. The present simulations extend their analysis by examining the power of bias-corrected medians to detect true experimental effects and by comparing this power with the power of analyses using means and regular medians. Unfortunately, although bias-corrected medians solve the problem of inflated Type I error rates, their power is lower than that of means or regular medians in many realistic situations. In addition, even when conditions do not differ in the number of trials tested, the power of tests (e.g., t-tests) is generally lower using medians rather than means as the summary measures. Thus, the present simulations demonstrate that summary means will often provide the most powerful test for differences between conditions, and they show what aspects of the RT distributions determine the size of the power advantage for means.
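
The trial-number dependence behind this error rate inflation can be sketched in a few lines. The example below is only an illustration: it assumes an exponential distribution with rate 1 as a stand-in for a right-skewed RT distribution (real RT distributions differ), and the function name and trial counts are hypothetical. It compares the average sample median with the population median, ln 2, for a small and a large number of trials.

```python
import math
import random
import statistics

# Population median of an exponential distribution with rate 1.
POP_MEDIAN = math.log(2)


def mean_sample_median(n_trials, n_sims=20000, seed=3):
    """Average, over simulated participants, of the median of n_trials draws."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_sims):
        sample = [rng.expovariate(1.0) for _ in range(n_trials)]
        medians.append(statistics.median(sample))
    return statistics.fmean(medians)


# Bias of the sample median relative to the population median.
bias_few = mean_sample_median(5) - POP_MEDIAN    # condition with few trials
bias_many = mean_sample_median(41) - POP_MEDIAN  # condition with many trials
print(f"bias with  5 trials: {bias_few:+.4f}")
print(f"bias with 41 trials: {bias_many:+.4f}")
```

Because the sample median of a right-skewed distribution overestimates the population median by an amount that shrinks as the number of trials grows, two conditions with identical RT distributions but different trial counts yield systematically different median RTs, which is the source of the spurious differences described above.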

