Fixing the stimulus-as-fixed-effect fallacy in task fMRI

2016 ◽  
Author(s):  
Jacob Westfall ◽  
Thomas E. Nichols ◽  
Tal Yarkoni

Abstract Most fMRI experiments record the brain’s responses to samples of stimulus materials (e.g., faces or words). Yet the statistical modeling approaches used in fMRI research universally fail to model stimulus variability in a manner that affords population generalization, meaning that researchers’ conclusions technically apply only to the precise stimuli used in each study, and cannot be generalized to new stimuli. A direct consequence of this stimulus-as-fixed-effect fallacy is that the majority of published fMRI studies have likely overstated the strength of the statistical evidence they report. Here we develop a Bayesian mixed model (the random stimulus model; RSM) that addresses this problem, and apply it to a range of fMRI datasets. Results demonstrate considerable inflation (50-200% in most of the studied datasets) of test statistics obtained from standard “summary statistics”-based approaches relative to the corresponding RSM models. We demonstrate how RSMs can be used to improve parameter estimates, properly control false positive rates, and test novel research hypotheses about stimulus-level variability in human brain responses.

2017 ◽  
Vol 1 ◽  
pp. 23 ◽  
Author(s):  
Jacob Westfall ◽  
Thomas E. Nichols ◽  
Tal Yarkoni

Most functional magnetic resonance imaging (fMRI) experiments record the brain’s responses to samples of stimulus materials (e.g., faces or words). Yet the statistical modeling approaches used in fMRI research universally fail to model stimulus variability in a manner that affords population generalization, meaning that researchers’ conclusions technically apply only to the precise stimuli used in each study, and cannot be generalized to new stimuli. A direct consequence of this stimulus-as-fixed-effect fallacy is that the majority of published fMRI studies have likely overstated the strength of the statistical evidence they report. Here we develop a Bayesian mixed model (the random stimulus model; RSM) that addresses this problem, and apply it to a range of fMRI datasets. Results demonstrate considerable inflation (50-200% in most of the studied datasets) of test statistics obtained from standard “summary statistics”-based approaches relative to the corresponding RSM models. We demonstrate how RSMs can be used to improve parameter estimates, properly control false positive rates, and test novel research hypotheses about stimulus-level variability in human brain responses.
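
A minimal simulation sketch of the fallacy this abstract describes, with hypothetical effect sizes: when every subject sees the same stimulus sample, the usual "summary statistics" t test ignores stimulus sampling variance and is inflated. The crude stimulus-aware denominator below only illustrates the direction of the correction; it is not the authors' Bayesian RSM.

```python
# Illustrative simulation (hypothetical parameters, not the RSM itself).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subj, n_stim = 30, 40
b0 = 0.3                                    # true population effect
subj = rng.normal(0, 0.5, n_subj)           # random subject effects
stim = rng.normal(0, 0.5, n_stim)           # random stimulus effects
y = b0 + subj[:, None] + stim[None, :] + rng.normal(0, 1.0, (n_subj, n_stim))

# "Summary statistics" approach: one-sample t test on per-subject means.
# All subjects share the same stimuli, so stimulus sampling variance never
# enters the between-subject variance and the t statistic is inflated.
t_fixed, _ = stats.ttest_1samp(y.mean(axis=1), 0.0)

# Crude correction: also propagate the variance of per-stimulus means,
# mimicking what a mixed model with random stimuli does to the denominator.
se2_subj = y.mean(axis=1).var(ddof=1) / n_subj
se2_stim = y.mean(axis=0).var(ddof=1) / n_stim
t_aware = y.mean() / np.sqrt(se2_subj + se2_stim)

print(f"stimuli-as-fixed t = {t_fixed:.2f}; stimulus-aware t = {t_aware:.2f}")
```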


Genetics ◽  
1996 ◽  
Vol 143 (4) ◽  
pp. 1819-1829 ◽  
Author(s):  
G Thaller ◽  
L Dempfle ◽  
I Hoeschele

Abstract Maximum likelihood methodology was applied to determine the mode of inheritance of rare binary traits with data structures typical for swine populations. The genetic models considered included a monogenic, a digenic, a polygenic, and three mixed polygenic and major gene models. The main emphasis was on the detection of major genes acting on a polygenic background. Deterministic algorithms were employed to integrate and maximize likelihoods. A simulation study was conducted to evaluate model selection and parameter estimation. Three designs were simulated that differed in the number of sires and the number of dams within sires (10/10, 30/30, 100/30). Major gene effects of at least one SD of the liability were detected with satisfactory power under the mixed model of inheritance, except for the smallest design. Parameter estimates were empirically unbiased with acceptable standard errors, except for the smallest design, and made it possible to distinguish clearly between the genetic models. Distributions of the likelihood ratio statistic were evaluated empirically, because asymptotic theory did not hold. For each simulation model, the Akaike information criterion was computed for all models of analysis. The model with the smallest value was chosen as the best model and was equal to the true model in almost every case studied.
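
A toy illustration of the empirical likelihood-ratio calibration mentioned above, under a deliberately simple normal-mixture alternative (not the paper's swine pedigree models): the null distribution of the LR statistic is simulated because the boundary constraint invalidates the usual chi-square asymptotics.

```python
# Empirical null distribution of a likelihood-ratio statistic (toy example).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

def lr_stat(x):
    # H0: x ~ N(0, 1).  H1: equal mixture of N(-a, 1) and N(+a, 1), a >= 0,
    # a major-gene-like alternative; a sits on the boundary under H0, so
    # the LR statistic is not asymptotically chi-square distributed.
    def nll(a):
        return -np.sum(np.log(0.5 * norm.pdf(x, -a) + 0.5 * norm.pdf(x, a)))
    fit = minimize_scalar(nll, bounds=(0.0, 5.0), method="bounded")
    return 2.0 * (nll(0.0) - fit.fun)

null_lrs = [lr_stat(rng.normal(size=200)) for _ in range(500)]
print(f"empirical 5% critical value: {np.quantile(null_lrs, 0.95):.2f}")
```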


Author(s):  
Anna L Tyler ◽  
Baha El Kassaby ◽  
Georgi Kolishovski ◽  
Jake Emerson ◽  
Ann E Wells ◽  
...  

Abstract It is well understood that variation in relatedness among individuals, or kinship, can lead to false genetic associations. Multiple methods have been developed to adjust for kinship while maintaining power to detect true associations. However, the effects of kinship on genetic interaction test statistics remain relatively unstudied. Here we surveyed kinship effects in six commonly used mouse populations. We measured inflation of main effect test statistics, genetic interaction test statistics, and interaction test statistics reparametrized by the Combined Analysis of Pleiotropy and Epistasis (CAPE). We also performed linear mixed model (LMM) kinship corrections using two types of kinship matrix: an overall kinship matrix calculated from the full set of genotyped markers, and a reduced kinship matrix, which left out markers on the chromosome(s) being tested. We found that test statistic inflation varied across populations and was driven largely by linkage disequilibrium. In contrast, there was no observable inflation in the genetic interaction test statistics. CAPE statistics were inflated at a level between that of the main effects and the interaction effects. The overall kinship matrix overcorrected the inflation of main effect statistics relative to the reduced kinship matrix. The two types of kinship matrices had similar effects on the interaction statistics and CAPE statistics, although the overall kinship matrix trended toward a more severe correction. In conclusion, we recommend using an LMM kinship correction for both main effects and genetic interactions, and further recommend that the kinship matrix be calculated from a reduced set of markers in which the chromosomes being tested are omitted from the calculation. This is particularly important in populations with substantial population structure, such as recombinant inbred lines in which genomic replicates are used.
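
A minimal sketch of the recommended reduced ("leave-one-chromosome-out") kinship matrix; the function name and genotype coding are hypothetical, and real pipelines handle missing data and allele frequencies more carefully.

```python
# Reduced kinship: exclude markers on the tested chromosome(s) before
# computing the realized genetic relationship matrix.
import numpy as np

def loco_kinship(genotypes, marker_chrom, test_chrom):
    """genotypes: (n_individuals, n_markers) array coded 0/1/2;
    marker_chrom: chromosome label per marker (length n_markers)."""
    keep = marker_chrom != test_chrom          # leave out tested chromosome
    Z = genotypes[:, keep].astype(float)
    Z -= Z.mean(axis=0)                        # center each marker
    sd = Z.std(axis=0)
    Z[:, sd > 0] /= sd[sd > 0]                 # standardize, skip monomorphic
    return Z @ Z.T / keep.sum()                # kinship from remaining markers

# usage sketch: K = loco_kinship(G, chrom_labels, test_chrom=7)
```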


2017 ◽  
Author(s):  
Ronald de Vlaming ◽  
Magnus Johannesson ◽  
Patrik K.E. Magnusson ◽  
M. Arfan Ikram ◽  
Peter M. Visscher

Abstract LD-score (LDSC) regression disentangles the contribution of polygenic signal, in terms of SNP-based heritability, and population stratification, in terms of a so-called intercept, to GWAS test statistics. Whereas LDSC regression uses summary statistics, methods like Haseman-Elston (HE) regression and genomic-relatedness-matrix (GRM) restricted maximum likelihood infer parameters such as SNP-based heritability from individual-level data directly. Therefore, these two types of methods are typically considered to be profoundly different. Nevertheless, recent work has revealed that LDSC and HE regression yield near-identical SNP-based heritability estimates when confounding stratification is absent. We now extend the equivalence; under the stratification assumed by LDSC regression, we show that the intercept can be estimated from individual-level data by transforming the coefficients of a regression of the phenotype on the leading principal components from the GRM. Using simulations, considering various degrees and forms of population stratification, we find that intercept estimates obtained from individual-level data are nearly equivalent to estimates from LDSC regression (R² > 99%). An empirical application corroborates these findings. Hence, LDSC regression is not profoundly different from methods using individual-level data; parameters that are identified by LDSC regression are also identified by methods using individual-level data. In addition, our results indicate that, under strong stratification, there is misattribution of stratification to the slope of LDSC regression, inflating estimates of SNP-based heritability from LDSC regression ceteris paribus. Hence, the intercept is not a panacea for population stratification. Consequently, LDSC-regression estimates should be interpreted with caution, especially when the intercept estimate is significantly greater than one.
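
A bare-bones sketch of LDSC regression itself on simulated summary statistics, to make the slope/intercept decomposition concrete; the noise model and unweighted least squares here are simplifications of the real estimator.

```python
# Toy LDSC regression: chi-square statistics regressed on LD scores.
import numpy as np

rng = np.random.default_rng(3)
M, N, h2, c = 50_000, 100_000, 0.4, 1.05     # markers, GWAS N, SNP-h2,
                                             # confounding intercept (c > 1)
ld = rng.gamma(shape=4.0, scale=25.0, size=M)          # toy LD scores
chi2 = c + N * h2 * ld / M + rng.normal(0, 1.0, M)     # E[chi2] = c + N*h2*l/M

X = np.column_stack([np.ones(M), ld])
intercept, slope = np.linalg.lstsq(X, chi2, rcond=None)[0]
print(f"intercept ~ {intercept:.3f}, implied SNP-h2 ~ {slope * M / N:.3f}")
```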


2017 ◽  
Vol 20 (3) ◽  
pp. 257-259 ◽  
Author(s):  
Julian Hecker ◽  
Anna Maaser ◽  
Dmitry Prokopenko ◽  
Heide Loehlein Fier ◽  
Christoph Lange

VEGAS (versatile gene-based association study) is a popular methodological framework for performing gene-based tests based on summary statistics from single-variant analyses. The approach incorporates linkage disequilibrium information from reference panels to account for the correlation of test statistics. The gene-based test can be computed using three different types of tests. In 2015, the improved framework VEGAS2, using more detailed reference panels, was published. Both versions provide user-friendly web-based and offline tools for the analysis. However, the implementation of the popular top-percentage test is erroneous in both versions. The p values provided by VEGAS2 are deflated/anti-conservative. Based on real data examples, we demonstrate that this can substantially increase the rate of false-positive findings and can lead to inconsistencies between different test options. We also provide code that allows users of VEGAS to compute correct p values.
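
A sketch of a correctly calibrated top-percentage test of the VEGAS type, using Monte Carlo simulation from the LD correlation matrix; this is a generic implementation of the idea, not the authors' released correction code, and `R` is a hypothetical LD matrix for the gene's variants.

```python
# Monte Carlo top-percentage gene test: sum of the top k chi-squares,
# with the null simulated under the gene's LD (correlation) matrix R.
import numpy as np

def top_percentage_p(z_obs, R, top_frac=0.1, n_sim=100_000, seed=0):
    rng = np.random.default_rng(seed)
    k = max(1, int(round(top_frac * len(z_obs))))
    stat_obs = np.sort(z_obs**2)[-k:].sum()           # observed top-k sum
    L = np.linalg.cholesky(R)                         # correlated null z
    z_null = rng.standard_normal((n_sim, len(z_obs))) @ L.T
    stat_null = np.sort(z_null**2, axis=1)[:, -k:].sum(axis=1)
    # add-one correction keeps the Monte Carlo p value strictly positive
    return (1 + np.sum(stat_null >= stat_obs)) / (n_sim + 1)
```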


2001 ◽  
Vol 58 (7) ◽  
pp. 1464-1476 ◽  
Author(s):  
Ransom A Myers ◽  
Brian R MacKenzie ◽  
Keith G Bowen ◽  
Nicholas J Barrowman

Population and community data in one study are usually analyzed in isolation from other data. Here, we introduce statistical methods that allow many data sets to be analyzed simultaneously, such that different studies may "borrow strength" from each other. In the simplest case, we simultaneously model 21 Atlantic cod (Gadus morhua) stocks in the North Atlantic, assuming that the maximum reproductive rate and the carrying capacity per unit area are random variables. This method uses a nonlinear mixed model and is a natural approach to investigating how carrying capacity varies among populations. We used empirical Bayes techniques to estimate the maximum reproductive rate and carrying capacity of each stock. In all cases, the empirical Bayes estimates were biologically reasonable, whereas a stock-by-stock analysis occasionally yielded nonsensical parameter estimates (e.g., infinite values). Our analysis showed that the carrying capacity per unit area varied by more than 20-fold among populations and that much of this variation was related to temperature. That is, the carrying capacity per square kilometre declines as temperature increases.
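
A compact sketch of the "borrowing strength" mechanism via normal-normal empirical Bayes shrinkage of per-stock estimates toward the across-stock mean; the paper's actual analysis embeds this idea in a nonlinear mixed stock-recruitment model, so the helper below is only an illustration.

```python
# Empirical Bayes shrinkage: noisy per-stock estimates borrow strength
# from the ensemble of stocks (method-of-moments variance estimates).
import numpy as np

def eb_shrink(est, se):
    """est: per-stock parameter estimates; se: their standard errors."""
    mu = np.average(est, weights=1.0 / se**2)               # pooled mean
    tau2 = max(0.0, np.var(est, ddof=1) - np.mean(se**2))   # between-stock var
    w = tau2 / (tau2 + se**2)                               # shrinkage weights
    return w * est + (1.0 - w) * mu                         # shrunken estimates

# Extreme stock-by-stock estimates are pulled toward the pooled mean:
print(eb_shrink(np.array([2.1, 9.5, 4.3]), np.array([0.8, 3.0, 1.1])))
```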


Author(s):  
Rui Fang ◽  
Brandie Wagner ◽  
J. Kirk Harris ◽  
Sophie A Fillon

Identification of the majority of organisms present in human-associated microbial communities is feasible with the advent of high-throughput sequencing technology. However, these data consist of non-negative, highly skewed sequence counts with a large proportion of zeros. Zero-inflated models are useful for analyzing such data. Moreover, the non-zero observations may be over-dispersed relative to the Poisson distribution, biasing parameter estimates and underestimating standard errors. In such circumstances, a zero-inflated negative binomial (ZINB) model accounts for these characteristics better than a zero-inflated Poisson (ZIP) model. In addition, complex study designs with repeated measurements or multiple samples collected from the same subject are possible; random effects are therefore introduced to account for within-subject variation. A zero-inflated negative binomial mixed model contains components to model the probability of excess zero values and the negative binomial parameters, allowing for repeated measures using independent random effects between these two components. The objective of this study is to examine the application of a zero-inflated negative binomial mixed model to human microbiota sequence data.
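
A minimal fixed-effects ZINB fit on simulated counts using statsmodels, to make the model components concrete; the subject-level random effects described above require a dedicated mixed-model implementation and are not shown here.

```python
# ZINB on simulated over-dispersed, zero-inflated counts (fixed effects only).
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

rng = np.random.default_rng(4)
n = 400
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)                           # negative binomial mean
counts = rng.negative_binomial(2, 2.0 / (2.0 + mu))  # over-dispersed counts
counts[rng.random(n) < 0.3] = 0                      # structural (excess) zeros

X = sm.add_constant(x)
model = ZeroInflatedNegativeBinomialP(counts, X, exog_infl=np.ones((n, 1)))
result = model.fit(method="bfgs", maxiter=500, disp=0)
print(result.params)                                 # inflation, count, alpha
```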


2019 ◽  
Author(s):  
Yi Yang ◽  
Xingjie Shi ◽  
Yuling Jiao ◽  
Jian Huang ◽  
Min Chen ◽  
...  

Abstract Motivation: Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of these mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) [42] was proposed to jointly interrogate the genome for complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only summary statistics from GWAS data are required. Results: In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first examines the relationship between gene expression and genotype, while the second examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data. Contact: [email protected]. Availability and implementation: The implementation of CoMM-S2 is included in the CoMM package, which can be downloaded from https://github.com/gordonliu810822/CoMM. Supplementary information: Supplementary data are available at Bioinformatics online.
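
A schematic two-stage version of the logic CoMM builds on (genotype to expression, predicted expression to trait), on simulated data; CoMM and CoMM-S2 instead fit the two stages jointly and propagate the stage-1 uncertainty, with CoMM-S2 consuming GWAS summary statistics rather than the individual-level data used below.

```python
# Two-stage transcriptome-wide sketch (illustration of the idea only).
import numpy as np

rng = np.random.default_rng(5)
n_eqtl, n_gwas, m = 300, 1000, 50
G1 = rng.binomial(2, 0.3, (n_eqtl, m)).astype(float)   # eQTL genotypes
w_true = rng.normal(0, 0.2, m)
expr = G1 @ w_true + rng.normal(0, 1, n_eqtl)          # measured expression

# Stage 1: ridge-regression weights for expression given genotype.
lam = 10.0
w_hat = np.linalg.solve(G1.T @ G1 + lam * np.eye(m), G1.T @ expr)

# Stage 2: regress the GWAS trait on genetically predicted expression.
G2 = rng.binomial(2, 0.3, (n_gwas, m)).astype(float)   # GWAS genotypes
y = 0.5 * (G2 @ w_true) + rng.normal(0, 1, n_gwas)     # trait
pred = G2 @ w_hat
X = np.column_stack([np.ones(n_gwas), pred])
alpha = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"estimated expression-on-trait effect: {alpha[1]:.3f}")
```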


2021 ◽  
Vol 4 (1) ◽  
pp. 87-95
Author(s):  
Nirajan Bhattarai

The present study was carried out mainly to examine the effects of non-genetic factors on prolificacy and pre-weaning kid mortality of Khari goats in Nawalpur, Nepal. The traits were recorded for 1005 does and analyzed using the fixed-effect Least Squares and Maximum Likelihood computer program (LSMMML PC-2). Results revealed that the overall mean prolificacy and pre-weaning kid mortality in this study were 145 and 6.2%, respectively. According to the results, non-genetic factors such as altitude, coat color and dam’s parity were important sources of variation in pre-weaning kid mortality and prolificacy of Khari goats. Thus, the results of the present study suggest scope for improving prolificacy and reducing pre-weaning kid mortality through selective breeding.
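
A sketch of a fixed-effect least-squares analysis of this kind, with the non-genetic factors entered as categorical terms; the data frame, factor levels, and column names below are hypothetical stand-ins for the study's records.

```python
# Fixed-effect least-squares model with categorical non-genetic factors.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 200
df = pd.DataFrame({
    "mortality": rng.binomial(1, 0.06, n),     # pre-weaning kid death (0/1)
    "altitude": rng.choice(["low", "mid", "high"], n),
    "coat": rng.choice(["black", "brown", "white"], n),
    "parity": rng.integers(1, 5, n),           # dam's parity
})
fit = smf.ols("mortality ~ C(altitude) + C(coat) + C(parity)", data=df).fit()
print(fit.summary())
```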

