Beyond the traditional simulation design for evaluating type 1 error rate: from ‘theoretical’ to ‘empirical’ null

2018 ◽  
Author(s):  
Ting Zhang ◽  
Lei Sun

Abstract: When evaluating a newly developed statistical test, the first step is to check its type 1 error (T1E) control using simulations. This is often achieved with the standard simulation design S0 under the so-called ‘theoretical’ null of no association. In practice, whole-genome association analyses scan a large number of genetic markers (Gs) for the ones associated with an outcome of interest (Y), where Y comes from an unknown alternative while the majority of Gs are not associated with Y, that is, they fall under the ‘empirical’ null. This reality is better represented by two other simulation designs: design S1.1 simulates Y from an alternative model based on G and then evaluates its association with an independently generated Gnew, while design S1.2 evaluates the association between a permuted Yperm and G. More than a decade ago, Efron (2004) noted the important distinction between the ‘theoretical’ and ‘empirical’ null in false discovery rate control. Using scale tests for variance heterogeneity and location tests of interaction effect as two examples, here we show that not all null simulation designs are equal. In examining the accuracy of a likelihood ratio test, simulation design S0 shows that the method has correct T1E control, whereas designs S1.1 and S1.2 suggest otherwise, with empirical T1E values of 0.07 at the 0.05 nominal level. The inflation becomes more severe in the tail and does not diminish as sample size increases. This is an important observation that calls for new practices in methods evaluation and in the interpretation of T1E control.
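The three null designs described in this abstract can be sketched in a few lines of Python. This is only an illustration of how the data are generated, not the authors' code: the additive genotype coding, the linear alternative model with effect size `beta`, the helper names, and the large-sample correlation test used as a stand-in are all assumptions introduced here. In particular, the stand-in test is not the likelihood ratio test examined in the paper, so this sketch should not be expected to reproduce the reported inflation.

```python
import math
import random


def simulate_genotypes(n, maf, rng):
    # Additive coding (assumed): count of minor alleles, 0, 1 or 2.
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def simulate_alternative(n, maf, beta, rng):
    # Hypothetical alternative model: Y depends linearly on G.
    g = simulate_genotypes(n, maf, rng)
    y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]
    return y, g


def design_s0(n, maf, rng):
    # 'Theoretical' null: Y is generated with no dependence on any G.
    g = simulate_genotypes(n, maf, rng)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return y, g


def design_s1_1(n, maf, beta, rng):
    # 'Empirical' null: Y comes from an alternative based on G,
    # but is tested against an independently generated G_new.
    y, _ = simulate_alternative(n, maf, beta, rng)
    g_new = simulate_genotypes(n, maf, rng)
    return y, g_new


def design_s1_2(n, maf, beta, rng):
    # 'Empirical' null: Y comes from an alternative based on G,
    # then is permuted to break the association with G.
    y, g = simulate_alternative(n, maf, beta, rng)
    y_perm = y[:]
    rng.shuffle(y_perm)
    return y_perm, g


def correlation_p_value(y, g):
    # Stand-in test: two-sided p-value for the Pearson correlation,
    # using the large-sample approximation sqrt(n) * r ~ N(0, 1).
    n = len(y)
    my, mg = sum(y) / n, sum(g) / n
    syy = sum((a - my) ** 2 for a in y)
    sgg = sum((b - mg) ** 2 for b in g)
    if syy == 0.0 or sgg == 0.0:
        return 1.0  # no variation, nothing to test
    sgy = sum((a - my) * (b - mg) for a, b in zip(y, g))
    z = sgy / math.sqrt(syy * sgg) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2.0))


def empirical_t1e(design, n_rep, alpha, seed, **params):
    # Fraction of replicates in which the test rejects at level alpha.
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        y, g = design(rng=rng, **params)
        if correlation_p_value(y, g) < alpha:
            rejections += 1
    return rejections / n_rep


if __name__ == "__main__":
    print("S0  ", empirical_t1e(design_s0, 2000, 0.05, 1, n=500, maf=0.3))
    print("S1.1", empirical_t1e(design_s1_1, 2000, 0.05, 1, n=500, maf=0.3, beta=0.5))
    print("S1.2", empirical_t1e(design_s1_2, 2000, 0.05, 1, n=500, maf=0.3, beta=0.5))
```

To evaluate a different method under the same three designs, one would swap `correlation_p_value` for that method's p-value function and keep the generators unchanged.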

2011 ◽  
Vol 89 (4) ◽  
pp. 988-995 ◽  
Author(s):  
S. K. Onteru ◽  
B. Fan ◽  
M. T. Nikkilä ◽  
D. J. Garrick ◽  
K. J. Stalder ◽  
...  

Author(s):  
Shuo Jiao

This chapter presents set-based approaches that focus on identifying G × E interactions rather than set-based approaches that are based primarily on detecting G main effects (e.g., via marginal effects). The author reviews both his own research and the development of his Set Based Gene EnviRonment InterAction test (SBERIA), as well as another set-based G × E approach referred to as GESAT. GESAT extends the variance component test of the SNP-set Kernel Association Test (SKAT) to evaluate G × E effects while incorporating the main SNP effects as covariates. While both of these approaches (SBERIA and GESAT) have outperformed other benchmark methods (e.g., the likelihood ratio test) and have been demonstrated to retain the appropriate Type 1 error rate, in this chapter the author conducts simulation studies to compare findings for the SBERIA and GESAT approaches, and identifies the strengths and limitations of each method.


2021 ◽  
Vol 12 ◽  
Author(s):  
Brandon N. S. Ooi ◽  
Raechell ◽  
Ariel F. Ying ◽  
Yong Zher Koh ◽  
Yu Jin ◽  
...  

Background: Statins can cause muscle symptoms resulting in poor adherence to therapy and increased cardiovascular risk. We hypothesize that combinations of potentially functional SNPs (pfSNPs), rather than individual SNPs, better predict myalgia in patients on atorvastatin. This study assesses the value of pfSNPs and employs six machine learning algorithms to identify the combination of SNPs that best predicts myalgia. Methods: Whole genome sequencing of 183 Chinese, Malay and Indian patients from Singapore was conducted to identify genetic variants associated with atorvastatin-induced myalgia. To adjust for confounding factors, demographic and clinical characteristics were also examined for their association with myalgia. The top factor, sex, was then used as a covariate in the whole genome association analyses. Variants that were highly associated with myalgia in this and previous studies were extracted, assessed for potential functionality (pfSNPs) and incorporated into six machine learning models. The predictive performance of different combinations of models and inputs was compared using the average cross-validation area under the ROC curve (AUC). We then determined the minimum combination of SNPs that achieves maximum sensitivity and specificity, as measured by AUC, in predicting atorvastatin-induced myalgia in most, if not all, of the six machine learning models. Results: Through whole genome association analyses using sex as a covariate, a larger proportion of pfSNPs compared to non-pf SNPs were found to be highly associated with myalgia. Although none of the individual SNPs achieved genome-wide significance in univariate analyses, machine learning models identified a combination of 15 SNPs that predict myalgia with good predictive performance (AUC >0.9). SNPs within genes identified in this study significantly outperformed SNPs within genes previously reported to be associated with myalgia. pfSNPs were found to be more robust in predicting myalgia, outperforming non-pf SNPs in the majority of machine learning models tested. Conclusion: Combinations of pfSNPs that were consistently identified by different machine learning models to have high predictive performance have good potential to be clinically useful for predicting atorvastatin-induced myalgia once validated against an independent cohort of patients.


2006 ◽  
Vol 38 (8) ◽  
pp. 888-895 ◽  
Author(s):  
Pengyuan Liu ◽  
Yian Wang ◽  
Haris Vikis ◽  
Anna Maciag ◽  
Daolong Wang ◽  
...  

2019 ◽  
Author(s):  
Brian E. Cade ◽  
Jiwon Lee ◽  
Tamar Sofer ◽  
Heming Wang ◽  
Man Zhang ◽  
...  

Abstract: Sleep-disordered breathing (SDB) is a common disorder associated with significant morbidity. Through the NHLBI Trans-Omics for Precision Medicine (TOPMed) program we report the first whole-genome sequence analysis of SDB. We identified 4 rare gene-based associations with SDB traits in 7,988 individuals of diverse ancestry and 4 replicated common variant associations with inclusion of additional samples (n=13,257). We identified a multi-ethnic set-based rare-variant association (p = 3.48 × 10⁻⁸) on chromosome X with ARMCX3. Transcription factor binding site enrichment identified associations with genes implicated in respiratory and craniofacial traits. Results highlighted associations in genes that modulate lung development, inflammation, respiratory rhythmogenesis and HIF1A-mediated hypoxic response.


Genetics ◽  
2020 ◽  
Vol 215 (4) ◽  
pp. 947-958 ◽  
Author(s):  
Chong Wu

Many genetic variants identified in genome-wide association studies (GWAS) are associated with multiple, sometimes seemingly unrelated, traits. This motivates multi-trait association analyses, which have successfully identified novel associated loci for many complex diseases. While appealing, most existing methods focus on analyzing a relatively small number of traits, and may yield inflated Type 1 error rates when a large number of traits need to be analyzed jointly. As deep phenotyping data are rapidly becoming available, we develop a novel method, referred to as aMAT (adaptive multi-trait association test), for multi-trait analysis of any number of traits. We applied aMAT to GWAS summary statistics for a set of 58 volumetric imaging derived phenotypes from the UK Biobank. aMAT had a genomic inflation factor of 1.04, indicating the Type 1 error rate was well controlled. More importantly, aMAT identified 24 distinct risk loci, 13 of which were ignored by standard GWAS. In comparison, the competing methods either had a suspicious genomic inflation factor or identified far fewer risk loci. Finally, four additional sets of traits were analyzed and led to similar conclusions.
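For readers unfamiliar with the genomic inflation factor quoted in this abstract, the sketch below shows one common way to compute it from a list of p-values: convert each p-value to a 1-degree-of-freedom chi-square statistic and divide the observed median by the median expected under the null (about 0.455). The abstract does not say how the reported value of 1.04 was obtained, so the function name and the choice of the 1-df chi-square scale are assumptions made here for illustration.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the square of the 75th percentile of a standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values):
    # Hypothetical helper: lambda = median(observed chi2) / median(null chi2).
    std_normal = NormalDist()
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # A two-sided p-value corresponds to |z| with lower-tail area p / 2;
        # squaring z gives the 1-df chi-square statistic.
        z = std_normal.inv_cdf(p / 2.0)
        chi2_stats.append(z * z)
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced p-values mimic a well-calibrated test.
    calibrated = [(i + 0.5) / 1001 for i in range(1001)]
    print(genomic_inflation_factor(calibrated))
```

Values close to 1 are usually read as good calibration, while values well above 1 suggest inflated test statistics.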

