Beyond the traditional simulation design for evaluating type 1 error rate: from ‘theoretical’ to ‘empirical’ null

Mapping Intimacies ◽

10.1101/311290 ◽

2018 ◽

Author(s):

Ting Zhang ◽

Lei Sun

Keyword(s):

Ratio Test ◽

Simulation Design ◽

Important Distinction ◽

Nominal Level ◽

Association Analyses ◽

Type 1 Error ◽

Genome Association ◽

Whole Genome Association ◽

Control Designs

Abstract: When evaluating a newly developed statistical test, the first step is to check its type 1 error (T1E) control using simulations. This is often achieved by the standard simulation design S0 under the so-called ‘theoretical’ null of no association. In practice, whole-genome association analyses scan through a large number of genetic markers (Gs) for the ones associated with an outcome of interest (Y), where Y comes from an unknown alternative while the majority of Gs are not associated with Y, that is, under the ‘empirical’ null. This reality can be better represented by two other simulation designs: design S1.1 simulates Y from an alternative model based on G and then evaluates its association with an independently generated Gnew, while design S1.2 evaluates the association between a permuted Yperm and G. More than a decade ago, Efron (2004) noted the important distinction between the ‘theoretical’ and ‘empirical’ null in false discovery rate control. Using scale tests for variance heterogeneity and location tests of interaction effect as two examples, here we show that not all null simulation designs are equal. In examining the accuracy of a likelihood ratio test, simulation design S0 shows that the method has correct T1E control, whereas designs S1.1 and S1.2 suggest otherwise, with empirical T1E values of 0.07 at the 0.05 nominal level. The inflation becomes more severe at the tail and does not diminish as sample size increases. This is an important observation that calls for new practices for methods evaluation and interpretation of T1E control.
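
The three simulation designs named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the additive genotype coding, the linear alternative model with effect `beta`, the sample sizes, and the placeholder correlation test (Fisher z approximation) are all assumptions made here. A simple location test like this one is expected to hold its nominal level under all three designs, so the sketch shows only how the designs differ in data generation; it does not reproduce the inflation the abstract reports for the likelihood ratio test.

```python
import math
import random


def simulate_genotypes(n, maf, rng):
    # Additive coding (assumed): number of minor alleles, 0, 1, or 2.
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def correlation_p_value(x, y):
    # Placeholder test (assumed): two-sided p-value for zero Pearson
    # correlation, using the Fisher z normal approximation.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 1.0  # no variation, so no evidence of association
    r = max(min(sxy / math.sqrt(sxx * syy), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def one_replicate(design, n, maf, beta, rng):
    g = simulate_genotypes(n, maf, rng)
    if design == "S0":
        # 'Theoretical' null: Y is generated with no genetic effect at all.
        y = [rng.gauss(0, 1) for _ in range(n)]
        return correlation_p_value(g, y)
    # 'Empirical' null: Y comes from an alternative model based on G.
    y = [beta * gi + rng.gauss(0, 1) for gi in g]
    if design == "S1.1":
        # Test Y against an independently generated Gnew.
        g_new = simulate_genotypes(n, maf, rng)
        return correlation_p_value(g_new, y)
    if design == "S1.2":
        # Test a permuted copy of Y against the original G.
        y_perm = y[:]
        rng.shuffle(y_perm)
        return correlation_p_value(g, y_perm)
    raise ValueError(f"unknown design: {design}")


def empirical_t1e(design, n_rep=2000, n=200, maf=0.3, beta=0.5,
                  alpha=0.05, seed=1):
    # Fraction of replicates whose p-value falls below the nominal level.
    rng = random.Random(seed)
    rejections = sum(
        one_replicate(design, n, maf, beta, rng) < alpha for _ in range(n_rep)
    )
    return rejections / n_rep


if __name__ == "__main__":
    for design in ("S0", "S1.1", "S1.2"):
        print(design, empirical_t1e(design))
```

To examine a different method, `correlation_p_value` would be swapped for the test under study (for example a scale test or a likelihood ratio test), keeping the three data-generating branches unchanged.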

Whole-genome association analyses for lifetime reproductive traits in the pig

Journal of Animal Science ◽

10.2527/jas.2010-3236 ◽

2011 ◽

Vol 89 (4) ◽

pp. 988-995 ◽

Cited By ~ 54

Author(s):

S. K. Onteru ◽

B. Fan ◽

M. T. Nikkilä ◽

D. J. Garrick ◽

K. J. Stalder ◽

...

Keyword(s):

Reproductive Traits ◽

Whole Genome ◽

Association Analyses ◽

Genome Association ◽

Whole Genome Association

Set-Based Gene × Environment Interaction Tests for Complex Diseases with Application to Genome-Wide Association and Sequencing Studies

Statistical Approaches to Gene X Environment Interactions for Complex Phenotypes ◽

10.7551/mitpress/9780262034685.003.0004 ◽

2016 ◽

Author(s):

Shuo Jiao

Keyword(s):

Ratio Test ◽

Environment Interaction ◽

Gene Environment Interaction ◽

Type 1 Error ◽

Gene Environment ◽

Genome Wide ◽

Component Test ◽

Sequencing Studies ◽

Main Effects

This chapter presents set-based approaches that focus on identifying G × E interactions rather than set-based approaches that are based primarily on detecting G main effects (e.g., via marginal effects). The author reviews both his own research and the development of his Set Based Gene EnviRonment InterAction test (SBERIA), as well as another set-based G × E approach referred to as GESAT. GESAT extends the variance component test of the SNP-set Kernel Association Test (SKAT) to evaluate G × E effects while incorporating the main SNP effects as covariates. Both approaches (SBERIA and GESAT) have outperformed other benchmark methods (e.g., the likelihood ratio test) and have been demonstrated to retain the appropriate Type 1 error rate. In this chapter the author conducts simulation studies to compare findings for the SBERIA and GESAT approaches, and identifies the strengths and limitations of each method.

A Bayesian latent class analysis for whole-genome association analyses: an illustration using the GAW15 simulated rheumatoid arthritis dense scan data

BMC Proceedings ◽

10.1186/1753-6561-1-s1-s112 ◽

2007 ◽

Vol 1 (S1) ◽

Cited By ~ 7

Author(s):

Fredrick R Schumacher ◽

Peter Kraft

Keyword(s):

Rheumatoid Arthritis ◽

Latent Class Analysis ◽

Latent Class ◽

Whole Genome ◽

Class Analysis ◽

Association Analyses ◽

Genome Association ◽

Whole Genome Association ◽

Scan Data ◽

Bayesian Latent Class Analysis

Robust Performance of Potentially Functional SNPs in Machine Learning Models for the Prediction of Atorvastatin-Induced Myalgia

Frontiers in Pharmacology ◽

10.3389/fphar.2021.605764 ◽

2021 ◽

Vol 12 ◽

Author(s):

Brandon N. S. Ooi ◽

Raechell ◽

Ariel F. Ying ◽

Yong Zher Koh ◽

Yu Jin ◽

...

Keyword(s):

Machine Learning ◽

Predictive Performance ◽

Whole Genome ◽

Learning Models ◽

Association Analyses ◽

Functional Snps ◽

Individual Snps ◽

Genome Association ◽

Whole Genome Association ◽

Machine Learning Models

Background: Statins can cause muscle symptoms resulting in poor adherence to therapy and increased cardiovascular risk. We hypothesize that combinations of potentially functional SNPs (pfSNPs), rather than individual SNPs, better predict myalgia in patients on atorvastatin. This study assesses the value of pfSNPs and employs six machine learning algorithms to identify the combination of SNPs that best predicts myalgia. Methods: Whole genome sequencing of 183 Chinese, Malay and Indian patients from Singapore was conducted to identify genetic variants associated with atorvastatin-induced myalgia. To adjust for confounding factors, demographic and clinical characteristics were also examined for their association with myalgia. The top factor, sex, was then used as a covariate in the whole genome association analyses. Variants that were highly associated with myalgia from this and previous studies were extracted, assessed for potential functionality (pfSNPs) and incorporated into six machine learning models. Predictive performance of combinations of different models and inputs was compared using the average cross-validation area under the ROC curve (AUC). The minimum combination of SNPs that achieves maximum sensitivity and specificity, as determined by AUC, and that predicts atorvastatin-induced myalgia in most, if not all, of the six machine learning models was determined. Results: Through whole genome association analyses using sex as a covariate, a larger proportion of pfSNPs compared to non-pf SNPs were found to be highly associated with myalgia. Although none of the individual SNPs achieved genome-wide significance in univariate analyses, machine learning models identified a combination of 15 SNPs that predict myalgia with good predictive performance (AUC >0.9). SNPs within genes identified in this study significantly outperformed SNPs within genes previously reported to be associated with myalgia. pfSNPs were found to be more robust in predicting myalgia, outperforming non-pf SNPs in the majority of machine learning models tested. Conclusion: Combinations of pfSNPs that were consistently identified by different machine learning models to have high predictive performance have good potential to be clinically useful for predicting atorvastatin-induced myalgia once validated against an independent cohort of patients.

Beyond the traditional simulation design for evaluating type 1 error control: From the “theoretical” null to “empirical” null

Genetic Epidemiology ◽

10.1002/gepi.22172 ◽

2018 ◽

Vol 43 (2) ◽

pp. 166-179

Author(s):

Ting Zhang ◽

Lei Sun

Keyword(s):

Error Control ◽

Simulation Design ◽

Type 1 Error

Candidate lung tumor susceptibility genes identified through whole-genome association analyses in inbred mice

Nature Genetics ◽

10.1038/ng1849 ◽

2006 ◽

Vol 38 (8) ◽

pp. 888-895 ◽

Cited By ~ 55

Author(s):

Pengyuan Liu ◽

Yian Wang ◽

Haris Vikis ◽

Anna Maciag ◽

Daolong Wang ◽

...

Keyword(s):

Lung Tumor ◽

Inbred Mice ◽

Susceptibility Genes ◽

Whole Genome ◽

Association Analyses ◽

Tumor Susceptibility ◽

Genome Association ◽

Whole Genome Association

Whole-Genome Association Analyses of Sleep-disordered Breathing Phenotypes in the NHLBI TOPMed Program

10.1101/652966 ◽

2019 ◽

Author(s):

Brian E. Cade ◽

Jiwon Lee ◽

Tamar Sofer ◽

Heming Wang ◽

Man Zhang ◽

...

Keyword(s):

Lung Development ◽

Sleep Disordered Breathing ◽

Whole Genome Sequence ◽

Whole Genome ◽

Association Analyses ◽

Respiratory Rhythmogenesis ◽

Genome Association ◽

Rare Gene ◽

Whole Genome Association ◽

Disordered Breathing

Abstract: Sleep-disordered breathing (SDB) is a common disorder associated with significant morbidity. Through the NHLBI Trans-Omics for Precision Medicine (TOPMed) program we report the first whole-genome sequence analysis of SDB. We identified 4 rare gene-based associations with SDB traits in 7,988 individuals of diverse ancestry and 4 replicated common variant associations with inclusion of additional samples (n=13,257). We identified a multi-ethnic set-based rare-variant association (p = 3.48 × 10⁻⁸) on chromosome X with ARMCX3. Transcription factor binding site enrichment identified associations with genes implicated with respiratory and craniofacial traits. Results highlighted associations in genes that modulate lung development, inflammation, respiratory rhythmogenesis and HIF1A-mediated hypoxic response.

Multi-trait Genome-Wide Analyses of the Brain Imaging Phenotypes in UK Biobank

Genetics ◽

10.1534/genetics.120.303242 ◽

2020 ◽

Vol 215 (4) ◽

pp. 947-958 ◽

Cited By ~ 1

Author(s):

Chong Wu

Keyword(s):

Association Studies ◽

Error Rates ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Association Analyses ◽

Trait Association ◽

Type 1 Error ◽

Genome Wide ◽

Inflation Factor

Many genetic variants identified in genome-wide association studies (GWAS) are associated with multiple, sometimes seemingly unrelated, traits. This motivates multi-trait association analyses, which have successfully identified novel associated loci for many complex diseases. While appealing, most existing methods focus on analyzing a relatively small number of traits, and may yield inflated Type 1 error rates when a large number of traits need to be analyzed jointly. As deep phenotyping data are rapidly becoming available, we develop a novel method, referred to as aMAT (adaptive multi-trait association test), for multi-trait analysis of any number of traits. We applied aMAT to GWAS summary statistics for a set of 58 volumetric imaging-derived phenotypes from the UK Biobank. aMAT had a genomic inflation factor of 1.04, indicating that the Type 1 error rate was well controlled. More importantly, aMAT identified 24 distinct risk loci, 13 of which were ignored by standard GWAS. In comparison, the competing methods either had a suspicious genomic inflation factor or identified far fewer risk loci. Finally, four additional sets of traits were analyzed and led to similar conclusions.
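
For readers unfamiliar with the genomic inflation factor quoted above, the following is a minimal sketch of the commonly used median-based definition: the median of the observed 1-df chi-square statistics divided by the median of the chi-square(1) distribution (about 0.455). Whether the aMAT paper computes it in exactly this way is an assumption; the function name and its p-value input are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(p_values):
    # Assumes two-sided p-values in (0, 1] from 1-df tests.
    norm = NormalDist()
    # A two-sided p-value p corresponds to a chi-square(1) statistic equal
    # to the squared standard normal quantile at p / 2.
    chi2_stats = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    # Median of chi-square(1): squared normal quantile at 0.75 (about 0.455).
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chi2_stats) / expected_median
```

A value close to 1 means the bulk of the test statistics behaves as expected under the null, while values well above 1 point to inflation across the genome.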

Faculty Opinions recommendation of PLINK: a tool set for whole-genome association and population-based linkage analyses.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1162373.622875 ◽

2009 ◽

Cited By ~ 1

Author(s):

Alejandro Schaffer

Keyword(s):

Population Based ◽

Whole Genome ◽

Linkage Analyses ◽

Genome Association ◽

Whole Genome Association ◽

Tool Set

Empirical Investigation of Type 1 Error Rate of Some Normality Test Statistics

International Journal of Psychosocial Rehabilitation ◽

10.37200/ijpr/v24i4/pr201037 ◽

2020 ◽

Vol 24 (04) ◽

pp. 591-599 ◽

Cited By ~ 1

Author(s):

John O Kuranga ◽

Kayode Ayinde ◽

Gbenga S. Solomon

Keyword(s):

Error Rate ◽

Empirical Investigation ◽

Test Statistics ◽

Type 1 Error ◽

Normality Test
