Weak-Instrument Robust Tests in Two-Sample Summary-Data Mendelian Randomization

2019 ◽  
Author(s):  
Sheng Wang ◽  
Hyunseung Kang

Abstract Mendelian randomization (MR) is a popular method in genetic epidemiology for estimating the effect of an exposure on an outcome using genetic variants as instrumental variables (IVs), with two-sample summary-data MR being the most popular design owing to privacy constraints on sharing individual-level data. Unfortunately, many MR methods for two-sample summary data are not robust to weak instruments, a common phenomenon with genetic instruments; many of these methods are biased, and no existing MR method controls the Type I error rate under weak instruments. In this work, we propose test statistics that are robust to weak instruments by extending the Anderson-Rubin, Kleibergen, and conditional likelihood ratio tests from econometrics to the two-sample summary-data setting. We conclude with a simulation study and an empirical study showing that the proposed tests control size and have better power than current methods.
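
For intuition, here is a minimal sketch of an Anderson-Rubin-type test in the two-sample summary-data setting. It assumes independent SNPs, non-overlapping samples, and known standard errors, and follows the generic Anderson-Rubin logic rather than the paper's exact derivation; the function name and toy data are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def ar_test_two_sample(gamma_hat, se_gamma, Gamma_hat, se_Gamma, beta0):
    """Anderson-Rubin-type test of H0: beta = beta0 from summary statistics.

    gamma_hat, se_gamma: SNP-exposure estimates and standard errors (sample 1).
    Gamma_hat, se_Gamma: SNP-outcome estimates and standard errors (sample 2).
    Assumes independent SNPs and independent, non-overlapping samples.
    """
    # Under H0, Gamma_hat_j - beta0 * gamma_hat_j has mean zero and variance
    # se_Gamma_j^2 + beta0^2 * se_gamma_j^2 regardless of instrument strength,
    # which is what makes the statistic robust to weak instruments.
    resid = Gamma_hat - beta0 * gamma_hat
    var = se_Gamma**2 + beta0**2 * se_gamma**2
    stat = float(np.sum(resid**2 / var))   # ~ chi-square with J d.f. under H0
    return stat, chi2.sf(stat, df=len(gamma_hat))

# Toy data with weak instruments (small SNP-exposure effects).
rng = np.random.default_rng(1)
J, beta_true = 20, 0.1
gamma = rng.normal(0.0, 0.05, J)
se_g = np.full(J, 0.02)
se_G = np.full(J, 0.02)
gamma_hat = gamma + rng.normal(0.0, se_g)
Gamma_hat = beta_true * gamma + rng.normal(0.0, se_G)

for b0 in (0.0, beta_true):
    stat, p = ar_test_two_sample(gamma_hat, se_g, Gamma_hat, se_G, b0)
    print(f"H0: beta = {b0} -> AR stat = {stat:.1f}, p = {p:.3f}")
```

Because the variance used in the statistic is exact under the null rather than estimated from a first-stage fit, inverting the test over a grid of beta0 values gives a confidence set with correct coverage even when the instruments are weak.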


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Guogen Shan ◽  
Amei Amei ◽  
Daniel Young

Sensitivity and specificity are often used to assess the performance of a diagnostic test with binary outcomes. Wald-type test statistics have been proposed for testing sensitivity and specificity individually. In the presence of a gold standard, simultaneous comparison of two diagnostic tests for noninferiority of sensitivity and specificity based on an asymptotic approach was studied by Chen et al. (2003). However, the asymptotic approach may suffer from unsatisfactory type I error control, as observed in many studies, especially in small to medium sample settings. In this paper, we compare three unconditional approaches for simultaneously testing sensitivity and specificity: approaches based on estimation, on maximization, and on a combination of estimation and maximization. Although the estimation-based approach does not guarantee type I error control, its performance in this regard is satisfactory. The other two unconditional approaches are exact. The approach based on estimation and maximization is generally more powerful than the approach based on maximization alone.
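
The Wald-type asymptotic statistics that these exact procedures build on can be sketched as follows for a paired design against a gold standard. This is only an illustration with made-up counts and an illustrative function name; the unconditional estimation (E), maximization (M), and E+M procedures compared in the paper additionally handle the nuisance parameters and are not shown.

```python
import numpy as np
from scipy.stats import norm

def paired_noninferiority_z(n11, n10, n01, n00, delta):
    """Wald-type Z for noninferiority of test 1 vs test 2 on paired binary data.

    "Success" means a correct result: a positive test among diseased subjects
    (sensitivity) or a negative test among healthy subjects (specificity).
    n11: both tests correct, n10: only test 1 correct,
    n01: only test 2 correct, n00: both incorrect.
    H0: p1 - p2 <= -delta  vs  H1: p1 - p2 > -delta.
    """
    n = n11 + n10 + n01 + n00
    d = (n10 - n01) / n                       # estimated difference p1 - p2
    var = (n10 + n01) / n**2 - d**2 / n       # paired-data variance estimate
    return (d + delta) / np.sqrt(var)

# Sensitivity uses the diseased subjects; specificity uses the healthy ones.
z_se = paired_noninferiority_z(n11=70, n10=8, n01=5, n00=2, delta=0.05)
z_sp = paired_noninferiority_z(n11=60, n10=10, n01=6, n00=9, delta=0.05)
# Simultaneous noninferiority: reject only if BOTH statistics are large.
alpha = 0.05
both_ni = min(z_se, z_sp) > norm.ppf(1 - alpha)
print(f"Z_sens={z_se:.2f}, Z_spec={z_sp:.2f}, simultaneous noninferiority: {both_ni}")
```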


PLoS Genetics ◽  
2021 ◽  
Vol 17 (11) ◽  
pp. e1009922
Author(s):  
Zhaotong Lin ◽  
Yangqing Deng ◽  
Wei Pan

With the increasing availability of large-scale GWAS summary data on various traits, Mendelian randomization (MR) has become a common approach to inferring causality between a pair of traits, an exposure and an outcome. It relies on genetic variants, typically SNPs, as instrumental variables (IVs). The inverse-variance weighted (IVW) method (with a fixed-effect meta-analysis model) is most powerful when all IVs are valid; however, when horizontal pleiotropy is present, it may yield biased inference. Egger regression, on the other hand, is one of the most widely used methods robust to (uncorrelated) pleiotropy, but it suffers from a loss of power. We propose a two-component mixture of regressions that combines, and thus takes advantage of, both IVW and Egger regression; by accounting for both valid and invalid IVs, it is often both more efficient (i.e., higher power) and more robust to pleiotropy (i.e., better type I error control) than either IVW or Egger regression alone. We also propose a model averaging approach and a novel data perturbation scheme to account for uncertainty in model/IV selection, leading to more robust statistical inference in finite samples. Through extensive simulations and applications to GWAS summary data on 48 risk factor-disease pairs and 63 genetically uncorrelated trait pairs, we show that the proposed methods often control type I error better, while achieving much higher power, than IVW and Egger regression (and sometimes than several other new or popular MR methods). We expect the proposed methods to be a useful addition to the Mendelian randomization toolbox for causal inference.
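
The two components being combined are standard and easy to state: IVW is a weighted regression of the SNP-outcome effects on the SNP-exposure effects through the origin, while MR-Egger adds an intercept that absorbs directional pleiotropy. A minimal sketch of both estimators from summary statistics follows, with illustrative variable names; the paper's mixture model, model averaging, and data-perturbation machinery are not shown.

```python
import numpy as np

def ivw_and_egger(gamma_hat, Gamma_hat, se_Gamma):
    """Fixed-effect IVW and MR-Egger slope from GWAS summary statistics.

    gamma_hat: SNP-exposure effects; Gamma_hat: SNP-outcome effects;
    se_Gamma: standard errors of the SNP-outcome effects.
    """
    w = 1.0 / se_Gamma**2                        # inverse-variance weights
    # IVW: weighted regression of Gamma_hat on gamma_hat with no intercept.
    beta_ivw = np.sum(w * gamma_hat * Gamma_hat) / np.sum(w * gamma_hat**2)
    # MR-Egger: same weighted regression with an intercept term that
    # captures average directional pleiotropy.
    X = np.column_stack([np.ones_like(gamma_hat), gamma_hat])
    W = np.diag(w)
    intercept_egger, beta_egger = np.linalg.solve(X.T @ W @ X, X.T @ W @ Gamma_hat)
    return beta_ivw, beta_egger, intercept_egger

# Toy data: 30 SNPs, true effect 0.2, a few SNPs with directional pleiotropy.
rng = np.random.default_rng(0)
J = 30
gamma = rng.uniform(0.05, 0.2, J)
alpha = np.where(rng.random(J) < 0.2, 0.03, 0.0)   # invalid IVs
se_G = np.full(J, 0.01)
Gamma = 0.2 * gamma + alpha + rng.normal(0, se_G)
print(ivw_and_egger(gamma, Gamma, se_G))
```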


Author(s):  
Qing Cheng ◽  
Tingting Qiu ◽  
Xiaoran Chai ◽  
Baoluo Sun ◽  
Yingcun Xia ◽  
...  

Abstract Motivation Mendelian randomization (MR) is a valuable tool for examining causal relationships between health risk factors and outcomes using observational studies. With the proliferation of genome-wide association studies, a variety of two-sample MR methods for summary data have been developed to account for horizontal pleiotropy (HP), primarily under the assumption that the effects of variants on the exposure (γ) and HP (α) are independent. In practice, this assumption is too strict and is easily violated in the presence of correlated HP. Results To account for correlated HP, we propose a Bayesian approach, MR-Corr2, that uses an orthogonal projection to reparameterize the bivariate normal distribution of γ and α, together with a spike-and-slab prior to mitigate the impact of correlated HP. We have also developed an efficient algorithm with parallel Gibbs sampling. To demonstrate the advantages of MR-Corr2 over existing methods, we conducted comprehensive simulation studies comparing both type-I error control and point estimation in various scenarios. Applying MR-Corr2 to exposure–outcome pairs in complex traits, we did not identify a contradictory causal relationship between HDL-c and CAD. Moreover, the results provide a new perspective on the causal network among complex traits. Availability and implementation The developed R package and code to reproduce all the results are available at https://github.com/QingCheng0218/MR.Corr2. Supplementary information Supplementary data are available at Bioinformatics online.
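
To fix ideas, one common way to write the correlated horizontal pleiotropy model that motivates this kind of reparameterization is sketched below. This is a generic formulation for intuition only and is not necessarily the exact parameterization used by MR-Corr2.

```latex
% Generic correlated-pleiotropy model (for intuition; not MR-Corr2's exact form)
\begin{gather*}
\Gamma_j = \beta\,\gamma_j + \alpha_j, \qquad
\begin{pmatrix}\gamma_j\\ \alpha_j\end{pmatrix}
 \sim \mathcal{N}\!\left(\mathbf{0},\;
 \begin{pmatrix}\sigma_\gamma^2 & \rho\,\sigma_\gamma\sigma_\alpha\\
                \rho\,\sigma_\gamma\sigma_\alpha & \sigma_\alpha^2\end{pmatrix}\right),\\[4pt]
\alpha_j = \rho\,\frac{\sigma_\alpha}{\sigma_\gamma}\,\gamma_j + e_j, \qquad
 e_j \sim \mathcal{N}\!\big(0,\,(1-\rho^2)\,\sigma_\alpha^2\big)
 \ \text{independent of}\ \gamma_j .
\end{gather*}
```

Here Γ_j and γ_j are the SNP-outcome and SNP-exposure effects for variant j. When ρ ≠ 0, the component of α_j aligned with γ_j shifts the apparent slope from β to β + ρσ_α/σ_γ, which is why correlated HP biases methods built on the independence assumption.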


2021 ◽  
Author(s):  
Zipeng Liu ◽  
Yiming Qin ◽  
Tian Wu ◽  
Justin Tubbs ◽  
Larry Baum ◽  
...  

Abstract Mendelian randomization (MR) using GWAS summary statistics has become a popular method for inferring causal relationships across complex diseases. However, the widespread pleiotropy observed in GWAS has made the selection of valid instrumental variables (IVs) problematic, leading to possible violations of MR assumptions and thus potentially invalid inferences concerning causation. Furthermore, current MR methods can examine causation in only one direction, so two separate analyses are required for a bi-directional analysis. In this study, we propose a novel strategy, MRCI (Mixture model Reciprocal Causation Inference), to estimate reciprocal causation between two phenotypes simultaneously using the genome-scale summary statistics of the two phenotypes and reference linkage disequilibrium (LD) information. Simulation studies, including scenarios with strong correlated pleiotropy, showed that MRCI obtained nearly unbiased estimates of causation in both directions and correct Type I error rates under the null hypothesis. In applications to real GWAS data, MRCI detected significant bi-directional and uni-directional causal influences between common diseases and putative risk factors.


2018 ◽  
Vol 48 (3) ◽  
pp. 728-742 ◽  
Author(s):  
Jack Bowden ◽  
Fabiola Del Greco M ◽  
Cosetta Minelli ◽  
Qingyuan Zhao ◽  
Debbie A Lawlor ◽  
...  

Abstract Background Two-sample summary-data Mendelian randomization (MR) incorporating multiple genetic variants within a meta-analysis framework is a popular technique for assessing causality in epidemiology. If all genetic variants satisfy the instrumental variable (IV) and necessary modelling assumptions, then their individual ratio estimates of causal effect should be homogeneous. Observed heterogeneity signals that one or more of these assumptions could have been violated. Methods Causal estimation and heterogeneity assessment in MR require an approximation for the variance, or equivalently the inverse-variance weight, of each ratio estimate. We show that the most popular ‘first-order’ weights can lead to an inflation in the chances of detecting heterogeneity when in fact it is not present. Conversely, ostensibly more accurate ‘second-order’ weights can dramatically increase the chances of failing to detect heterogeneity when it is truly present. We derive modified weights to mitigate both of these adverse effects. Results Using Monte Carlo simulations, we show that the modified weights outperform first- and second-order weights in terms of heterogeneity quantification. Modified weights are also shown to remove the phenomenon of regression dilution bias in MR estimates obtained from weak instruments, unlike those obtained using first- and second-order weights. However, with small numbers of weak instruments, this comes at the cost of a reduction in estimate precision and power to detect a causal effect compared with first-order weighting. Moreover, first-order weights always furnish unbiased estimates and preserve the type I error rate under the causal null. We illustrate the utility of the new method using data from a recent two-sample summary-data MR analysis to assess the causal role of systolic blood pressure on coronary heart disease risk. Conclusions We propose the use of modified weights within two-sample summary-data MR studies for accurately quantifying heterogeneity and detecting outliers in the presence of weak instruments. Modified weights also have an important role to play in terms of causal estimation (in tandem with first-order weights) but further research is required to understand their strengths and weaknesses in specific settings.
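
For reference, the delta-method approximations that define the first- and second-order inverse-variance weights for the jth ratio estimate are sketched below (standard forms, ignoring any covariance between the two samples); the modified weights proposed in the paper refine the second-order expression and are not reproduced here.

```latex
% Standard delta-method approximations behind the inverse-variance weights
\begin{gather*}
\hat{\beta}_j = \frac{\hat{\Gamma}_j}{\hat{\gamma}_j}, \qquad
w_j = 1 \big/ \widehat{\operatorname{Var}}(\hat{\beta}_j),\\[4pt]
\text{first order:}\quad
\widehat{\operatorname{Var}}(\hat{\beta}_j) \approx \frac{\sigma_{Y_j}^{2}}{\hat{\gamma}_j^{2}},
\qquad
\text{second order:}\quad
\widehat{\operatorname{Var}}(\hat{\beta}_j) \approx
  \frac{\sigma_{Y_j}^{2}}{\hat{\gamma}_j^{2}}
  + \frac{\hat{\Gamma}_j^{2}\,\sigma_{X_j}^{2}}{\hat{\gamma}_j^{4}}
  = \frac{\sigma_{Y_j}^{2} + \hat{\beta}_j^{2}\,\sigma_{X_j}^{2}}{\hat{\gamma}_j^{2}}.
\end{gather*}
```

Here γ̂_j and σ_{X_j} are the SNP-exposure association and its standard error, and Γ̂_j and σ_{Y_j} the SNP-outcome association and its standard error; the heterogeneity statistic is the weighted sum of squared deviations of the β̂_j from the pooled estimate, so the choice of weight directly drives over- or under-detection of heterogeneity.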


1979 ◽  
Vol 4 (1) ◽  
pp. 14-23 ◽  
Author(s):  
Juliet Popper Shaffer

If used only when a preliminary F test yields significance, the usual multiple range procedures can be modified to increase the probability of detecting differences without changing the control of Type I error. The modification consists of a reduction in the critical value when comparing the largest and smallest means. Equivalence of modified and unmodified procedures in error control is demonstrated. The modified procedure is also compared with the alternative of using the unmodified range test without a preliminary F test, and it is shown that each has advantages over the other under some circumstances.
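
A small sketch of the gated extreme-means comparison described above is given below. The reduced critical value is taken here as the studentized-range quantile for k-1 rather than k means, which is one plausible reading of the "reduction" the abstract describes; it is not necessarily Shaffer's exact rule, and the function name and data are illustrative.

```python
import numpy as np
from scipy.stats import f_oneway, studentized_range

def modified_extreme_range_test(groups, alpha=0.05):
    """Range comparison of the largest vs smallest mean, gated by an F test.

    Only proceeds when the preliminary F test is significant, then compares
    the extreme means against a REDUCED critical value (quantile for k-1
    means -- an assumed reading of the modification, not the exact rule).
    """
    k = len(groups)
    n = len(groups[0])                            # assumes equal group sizes
    if f_oneway(*groups).pvalue > alpha:
        return "F test not significant: stop, declare no differences"
    df_error = k * (n - 1)
    mse = np.mean([np.var(g, ddof=1) for g in groups])   # pooled error variance
    means = np.array([np.mean(g) for g in groups])
    q_obs = (means.max() - means.min()) / np.sqrt(mse / n)
    q_crit = studentized_range.ppf(1 - alpha, k - 1, df_error)   # reduced value
    return f"q = {q_obs:.2f}, reduced critical value = {q_crit:.2f}"

rng = np.random.default_rng(2)
groups = [rng.normal(mu, 1.0, 10) for mu in (0.0, 0.3, 1.0)]
print(modified_extreme_range_test(groups))
```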


Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 651-651
Author(s):  
Yang Liu ◽  
Wei Sun ◽  
Alexander P Reiner ◽  
Charles Kooperberg ◽  
Qianchuan He

Summary Genetic pathway analysis has become an important tool for investigating the association between a group of genetic variants and traits. With dense genotyping and extensive imputation, the number of genetic variants in biological pathways has increased considerably and sometimes exceeds the sample size n. Conducting genetic pathway analysis and statistical inference in such settings is challenging. We introduce an approach that can handle pathways whose dimension p could be greater than n. Our method can be used to detect pathways that have nonsparse weak signals, as well as pathways that have sparse but stronger signals. We establish the asymptotic distribution of the proposed statistic and conduct a theoretical analysis of its power. Simulation studies show that our test has correct Type I error control and is more powerful than existing approaches. An application to a genome-wide association study of high-density lipoproteins demonstrates the proposed approach.


Trials ◽  
2015 ◽  
Vol 16 (S2) ◽  
Author(s):  
Deepak Parashar ◽  
Jack Bowden ◽  
Colin Starr ◽  
Lorenz Wernisch ◽  
Adrian Mander

Author(s):  
Aaron T. L. Lun ◽  
Gordon K. Smyth

Abstract RNA sequencing (RNA-seq) is widely used to study gene expression changes associated with treatments or biological conditions. Many popular methods for detecting differential expression (DE) from RNA-seq data use generalized linear models (GLMs) fitted to the read counts across independent replicate samples for each gene. This article shows that the standard formula for the residual degrees of freedom (d.f.) in a linear model is overstated when the model contains fitted values that are exactly zero. Such fitted values occur whenever all the counts in a treatment group are zero, as well as in more complex models such as those involving paired comparisons. This misspecification results in underestimation of the genewise variances and loss of type I error control. This article proposes a formula for the reduced residual d.f. that restores error control in simulated RNA-seq data and improves detection of DE genes in a real data analysis. The new approach is implemented in the quasi-likelihood framework of the edgeR software package. The results of this article also apply to RNA-seq analyses that apply linear models to log-transformed counts, such as those in the limma software package, and more generally to any count-based GLM where exactly zero fitted values are possible.
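
A small illustration of why exactly-zero fitted values inflate the nominal residual d.f., using the simplest one-way layout: in a group whose counts are all zero, the fitted group mean is exactly zero and every residual is exactly zero, so those observations carry no information about dispersion. The adjusted d.f. shown here (drop the spurious d.f. contributed by all-zero groups) follows the abstract's reasoning and is only an illustration, not necessarily the exact formula implemented in edgeR's quasi-likelihood pipeline.

```python
import numpy as np

def residual_df_oneway(counts_by_group):
    """Nominal vs adjusted residual d.f. for a one-way layout on counts.

    The nominal formula n - p counts n_g - 1 "residual" d.f. from every
    group; for an all-zero group these residuals are identically zero, so
    the adjustment below simply drops them (illustrative only).
    """
    n = sum(len(g) for g in counts_by_group)
    p = len(counts_by_group)                      # one mean per group
    nominal_df = n - p
    spurious = sum(len(g) - 1 for g in counts_by_group if max(g) == 0)
    return nominal_df, nominal_df - spurious

# Two treatment groups, one of them all zeros for this gene.
gene_counts = [[0, 0, 0], [5, 9, 3]]
print(residual_df_oneway(gene_counts))   # (4, 2): nominal 4 d.f., adjusted 2
```

Using the smaller adjusted d.f. when estimating the genewise variance avoids the variance underestimation, and hence the loss of type I error control, that the abstract describes.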

