A robust example of collider bias in a genetic association study

Mapping Intimacies ◽

10.1101/028035 ◽

2015 ◽

Author(s):

Felix Day ◽

Robert Scott ◽

Ken Ong ◽

John Perry

Keyword(s):

Genetic Association ◽

Association Studies ◽

Genetic Association Studies ◽

Positive Association ◽

Uk Biobank ◽

False Positive Association ◽

Genome Wide ◽

The Uk ◽

Study Inclusion ◽

Collider Bias

Recent studies have described the potential for "collider bias" to modify the magnitude of genotype-phenotype associations; however, the extent to which this effect can induce a completely false-positive association remains unclear. In a sample of 142,630 individuals from the UK Biobank study, inclusion of height (a "collider") as a covariate induces biologically spurious, but genome-wide significant, associations between autosomal genetic variants and sex. These associations are non-significant in models unadjusted for height. Our study underpins the importance of causal inference modeling in the design and interpretation of genetic (and non-genetic) association studies.
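The mechanism described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: the sample size, allele frequency, effect sizes, and the use of a linear probability model for sex are all assumptions chosen for illustration. A variant is simulated to affect height but to be independent of sex; height also depends on sex, which makes height a collider. The variant's coefficient for sex is then estimated with and without adjustment for height.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists (cov(u, u) is the variance)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def slope_unadjusted(x, y):
    """OLS slope of y regressed on x alone."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(x, z, y):
    """OLS coefficient of x when y is regressed on x and z jointly."""
    num = cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)
    den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return num / den


def simulate(n=40000, seed=0):
    """Simulate sex, genotype and height; all parameter values are hypothetical."""
    rng = random.Random(seed)
    sex, geno, height = [], [], []
    for _ in range(n):
        s = 1 if rng.random() < 0.5 else 0               # sex, independent of genotype
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0/1/2
        h = 165.0 + 13.0 * s + 1.0 * g + rng.gauss(0.0, 6.0)  # height depends on both
        sex.append(s)
        geno.append(g)
        height.append(h)
    return sex, geno, height


sex, geno, height = simulate()
b_unadj = slope_unadjusted(geno, sex)      # close to zero: no true association
b_adj = slope_adjusted(geno, height, sex)  # pushed away from zero by conditioning on height
print("genotype coefficient for sex, unadjusted:", round(b_unadj, 4))
print("genotype coefficient for sex, height-adjusted:", round(b_adj, 4))
```

Under these assumed parameters the adjusted coefficient is negative: among people of the same height, carrying more height-raising alleles makes it slightly less likely that the height is explained by being male, even though genotype and sex are independent in the population.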


Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics

Bioinformatics ◽

10.1093/bioinformatics/bty999 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2495-2497 ◽

Cited By ~ 27

Author(s):

Gregory McInnes ◽

Yosuke Tanigawa ◽

Chris DeBoever ◽

Adam Lavertu ◽

Julia Eve Olivieri ◽

...

Keyword(s):

Association Studies ◽

Genetic Association Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Uk Biobank ◽

Patient Privacy ◽

Web Based ◽

Genome Wide ◽

Wide Range ◽

The Uk

Summary: Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here, we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. Availability and implementation: GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.


Assortative mating and within-spouse pair comparisons

PLoS Genetics ◽

10.1371/journal.pgen.1009883 ◽

2021 ◽

Vol 17 (11) ◽

pp. e1009883

Author(s):

Laurence J. Howe ◽

Thomas Battram ◽

Tim T. Morris ◽

Fernando P. Hartwig ◽

Gibran Hemani ◽

...

Keyword(s):

Educational Attainment ◽

Genetic Association ◽

Assortative Mating ◽

Association Studies ◽

Genetic Association Studies ◽

Uk Biobank ◽

Pair Comparisons ◽

Study Designs ◽

Random Pair ◽

Collider Bias

Spousal comparisons have been proposed as a design that can both reduce confounding and estimate effects of the shared adulthood environment. However, assortative mating, the process by which individuals select phenotypically (dis)similar mates, could distort associations when comparing spouses. We evaluated the use of spousal comparisons, as in the within-spouse pair (WSP) model, for aetiological research such as genetic association studies. We demonstrated that the WSP model can reduce confounding but may be susceptible to collider bias arising from conditioning on assorted spouse pairs. Analyses using UK Biobank spouse pairs found that WSP genetic association estimates were smaller than estimates from random pairs for height, educational attainment, and BMI variants. Within-sibling pair estimates, robust to demographic and parental effects, were also smaller than random pair estimates for height and educational attainment, but not for BMI. WSP models, like other within-family models, may reduce confounding from demographic factors in genetic association estimates, and so could be useful for triangulating evidence across study designs to assess the robustness of findings. However, WSP estimates should be interpreted with caution due to potential collider bias.


Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics

10.1101/304188 ◽

2018 ◽

Cited By ~ 4

Author(s):

Gregory McInnes ◽

Yosuke Tanigawa ◽

Chris DeBoever ◽

Adam Lavertu ◽

Julia Eve Olivieri ◽

...

Keyword(s):

Association Studies ◽

Genetic Association Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Uk Biobank ◽

Patient Privacy ◽

Web Based ◽

Genome Wide ◽

Wide Range ◽

The Uk

Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here we present Global Biobank Engine (GBE), a web-based tool that enables the exploration of the relationship between genotype and phenotype in large biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests, and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.


Evaluation of genome-wide power of genetic association studies based on empirical data from the HapMap project

Human Molecular Genetics ◽

10.1093/hmg/ddm205 ◽

2007 ◽

Vol 16 (20) ◽

pp. 2494-2505 ◽

Cited By ~ 23

Author(s):

Yasuhito Nannya ◽

Kenjiro Taura ◽

Mineo Kurokawa ◽

Shigeru Chiba ◽

Seishi Ogawa

Keyword(s):

Genetic Association ◽

Empirical Data ◽

Association Studies ◽

Genetic Association Studies ◽

Hapmap Project ◽

Genome Wide


Reproducibility in the UK Biobank of Genome-Wide Significant Signals Discovered in Earlier Genome-wide Association Studies

10.1101/2020.06.24.20139576 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jack W. O’Sullivan ◽

John P. A. Ioannidis

Keyword(s):

Effect Size ◽

Association Studies ◽

Genome Wide Association ◽

P Value ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Single Nucleotide ◽

Genome Wide ◽

The Uk ◽

Open Question

With the establishment of large biobanks, discovery of single nucleotide polymorphisms (SNPs) that are associated with various phenotypes has been accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier, "discovery" GWAS and a later, replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0% and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8% respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, phenotype trait (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with area under the receiver operating characteristic curve = 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.


Efficient estimation of disease odds ratios for follow-up genetic association studies

Statistical Methods in Medical Research ◽

10.1177/0962280217741771 ◽

2017 ◽

Vol 28 (7) ◽

pp. 1927-1941

Author(s):

Jiyuan Hu ◽

Wei Zhang ◽

Xinmin Li ◽

Dongdong Pan ◽

Qizhai Li

Keyword(s):

Genetic Association ◽

Association Studies ◽

Genetic Association Studies ◽

Efficient Estimation ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Odds Ratios ◽

Genome Wide ◽

Follow Up Studies

In the past decade, genome-wide association studies have identified thousands of susceptible variants associated with complex human diseases and traits. Conducting follow-up genetic association studies has become a standard approach to validate the findings of genome-wide association studies. One problem of high interest in genetic association studies is to accurately estimate the strength of the association, which is often quantified by odds ratios in case-control studies. However, estimating the association directly by follow-up studies is inefficient since this approach ignores information from the genome-wide association studies. In this article, an estimator called GFcom, which integrates information from genome-wide association studies and follow-up studies, is proposed. The estimator includes both the point estimate and corresponding confidence interval. GFcom is more efficient than competing estimators regarding MSE and the length of confidence intervals. The superiority of GFcom is particularly evident when the genome-wide association study suffers from severe selection bias. Comprehensive simulation studies and applications to three real follow-up studies demonstrate the performance of the proposed estimator. An R package, “GFcom”, implementing our method is publicly available at https://github.com/JiyuanHu/GFcom .
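For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of the conventional odds ratio and Wald 95% confidence interval computed from a 2x2 case-control table. It is the textbook estimator applied to a single study, not the GFcom estimator described in the abstract, and the function name and the allele counts in the usage line are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases carrying the risk allele      b: cases not carrying it
    c: controls carrying the risk allele   d: controls not carrying it
    z: normal quantile for the interval (1.96 gives roughly 95% coverage)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_hat, ci_low, ci_high = odds_ratio_ci(30, 70, 20, 80)
print(or_hat, ci_low, ci_high)
```

The interval is built on the log scale because the sampling distribution of log(OR) is closer to normal than that of the odds ratio itself; all four cell counts must be non-zero for the formula to apply.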


The evolution of skin pigmentation associated variation in West Eurasia

10.1101/2020.05.08.085274 ◽

2020 ◽

Author(s):

Dan Ju ◽

Iain Mathieson

Keyword(s):

Genetic Variants ◽

Association Studies ◽

Skin Pigmentation ◽

Directional Selection ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Genome Wide ◽

Light Skin ◽

The Uk

Skin pigmentation is a classic example of a polygenic trait that has experienced directional selection in humans. Genome-wide association studies have identified well over a hundred pigmentation-associated loci, and genomic scans in present-day and ancient populations have identified selective sweeps for a small number of light pigmentation-associated alleles in Europeans. It is unclear whether selection has operated on all the genetic variation associated with skin pigmentation as opposed to just a small number of large-effect variants. Here, we address this question using ancient DNA from 1158 individuals from West Eurasia covering a period of 40,000 years combined with genome-wide association summary statistics from the UK Biobank. We find a robust signal of directional selection in ancient West Eurasians on skin pigmentation variants ascertained in the UK Biobank, but find this signal is driven mostly by a limited number of large-effect variants. Consistent with this observation, we find that a polygenic selection test in present-day populations fails to detect selection with the full set of variants; rather, only the top five show strong evidence of selection. Our data allow us to disentangle the effects of admixture and selection. Most notably, a large-effect variant at SLC24A5 was introduced to Europe by migrations of Neolithic farming populations but continued to be under selection post-admixture. This study shows that the response to selection for light skin pigmentation in West Eurasia was driven by a relatively small proportion of the variants that are associated with present-day phenotypic variation.

Significance: Some of the genes responsible for the evolution of light skin pigmentation in Europeans show signals of positive selection in present-day populations. Recently, genome-wide association studies have highlighted the highly polygenic nature of skin pigmentation. It is unclear whether selection has operated on all of these genetic variants or just a subset. By studying variation in over a thousand ancient genomes from West Eurasia covering 40,000 years we are able to study both the aggregate behavior of pigmentation-associated variants and the evolutionary history of individual variants. We find that the evolution of light skin pigmentation in Europeans was driven by frequency changes in a relatively small fraction of the genetic variants that are associated with variation in the trait today.


Genome-wide association study of circulating liver enzymes reveals an expanded role for manganese transporter SLC30A10 in liver health

10.1101/2020.05.19.104570 ◽

2020 ◽

Author(s):

Lucas D. Ward ◽

Ho-Chou Tu ◽

Chelsea Quenneville ◽

Alexander O. Flynn-Carroll ◽

Margaret M. Parker ◽

...

Keyword(s):

Extrahepatic Bile Duct ◽

Association Studies ◽

Genome Wide Association ◽

Detectable Effect ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Extrahepatic Bile Duct Cancer ◽

Genome Wide ◽

Liver Health ◽

The Uk

To better understand molecular pathways underlying liver health and disease, we performed genome-wide association studies (GWAS) on circulating levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST) across 408,300 subjects from four ethnic groups in the UK Biobank, focusing on variants associating with both enzymes. Of these variants, the strongest effect is a rare (MAF in White British = 0.12%) missense variant in the gene encoding manganese efflux transporter SLC30A10, Thr95Ile (rs188273166), associating with a 5.9% increase in ALT and a 4.2% increase in AST. Carriers have higher prevalence of all-cause liver disease (OR = 1.70; 95% CI = 1.24 to 2.34) and higher prevalence of extrahepatic bile duct cancer (OR = 23.8; 95% CI = 9.1 to 62.1) compared to non-carriers. Over 4% of the cases of extrahepatic cholangiocarcinoma in the UK Biobank carry SLC30A10 Thr95Ile. Unlike variants in SLC30A10 known to cause the recessive syndrome hypermanganesemia with dystonia-1 (HMNDYT1), the Thr95Ile variant has a detectable effect even in the heterozygous state. Also unlike HMNDYT1-causing variants, Thr95Ile results in a protein that is properly trafficked to the plasma membrane when expressed in HeLa cells. These results suggest that coding variation in SLC30A10 impacts liver health in more individuals than the small population of HMNDYT1 patients.


Genome-Wide Control of Population Structure and Relatedness in Genetic Association Studies via Linear Mixed Models with Orthogonally Partitioned Structure

10.1101/409953 ◽

2018 ◽

Author(s):

Matthew P. Conomos ◽

Alex P. Reiner ◽

Mary Sara McPeek ◽

Timothy A. Thornton

Keyword(s):

Population Structure ◽

Genetic Association ◽

Mixed Models ◽

Association Studies ◽

Linear Mixed Models ◽

Genetic Association Studies ◽

European Ancestry ◽

Type I ◽

Genome Wide ◽

Wbc Count

Linear mixed models (LMMs) have become the standard approach for genetic association testing in the presence of sample structure. However, the performance of LMMs has primarily been evaluated in relatively homogeneous populations of European ancestry, despite many of the recent genetic association studies including samples from worldwide populations with diverse ancestries. In this paper, we demonstrate that existing LMM methods can have systematic miscalibration of association test statistics genome-wide in samples with heterogeneous ancestry, resulting in both increased type-I error rates and a loss of power. Furthermore, we show that this miscalibration arises due to varying allele frequency differences across the genome among populations. To overcome this problem, we developed LMM-OPS, an LMM approach which orthogonally partitions diverse genetic structure into two components: distant population structure and recent genetic relatedness. In simulation studies with real and simulated genotype data, we demonstrate that LMM-OPS is appropriately calibrated in the presence of ancestry heterogeneity and outperforms existing LMM approaches, including EMMAX, GCTA, and GEMMA. We conduct a GWAS of white blood cell (WBC) count in an admixed sample of 3,551 Hispanic/Latino American women from the Women’s Health Initiative SNP Health Association Resource where LMM-OPS detects genome-wide significant associations with corresponding p-values that are one or more orders of magnitude smaller than those from competing LMM methods. We also identify a genome-wide significant association with regulatory variant rs2814778 in the DARC gene on chromosome 1, which generalizes to Hispanic/Latino Americans a previous association with reduced WBC count identified in African Americans.


Body size and composition and site-specific cancers in UK Biobank: a Mendelian randomisation study

10.1101/2020.02.28.970459 ◽

2020 ◽

Cited By ~ 1

Author(s):

Mathew Vithayathil ◽

Paul Carter ◽

Siddhartha Kar ◽

Amy M. Mason ◽

Stephen Burgess ◽

...

Keyword(s):

Instrumental Variables ◽

Association Studies ◽

Genome Wide Association ◽

Mendelian Randomisation ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Site Specific ◽

Genome Wide ◽

Increased Risk ◽

The Uk

Objectives: To investigate the causal role of body mass index, body fat composition and height in cancer. Design: Two-stage Mendelian randomisation study. Setting: Previous genome-wide association studies and the UK Biobank. Participants: Genetic instrumental variables for body mass index (BMI), fat mass index (FMI), fat free mass index (FFMI) and height from previous genome-wide association studies and UK Biobank. Cancer outcomes from 367 586 participants of European descent from the UK Biobank. Main outcome measures: Overall cancer risk and risk of 22 site-specific cancers for genetic instrumental variables for BMI, FMI, FFMI and height. Results: Genetically predicted BMI (per 1 kg/m²) was not associated with overall cancer risk (OR 0.99; 95% confidence interval (CI) 0.98-1.00, p=0.105). Elevated BMI was associated with increased risk of stomach cancer (OR 1.15, 95% CI 1.05-1.26; p=0.003) and melanoma (OR 0.96, 95% CI 0.92-1.00; p=0.044). For sex-specific cancers, BMI was positively associated with uterine cancer (OR 1.08, 95% CI 1.01-1.14; p=0.015) but inversely associated with breast (OR 0.95, 95% CI 0.92-0.98; p=0.001), prostate (OR 0.95, 95% CI 0.92-0.99; p=0.007) and testicular cancer (OR 0.89, 95% CI 0.81-0.98; p=0.017). Elevated FMI (per 1 kg/m²) was associated with gastrointestinal cancer (stomach cancer OR 4.23, 95% CI 1.18-15.13, p=0.027; colorectal cancer OR 1.94, 95% CI 1.23-3.07; p=0.004). Increased height (per 1 standard deviation, approximately 6.5 cm) was associated with increased risk of overall cancer (OR 1.06; 95% CI 1.04-1.09; p = 2.97×10⁻⁸) and most site-specific cancers, with the strongest estimates for kidney, non-Hodgkin lymphoma, colorectal, lung, melanoma and breast cancer. Conclusions: There is little evidence for BMI as a causal risk factor for cancer. BMI may have a causal role for sex-specific cancers, although with inconsistent directions of effect, and FMI for gastrointestinal malignancies. Elevated height is a risk factor for overall cancer and multiple site cancers.
