Fast and accurate long-range phasing in a UK Biobank cohort

Mapping Intimacies ◽

10.1101/028282 ◽

2015 ◽

Cited By ~ 5

Author(s):

Po-Ru Loh ◽

Pier Francesco Palamara ◽

Alkes L Price

Keyword(s):

Long Range ◽

Error Rate ◽

Rare Variants ◽

Imputation Accuracy ◽

Computational Cost ◽

Uk Biobank ◽

Identical By Descent ◽

Icelandic Population ◽

Related Individuals ◽

The Uk

Recent work has leveraged the extensive genotyping of the Icelandic population to perform long-range phasing (LRP), enabling accurate imputation and association analysis of rare variants in target samples typed on genotyping arrays. Here, we develop a fast and accurate LRP method, Eagle, that extends this paradigm to populations with much smaller proportions of genotyped samples by harnessing long (>4cM) identical-by-descent (IBD) tracts shared among distantly related individuals. We applied Eagle to N=150K samples (0.2% of the British population) from the UK Biobank, and we determined that it is 1-2 orders of magnitude faster than existing methods while achieving similar or better phasing accuracy (switch error rate ≈0.3%, corresponding to perfect phase in most 10Mb segments). We also observed that when used within an imputation pipeline, Eagle pre-phasing improved downstream imputation accuracy compared to pre-phasing in batches using existing methods (as necessary to achieve comparable computational cost).
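
The switch error rate quoted in this abstract can be illustrated with a minimal sketch. It assumes the standard definition (the fraction of consecutive heterozygous-site pairs whose relative phase disagrees with the truth); the function name and inputs are illustrative and are not taken from Eagle.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous-site pairs with flipped relative phase.

    Both arguments list the 0/1 alleles carried by ONE haplotype,
    restricted to the individual's heterozygous sites.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        return 0.0
    # True where the inferred haplotype agrees with the true one.
    agrees = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch occurs wherever agreement changes between neighbouring sites.
    switches = sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)
    return switches / (len(agrees) - 1)


rate = switch_error_rate([0, 1, 0, 1, 1, 0], [0, 1, 1, 0, 0, 0])
```

In this toy input the inferred haplotype switches away from the truth after the second site and back before the last one, so two of the five adjacent pairs are counted as switches.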

Assessing the contribution of rare-to-common protein-coding variants to circulating metabolic biomarker levels via 412,394 UK Biobank exome sequences

10.1101/2021.12.24.21268381 ◽

2021 ◽

Author(s):

Abhishek Nag ◽

Lawrence Middleton ◽

Ryan S Dhindsa ◽

Dimitrios Vitsios ◽

Eleanor M Wigmore ◽

...

Keyword(s):

Gene Networks ◽

Rare Variants ◽

Association Studies ◽

Low Frequency ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Protein Coding ◽

The Uk ◽

Metabolic Biomarkers ◽

Coding Variants

Genome-wide association studies have established the contribution of common and low-frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N=412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p<1x10^-8), encompassing 207 distinct genes. These include associations between rare non-synonymous variants in GIGYF1 and glucose and lipid biomarkers, SYT7 and creatinine, and others, which may provide insights into novel disease biology. In comparison with a previous microarray-based genotyping study in the same cohort, we observed that 40% of gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that utilises the gene-biomarker association statistics from the collapsing analysis to identify genes having similar biomarker fingerprints and thus expand our understanding of gene networks.

Analysis of exome-sequenced UK Biobank subjects implicates genes affecting risk of hyperlipidaemia

10.1101/2020.07.09.20150334 ◽

2020 ◽

Author(s):

David Curtis

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Lipid Lowering ◽

P Value ◽

Data Sets ◽

Uk Biobank ◽

Functional Studies ◽

Exome Sequence Data ◽

Rare Genetic Variants ◽

The Uk

Rare genetic variants in LDLR, APOB and PCSK9 are known causes of familial hypercholesterolaemia and it is expected that rare variants in other genes will also have effects on hyperlipidaemia risk although such genes remain to be identified. The UK Biobank consists of a sample of 500,000 volunteers and exome sequence data is available for 50,000 of them. 11,490 of these were classified as hyperlipidaemia cases on the basis of having a relevant diagnosis recorded and/or taking lipid-lowering medication while the remaining 38,463 were treated as controls. Variants in each gene were assigned weights according to rarity and predicted impact and overall weighted burden scores were compared between cases and controls, including population principal components as covariates. One biologically plausible gene, HUWE1, produced statistically significant evidence for association after correction for testing 22,028 genes with a signed log10 p value (SLP) of -6.15, suggesting a protective effect of variants in this gene. Other genes with uncorrected p<0.001 are arguably also of interest, including LDLR (SLP=3.67), RBP2 (SLP=3.14), NPFFR1 (SLP=3.02) and ACOT9 (SLP=-3.19). Gene set analysis indicated that rare variants in genes involved in metabolism and energy can influence hyperlipidaemia risk. Overall, the results provide some leads which might be followed up with functional studies and which could be tested in additional data sets as these become available. This research has been conducted using the UK Biobank Resource.
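
As a worked check of the multiple-testing claim in this abstract, the sketch below computes the magnitude an SLP must exceed to survive correction for 22,028 genes. It assumes a Bonferroni correction at alpha = 0.05, which the abstract does not state explicitly; the function name is illustrative.

```python
import math


def bonferroni_slp_threshold(n_tests, alpha=0.05):
    """Smallest |SLP| that stays significant after Bonferroni correction.

    Assumption: SLP magnitude is -log10(p), and the corrected p-value
    threshold is alpha / n_tests.
    """
    return -math.log10(alpha / n_tests)


threshold = bonferroni_slp_threshold(22028)   # roughly 5.64
huwe1_significant = abs(-6.15) > threshold    # HUWE1, SLP = -6.15
ldlr_significant = abs(3.67) > threshold      # LDLR, SLP = 3.67
```

Under this assumption the HUWE1 result clears the threshold while LDLR does not, which matches the abstract's description of LDLR as reaching only uncorrected p<0.001.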

Expanding cancer predisposition genes with ultra-rare cancer-exclusive human variations

10.1101/2020.01.09.19015867 ◽

2020 ◽

Author(s):

Roni Rasnic ◽

Nathan Linial ◽

Michal Linial

Keyword(s):

Genetic Predisposition ◽

Rare Variants ◽

Genetic Alterations ◽

Cancer Predisposition ◽

Rare Cancer ◽

Uk Biobank ◽

Independent Evidence ◽

High Penetrance ◽

Predisposition Genes ◽

The Uk

It is estimated that up to 10% of cancer incidents are attributed to inherited genetic alterations. Despite extensive research, there are still gaps in our understanding of genetic predisposition to cancer. It was theorized that ultra-rare variants partially account for the missing heritable component. We harness the UK Biobank dataset of ∼500,000 individuals, 14% of whom were diagnosed with cancer, to detect ultra-rare, possibly high-penetrance cancer predisposition variants. We report on 115 cancer-exclusive ultra-rare variations (CUVs) and nominate 26 variants with additional independent evidence as cancer predisposition variants. We conclude that population cohorts are a valuable source for expanding the collection of novel cancer predisposition genes.

Imputation of Behavioral Candidate Gene Repeat Polymorphisms in 486,551 Publicly-Available UK Biobank Individuals

10.1101/358267 ◽

2018 ◽

Author(s):

Richard Border ◽

Andrew Smolen ◽

Robin P. Corley ◽

Michael C. Stallings ◽

Sandra A. Brown ◽

...

Keyword(s):

Imputation Accuracy ◽

Variable Number Tandem Repeat ◽

Variable Number ◽

Uk Biobank ◽

Out Of Sample ◽

The Subject ◽

Polymorphism Data ◽

The Uk ◽

Broad Interest ◽

Vntr Polymorphisms

Some of the most widely studied polymorphisms in psychiatric genetics include variable number tandem repeat polymorphisms (VNTRs) in SLC6A3, DRD4, SLC6A4, and MAOA. While initial findings suggested large effects, their importance with respect to psychiatric phenotypes is the subject of much debate, with broadly conflicting results. Despite broad interest, these loci remain absent from the largest available samples, such as the UK Biobank, limiting researchers’ ability to test these contentious hypotheses rigorously in large samples. Here, using two independent reference datasets, we report out-of-sample imputation accuracy estimates of >0.96 for all four VNTR polymorphisms and one modifying SNP, depending on the reference and target dataset. We describe the imputation procedures of these candidate polymorphisms in 486,551 UK Biobank individuals, and have made the imputed polymorphism data available to UK Biobank researchers. This resource, provided to the community, will allow the most rigorous tests to date of the roles of these polymorphisms in behavioral and psychiatric phenotypes.

IBDkin: fast estimation of kinship coefficients from identity by descent segments

Bioinformatics ◽

10.1093/bioinformatics/btaa569 ◽

2020 ◽

Vol 36 (16) ◽

pp. 4519-4520

Author(s):

Ying Zhou ◽

Sharon R Browning ◽

Brian L Browning

Keyword(s):

Software Package ◽

Large Datasets ◽

Supplementary Information ◽

Supplementary Data ◽

Uk Biobank ◽

Identity By Descent ◽

Fast Estimation ◽

Kinship Coefficients ◽

Related Individuals ◽

The Uk

Motivation: Estimation of pairwise kinship coefficients in large datasets is computationally challenging because the number of related individuals increases quadratically with sample size. Results: We present IBDkin, a software package written in C for estimating kinship coefficients from identity by descent (IBD) segments. We use IBDkin to estimate kinship coefficients for 7.95 billion pairs of individuals in the UK Biobank who share at least one detected IBD segment with length ≥ 4 cM. Availability and implementation: https://github.com/YingZhou001/IBDkin. Supplementary information: Supplementary data are available at Bioinformatics online.
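
To make the link between IBD segments and kinship concrete, here is a minimal sketch using the textbook relation kinship = k1/4 + k2/2, where k1 and k2 are the genome fractions shared IBD on one and on both haplotypes. This is not IBDkin's implementation, which the abstract does not describe; the function name and the 3,400 cM genome length are assumptions for illustration.

```python
def kinship_from_ibd(ibd1_cm, ibd2_cm, genome_cm):
    """Kinship coefficient from total IBD1 and IBD2 segment lengths (in cM).

    Textbook relation only; not a description of IBDkin's internals.
    """
    if genome_cm <= 0:
        raise ValueError("genome length must be positive")
    k1 = ibd1_cm / genome_cm  # fraction shared IBD on exactly one haplotype
    k2 = ibd2_cm / genome_cm  # fraction shared IBD on both haplotypes
    return k1 / 4 + k2 / 2


GENOME_CM = 3400.0  # hypothetical autosomal map length
parent_child = kinship_from_ibd(GENOME_CM, 0.0, GENOME_CM)
full_sibs = kinship_from_ibd(0.5 * GENOME_CM, 0.25 * GENOME_CM, GENOME_CM)
```

Both idealised relationships give a kinship of 0.25: a parent and child share the whole genome IBD1, while full siblings are expected to share half IBD1 and a quarter IBD2.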

Novel Approach for Parallelizing Pairwise Comparison Problems as Applied to Detecting Segments Identical By Decent in Whole-Genome Data

Bioinformatics ◽

10.1093/bioinformatics/btab084 ◽

2021 ◽

Author(s):

Emmanuel Sapin ◽

Matthew C Keller

Keyword(s):

Pairwise Comparison ◽

Pairwise Comparisons ◽

Uk Biobank ◽

Large Problem ◽

Genome Data ◽

Full Dataset ◽

Identical By Descent ◽

Novel Approach ◽

The Uk ◽

Massive Parallelization

Motivation: Pairwise comparison problems arise in many areas of science. In genomics, datasets are already large and getting larger, and so operations that require pairwise comparisons (either on pairs of SNPs or pairs of individuals) are extremely computationally challenging. We propose a generic algorithm for addressing pairwise comparison problems that breaks a large problem (of order n^2 comparisons) into multiple smaller ones (each of order n comparisons), allowing for massive parallelization. Results: We demonstrated that this approach is very efficient for calling identical by descent (IBD) segments between all pairs of individuals in the UK Biobank dataset, with a 250-fold savings in time and 750-fold savings in memory over the standard approach to detecting such segments across the full dataset. This efficiency should extend to other methods of IBD calling and, more generally, to other pairwise comparison tasks in genomics or other areas of science.
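
The idea of splitting an order-n^2 comparison problem into jobs of order n can be sketched with a generic block tiling. This is an assumed illustration, not the authors' algorithm, whose details the abstract does not give: with blocks of about sqrt(n) items, each block pair holds about n comparisons and can run as an independent job. All names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_jobs(n, block_size=None):
    """Yield (block_a, block_b) index ranges that together cover every pair."""
    if block_size is None:
        block_size = max(1, math.isqrt(n))  # ~sqrt(n) items per block
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    for a in range(len(blocks)):
        for b in range(a, len(blocks)):
            yield blocks[a], blocks[b]


def pairs_in_job(block_a, block_b):
    """List the (i, j) comparisons, i < j, that belong to one job."""
    if block_a == block_b:
        return list(combinations(block_a, 2))  # pairs within a single block
    return [(i, j) for i in block_a for j in block_b]  # pairs across blocks


all_pairs = [p for a, b in pairwise_jobs(10) for p in pairs_in_job(a, b)]
largest_job = max(len(pairs_in_job(a, b)) for a, b in pairwise_jobs(10))
```

Each job depends only on its two blocks, so jobs can be dispatched to separate workers without shared state.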

Surveying the contribution of rare variants to the genetic architecture of human disease through exome sequencing of 177,882 UK Biobank participants

10.1101/2020.12.13.422582 ◽

2020 ◽

Author(s):

Quanli Wang ◽

Ryan S. Dhindsa ◽

Keren Carss ◽

Andrew R Harper ◽

Abhishek Nag ◽

...

Keyword(s):

Exome Sequencing ◽

Drug Targets ◽

Rare Variants ◽

Population Based ◽

Uk Biobank ◽

Loss Of Function ◽

Sequencing Data ◽

Phenotypic Data ◽

Protein Coding ◽

The Uk

The UK Biobank (UKB) represents an unprecedented population-based study of 502,543 participants with detailed phenotypic data and linkage to medical records. While the release of genotyping array data for this cohort has bolstered genomic discovery for common variants, the contribution of rare variants to this broad phenotype collection remains relatively unknown. Here, we use exome sequencing data from 177,882 UKB participants to evaluate the association between rare protein-coding variants and 10,533 binary and 1,419 quantitative phenotypes. We performed both a variant-level phenome-wide association study (PheWAS) and a gene-level collapsing analysis-based PheWAS tailored to detecting the aggregate contribution of rare variants. The latter revealed 911 statistically significant gene-phenotype relationships, with a median odds ratio of 15.7 for binary traits. Among the binary trait associations identified using collapsing analysis, 83% were undetectable using single variant association tests, emphasizing the power of collapsing analysis to detect signal in the setting of high allelic heterogeneity. As a whole, these genotype-phenotype associations were significantly enriched for loss-of-function mediated traits and currently approved drug targets. Using these results, we summarise the contribution of rare variants to common diseases in the context of the UKB phenome and provide an example of how novel gene-phenotype associations can aid in therapeutic target prioritisation.

Variants in ACE2 and TMPRSS2 genes are not major determinants of COVID-19 severity in UK Biobank subjects

10.1101/2020.05.01.20085860 ◽

2020 ◽

Cited By ~ 4

Author(s):

David Curtis

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Severe Disease ◽

Major Effect ◽

Uk Biobank ◽

Exome Sequence Data ◽

Dna Sequence Variants ◽

The Uk ◽

And Function ◽

Exome Sequence

It is plausible that variants in the ACE2 and TMPRSS2 genes might contribute to variation in COVID-19 severity and that these could explain why some people become very unwell whereas most do not. Exome sequence data was obtained for 49,953 UK Biobank subjects of whom 74 had tested positive for SARS-CoV-2 and could be presumed to have severe disease. A weighted burden analysis was carried out using SCOREASSOC to determine whether there were differences between these cases and the other sequenced subjects in the overall burden of rare, damaging variants in ACE2 or TMPRSS2. There were no statistically significant differences in weighted burden scores between cases and controls for either gene. There were no individual DNA sequence variants with a markedly different frequency between cases and controls. Whether there are small effects on severity, or whether there might be rare variants with major effect sizes, would require studies in much larger samples. Genetic variants affecting the structure and function of the ACE2 and TMPRSS2 proteins are not a major determinant of whether infection with SARS-CoV-2 results in severe symptoms. This research has been conducted using the UK Biobank Resource.

Weighted burden analysis in 200 000 exome-sequenced UK Biobank subjects characterises effects of rare genetic variants on BMI

10.1101/2021.01.20.21250151 ◽

2021 ◽

Author(s):

David Curtis

Keyword(s):

Multiple Testing ◽

Rare Variants ◽

Sequence Data ◽

Statistical Significance ◽

Positive Sign ◽

P Value ◽

Uk Biobank ◽

Functional Variants ◽

Rare Genetic Variants ◽

The Uk

Introduction: A number of genes have been identified in which rare variants can cause obesity. Here we analyse a sample of exome sequenced subjects from UK Biobank using BMI as a phenotype. Methods: There were 199,807 exome sequenced subjects for whom BMI was recorded. Weighted burden analysis of rare, functional variants was carried out, incorporating population principal components and sex as covariates. For selected genes, additional analyses were carried out to clarify the contribution of different categories of variant. Statistical significance was summarised as the signed log10 of the p value (SLP), given a positive sign if the weighted burden score was positively correlated with BMI. Results: Two genes were exome-wide significant, MC4R (SLP = 15.79) and PCSK1 (SLP = 6.61). In MC4R, disruptive variants were associated with an increase in BMI of 2.72 units and probably damaging nonsynonymous variants with an increase of 2.02 units. In PCSK1, disruptive variants were associated with a BMI increase of 2.29 and protein-altering variants with an increase of 0.34. Results for other genes were not formally significant after correction for multiple testing, although SIRT1, ZBED6 and NPC2 were noted to be of potential interest. Conclusion: Because the UK Biobank consists of a self-selected sample of relatively healthy volunteers, the effect sizes noted may be underestimates. The results demonstrate the effects of very rare variants on BMI and suggest that other genes and variants will be definitively implicated when the sequence data for additional subjects becomes available. This research has been conducted using the UK Biobank Resource.
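
The SLP statistic defined in the Methods can be written out as a short sketch. It assumes SLP is -log10(p) multiplied by the sign of the correlation between burden score and BMI, as the abstract describes; the function and parameter names are illustrative and do not come from the author's software.

```python
import math


def signed_log_p(p_value, correlation):
    """Signed log10 p value: positive when burden score rises with BMI."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    magnitude = -math.log10(p_value)
    return magnitude if correlation >= 0 else -magnitude


slp_up = signed_log_p(1e-3, correlation=0.2)     # burden raises BMI
slp_down = signed_log_p(1e-3, correlation=-0.2)  # burden lowers BMI
```

Read in reverse, the reported SLP of 15.79 for MC4R corresponds to a p value of about 10^-15.79 with a burden score that increases BMI.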

Multiple linear regression allows weighted burden analysis of rare coding variants in an ethnically heterogeneous population

10.1101/2020.06.11.145938 ◽

2020 ◽

Cited By ~ 1

Author(s):

David Curtis

Keyword(s):

Linear Regression ◽

Principal Components ◽

Rare Variants ◽

Linear Regression Analysis ◽

Uk Biobank ◽

Case Control Studies ◽

Test Statistic ◽

Functional Variants ◽

The Uk ◽

Coding Variants

Weighted burden analysis has been used in exome-sequenced case-control studies to identify genes in which there is an excess of rare and/or functional variants associated with phenotype. Implementation in a ridge regression framework allows simultaneous analysis of all variants along with relevant covariates such as population principal components. In order to apply the approach to a quantitative phenotype, a weighted burden score is derived for each subject and included in a linear regression analysis. The weighting scheme is adjusted in order to apply differential weights to rare and very rare variants, and a score is derived based on both the frequency and predicted effect of each variant. When applied to an ethnically heterogeneous dataset consisting of 49,790 exome-sequenced UK Biobank subjects and using BMI as the phenotype, the method produces a very inflated test statistic. However, this is almost completely corrected by including 20 population principal components as covariates. When this is done, the top 30 genes include a few which are quite plausibly associated with the phenotype, including LYPLAL1 and NSDHL. This approach offers a way to carry out gene-based analyses of rare variants identified by exome sequencing in heterogeneous datasets without requiring that data from ethnic minority subjects be discarded. This research has been conducted using the UK Biobank Resource.
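
A per-subject weighted burden score of the kind described here might be sketched as follows. The weighting function is purely hypothetical (a linear boost as allele frequency falls below a cap, multiplied by a predicted-effect weight); the abstract does not give the actual scheme, so every name and default below is an assumption.

```python
def variant_weight(maf, effect_weight, rare_boost=10.0, maf_cap=0.01):
    """Hypothetical weight: rarer variants and stronger predicted effects count more."""
    # Rarity factor runs from 1.0 at maf >= maf_cap up to rare_boost as maf -> 0.
    rarity = 1.0 + (rare_boost - 1.0) * max(0.0, 1.0 - maf / maf_cap)
    return rarity * effect_weight


def burden_score(allele_counts, mafs, effect_weights):
    """Sum of weight * allele count over one subject's variants in a gene."""
    return sum(
        count * variant_weight(maf, effect)
        for count, maf, effect in zip(allele_counts, mafs, effect_weights)
    )


# One subject, three variants in a gene: counts of the alternate allele,
# allele frequencies, and predicted-effect weights (all made-up values).
score = burden_score([1, 0, 2], [0.0, 0.005, 0.01], [1.0, 1.0, 1.0])
```

In the analysis described by the abstract, a score like this would then enter a linear regression on the phenotype alongside the population principal components.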
