Quantifying the unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects

2015 ◽  
Author(s):  
James Zou ◽  
Gregory Valiant ◽  
Paul Valiant ◽  
Konrad Karczewski ◽  
Siu On Chan ◽  
...  

As new proposals aim to sequence ever larger collections of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not yet been observed in current cohorts. Our results quantify the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have a loss-of-function frequency of less than 0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation.

Author(s):  
Seung Hoan Choi ◽  
Sean J. Jurgens ◽  
Christopher M. Haggerty ◽  
Amelia W. Hall ◽  
Jennifer L. Halford ◽  
...  

Background - Alterations in electrocardiographic (ECG) intervals are well-known markers for arrhythmia and sudden cardiac death (SCD) risk. While the genetics of arrhythmia syndromes have been studied, relations between ECG intervals and rare genetic variation at a population level are poorly understood. Methods - Using a discovery sample of 29,000 individuals with whole-genome sequencing from TOPMed and replication in nearly 100,000 with whole-exome sequencing from the UK Biobank and MyCode, we examined associations between low-frequency and rare coding variants with 5 routinely measured ECG traits (RR, P-wave, PR, and QRS intervals and corrected QT interval [QTc]). Results - We found that rare variants associated with population-based ECG intervals identify established monogenic SCD genes (KCNQ1, KCNH2, SCN5A), a controversial monogenic SCD gene (KCNE1), and novel genes (PAM, MFGE8) involved in cardiac conduction. Loss-of-function and pathogenic SCN5A variants, carried by 0.1% of individuals, were associated with a nearly 6-fold increased odds of first-degree atrioventricular block (P = 8.4 × 10⁻⁵). Similar variants in KCNQ1 and KCNH2 (0.2% of individuals) were associated with a 23-fold increased odds of marked QTc prolongation (P = 4 × 10⁻²⁵), a marker of SCD risk. Incomplete penetrance of such deleterious variation was common, as over 70% of carriers had normal ECG intervals. Conclusions - Our findings indicate that large-scale high-depth sequence data and ECG analysis identify monogenic arrhythmia susceptibility genes and rare variants with large effects. Known pathogenic variation in conventional arrhythmia and SCD genes exhibited incomplete penetrance and accounted for only a small fraction of marked ECG interval prolongation.


2016 ◽  
Vol 7 (1) ◽  
Author(s):  
James Zou ◽  
Gregory Valiant ◽  
Paul Valiant ◽  
Konrad Karczewski ◽  
Siu On Chan ◽  
...  

Author(s):  
Doris Škorić-Milosavljević ◽  
Najim Lahrouchi ◽  
Fernanda M. Bosada ◽  
Gregor Dombrowsky ◽  
Simon G. Williams ◽  
...  

Abstract Purpose Rare genetic variants in KDR, encoding the vascular endothelial growth factor receptor 2 (VEGFR2), have been reported in patients with tetralogy of Fallot (TOF). However, their role in disease causality and pathogenesis remains unclear. Methods We conducted exome sequencing in a familial case of TOF and large-scale genetic studies, including burden testing, in >1,500 patients with TOF. We studied gene-targeted mice and conducted cell-based assays to explore the role of KDR genetic variation in the etiology of TOF. Results Exome sequencing in a family with two siblings affected by TOF revealed biallelic missense variants in KDR. Studies in knock-in mice and in HEK 293T cells identified embryonic lethality for one variant when occurring in the homozygous state, and a significantly reduced VEGFR2 phosphorylation for both variants. Rare variant burden analysis conducted in a set of 1,569 patients of European descent with TOF identified a 46-fold enrichment of protein-truncating variants (PTVs) in TOF cases compared to controls (P = 7 × 10⁻¹¹). Conclusion Rare KDR variants, in particular PTVs, strongly associate with TOF, likely in the setting of different inheritance patterns. Supported by genetic and in vivo and in vitro functional analysis, we propose loss-of-function of VEGFR2 as one of the mechanisms involved in the pathogenesis of TOF.


Author(s):  
Elisabeth Bosch ◽  
Moritz Hebebrand ◽  
Bernt Popp ◽  
Theresa Penger ◽  
Bettina Behring ◽  
...  

Abstract Context CPE encodes carboxypeptidase E, an enzyme which converts proneuropeptides and propeptide hormones to bioactive forms. It is widely expressed in the endocrine and central nervous system. To date, four individuals from two families with core clinical features including morbid obesity, neurodevelopmental delay and hypogonadotropic hypogonadism, harbouring biallelic loss-of-function CPE variants, were reported. Objective We describe four affected individuals from three unrelated consanguineous families, two siblings of Syrian, one of Egyptian and one of Pakistani descent, all harbouring novel homozygous CPE loss-of-function variants. Methods After excluding Prader-Willi syndrome, exome sequencing was performed in both Syrian siblings. The variants identified in the other two individuals were reported as research variants in a large scale exome study and in ClinVar database. Computational modelling of all possible missense alterations allowed assessing CPE tolerance to missense variants. Results All affected individuals were severely obese with neurodevelopmental delay and other endocrine anomalies. Three individuals from two families shared the same CPE homozygous truncating variant c.361C>T, p.(Arg121*), while the fourth carried the c.994del, p.(Ser333Alafs*22) variant. Comparison of clinical features with previously described cases and standardization according to the Human Phenotype Ontology indicated a recognisable clinical phenotype, which we termed Blakemore-Durmaz-Vasileiou (BDV) syndrome. Computational analysis indicated high conservation of CPE domains and intolerance to missense changes. Conclusions Biallelic truncating CPE variants are associated with BDV syndrome, a clinically recognisable monogenic recessive syndrome with childhood-onset obesity, neurodevelopmental delay, hypogonadotropic hypogonadism and hypothyroidism. BDV syndrome resembles Prader-Willi syndrome. 
Our findings suggest that missense variants may also be clinically relevant.


2021 ◽  
Author(s):  
Abhishek Nag ◽  
Lawrence Middleton ◽  
Ryan S Dhindsa ◽  
Dimitrios Vitsios ◽  
Eleanor M Wigmore ◽  
...  

Genome-wide association studies have established the contribution of common and low-frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N = 412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p < 1 × 10⁻⁸), encompassing 207 distinct genes. These include associations between rare non-synonymous variants in GIGYF1 and glucose and lipid biomarkers, SYT7 and creatinine, and others, which may provide insights into novel disease biology. Compared with a previous microarray-based genotyping study in the same cohort, we observed that 40% of gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that utilises the gene-biomarker association statistics from the collapsing analysis to identify genes having similar biomarker fingerprints and thus expand our understanding of gene networks.


2016 ◽  
Author(s):  
Antonio F Pardiñas ◽  
Peter Holmans ◽  
Andrew J Pocklington ◽  
Valentina Escott-Price ◽  
Stephan Ripke ◽  
...  

Schizophrenia is a debilitating psychiatric condition often associated with poor quality of life and decreased life expectancy. Lack of progress in improving treatment outcomes has been attributed to limited knowledge of the underlying biology, although large-scale genomic studies have begun to provide such insight. We report the largest single cohort genome-wide association study of schizophrenia (11,260 cases and 24,542 controls) and through meta-analysis with existing data we identify 50 novel GWAS loci. Using gene-wide association statistics we implicate an additional set of 22 novel associations that map onto a single gene. We show for the first time that the common variant association signal is highly enriched among genes that are intolerant to loss of function mutations and that variants in these genes persist in the population despite the low fecundity associated with the disorder through the process of background selection. Associations point to novel areas of biology (e.g. metabotropic GABA-B signalling and acetyl cholinesterase), reinforce those implicated in earlier GWAS studies (e.g. calcium channel function), converge with earlier rare variants studies (e.g. NRXN1, GABAergic signalling), identify novel overlaps with autism (e.g. RBFOX1, FOXP1, FOXG1), and support early controversial candidate gene hypotheses (e.g. ERBB4 implicating neuregulin signalling). We also demonstrate the involvement of six independent central nervous system functional gene sets in schizophrenia pathophysiology. These findings provide novel insights into the biology and genetic architecture of schizophrenia, highlight the importance of mutation intolerant genes and suggest a mechanism by which common risk variants are maintained in the population.


2021 ◽  
Author(s):  
Kavita Praveen ◽  
Lee Dobbyn ◽  
Lauren Gurski ◽  
Ariane H. Ayer ◽  
Jeffrey Staples ◽  
...  

Abstract Understanding the genetic underpinnings of disabling hearing loss, which affects ∼466 million people worldwide, can provide avenues for new therapeutic target development. We performed a genome-wide association meta-analysis of hearing loss with 125,749 cases and 469,497 controls across five cohorts, including UK Biobank, Geisinger DiscovEHR, the Malmö Diet and Cancer Study, Mount Sinai’s BioMe Personalized Medicine Cohort, and FinnGen. We identified 53 loci affecting hearing loss risk, 15 of which are novel, including common coding variants in COL9A3 and TMPRSS3. Through exome-sequencing of 108,415 cases and 329,581 controls from the same cohorts, we identified hearing loss associations with burden of rare coding variants in FSCN2 (odds ratio [OR] = 1.14, P = 1.9 × 10⁻¹⁵) and burden of predicted loss-of-function variants in KLHDC7B (OR = 2.14, P = 5.2 × 10⁻³⁰). We also observed single-variant and gene-burden associations with 11 genes known to cause Mendelian forms of hearing loss, including an increased risk in heterozygous carriers of mutations in the autosomal recessive hearing loss genes GJB2 (Gly12fs; OR = 1.21, P = 4.2 × 10⁻¹¹) and SLC26A5 (gene burden; OR = 1.96, P = 2.8 × 10⁻¹⁷). Our results suggest that loss of KLHDC7B function increases risk for hearing loss, and show that Mendelian hearing loss genes contribute to the burden of hearing loss in the adult population, suggesting a shared etiology between common and rare forms of hearing loss. This work illustrates the potential of large-scale exome sequencing to elucidate the genetic architecture of common traits in which risk is modulated by both common and rare variation.


2021 ◽  
Author(s):  
Vincent Michaud ◽  
Eulalie Lasseaux ◽  
David J Green ◽  
Dave T Gerrard ◽  
Claudio Plaisant ◽  
...  

Genetic diseases have been historically segregated into rare Mendelian and common complex conditions. Large-scale studies using genome sequencing are eroding this distinction and are gradually unmasking the underlying complexity of human traits. We studied a cohort of 1,313 individuals with albinism aiming to gain insights into the genetic architecture of rare, autosomal recessive disorders. We investigated the contribution of regulatory and protein-coding variants at the common and rare ends of the allele-frequency spectrum. We focused on TYR, the gene encoding tyrosinase, and found that a promoter variant, TYR: c.-301C>T [rs4547091], modulates the penetrance of a prevalent, disease-associated missense change, TYR: c.1205G>A [rs1126809]. We also found that homozygosity for a haplotype formed by three common, functional variants, TYR: c.[-301C;575C>A;1205G>A], confers a high risk of albinism (OR>77) and is associated with reduced vision in UK Biobank participants. Finally, we report how the combined analysis of rare and common variants increases diagnostic yield and informs genetic counselling in families with albinism.


2021 ◽  
Author(s):  
Bowen Jin ◽  
John A Capra ◽  
Penelope Benchek ◽  
Nicholas R Wheeler ◽  
Adam C Naj ◽  
...  

Over 90% of variants are rare, and 50% of them are singletons in the Alzheimer's Disease Sequencing Project Whole Exome Sequencing (ADSP WES) data. However, both single-variant tests and unit-based tests have limited statistical power to detect associations between rare variants and phenotypes. To make the best use of rare variants and investigate their biological effect, we examine their association with phenotypes in the context of the protein. We developed a protein structure-based approach, POKEMON (Protein Optimized Kernel Evaluation of Missense Nucleotides), which evaluates rare missense variants based on their spatial distribution on the protein rather than allele frequency. The underlying hypothesis is that the three-dimensional spatial distribution of variants within a protein structure provides functional context and improves the power of association tests. POKEMON identified four candidate genes from the ADSP WES data, namely two known Alzheimer's disease (AD) genes (TREM2 and SORL) and two novel genes (DUSP18 and CSF1R). For known AD genes, the signal from the spatial cluster is stable even if we exclude known AD risk variants, indicating the presence of additional low-frequency risk variants within these genes. DUSP18 has a cluster of variants primarily shared by case subjects around the ligand-binding domain, and this cluster is further validated in a replication dataset with a larger sample size. POKEMON is an open-source tool available at https://github.com/bushlab-genomics/POKEMON.


Circulation ◽  
2013 ◽  
Vol 127 (suppl_12) ◽  
Author(s):  
Belinda K Cornes ◽  
Jennifer Brody ◽  
Alanna C Morrison ◽  
David Siscovick ◽  
James B Meigs ◽  
...  

Introduction: Common variants in the gene encoding insulin receptor substrate 1 (IRS1) and nearby on 2q36.3 have been associated with levels of fasting insulin (FI). We hypothesized that a greater burden of rare variants in these regions is associated with higher FI. Methods: CHARGE-S sequenced (average coverage >60x) the IRS1 and 2q36.3 regions (totaling 185 kb) in 3,539 individuals on the SOLiD platform. FI information among non-diabetics was available in 3 studies: Framingham Heart Study (N = 811), Cardiovascular Health Study (N = 967) and Atherosclerosis Risk in Communities Study (N = 1761). We analyzed rare variants (MAF < 1%) using a weighted sum test, similar to Madsen-Browning (powerful to detect an association if effects of causal rare variants are in the same direction), and the SKAT test (preferred method if variant effects are in opposite directions). Meta-analyses of weighted-sum results used the inverse-variance method, while SKAT results used a similar approach. For multi-variant tests, the threshold for significance was considered to be α = 0.05. Coding annotation predictions were obtained from the dbNSFP database, which includes functional predictions from SIFT, MutationTaster, Polyphen-2, Phylo-P and LRT. Non-coding annotation information (protein binding regions, transcription factor binding sites, DNase hypersensitivity sites, conservation scores) was obtained from the ENCODE and ORegAnno databases. From these annotations, we grouped different types of variants together (possible loss of function; possibly regulatory) in order to determine specific variants contributing most to the effect. Results: Sequencing found 4,534 variants in the two regions, 86.7% of which were rare and novel, not seen in 1000 Genomes or dbSNP. Approximately 20% of variants had annotation information available; of these, 34 variants were possibly damaging. We found suggestive association with FI (p = 0.03) for all rare variants in the meta-analysis of weighted-sum tests at 2q36.3 but not at IRS1. At IRS1 (but not at 2q36.3), SKAT meta-analysis tests showed evidence for all rare variants associated with FI (p = 0.03). SKAT tests restricted to N = 365 possibly damaging variants at IRS1 suggested an association with FI in coding (p = 0.06) and in non-coding (p = 0.02) variants. Conclusion: Large-scale deep sequencing in the IRS1 and 2q36.3 regions found very large numbers of new, rare variants. Multi-variant tests suggest that rare variation in these regions influences FI levels, with individuals with more and rarer variants having higher FI. Further investigation is warranted to address why weighted sum and SKAT tests provide different levels of evidence for association in the two regions. Also, conditional analyses will test whether new rare variants at IRS1 or 2q36 explain observed GWAS associations.
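
The inverse-variance meta-analysis mentioned in the Methods above can be illustrated with a minimal fixed-effect sketch. This is not the CHARGE-S pipeline: the function name `inverse_variance_meta` is hypothetical, and the per-study effect estimates and standard errors below are made-up placeholders, not values from the abstract. It assumes each study reports an effect estimate and its standard error for the same test.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis (illustrative sketch).

    betas: per-study effect estimates
    ses:   per-study standard errors (all > 0)
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Placeholder inputs for three hypothetical studies (NOT data from the abstract).
example_betas = [0.10, 0.05, 0.08]
example_ses = [0.06, 0.05, 0.04]
beta, se, z, p = inverse_variance_meta(example_betas, example_ses)
print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.3f}, p={p:.4f}")
```

Studies with smaller standard errors (typically the larger cohorts) dominate the pooled estimate, which is why this weighting is a common default when combining per-study burden-test results.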

