Exome-by-phenome-wide rare variant gene burden association with electronic health record phenotypes

2019 ◽  
Author(s):  
Joseph Park ◽  
Nathan Katz ◽  
Xinyuan Zhang ◽  
Anastasia M Lucas ◽  
Anurag Verma ◽  
...  

Abstract Background By coupling large-scale DNA sequencing with electronic health records (EHR), “genome-first” approaches can enhance our understanding of the contribution of rare genetic variants to disease. Aggregating rare, loss-of-function variants in a candidate gene into a “gene burden” to test for association with EHR phenotypes can identify both known and novel clinical implications for the gene in human disease. However, this methodology has not yet been applied on both an exome-wide and phenome-wide scale, and the clinical ontologies of rare loss-of-function variants in many genes have yet to be described. Methods We leveraged whole exome sequencing (WES) data in participants (N=11,451) in the Penn Medicine Biobank (PMBB) to address on an exome-wide scale the association of a burden of rare loss-of-function variants in each gene with diverse EHR phenotypes using a phenome-wide association study (PheWAS) approach. For discovery, we collapsed rare (minor allele frequency (MAF) ≤ 0.1%) predicted loss-of-function (pLOF) variants (i.e. frameshift insertions/deletions, gain/loss of stop codon, or splice site disruption) per gene to perform a gene burden PheWAS. Subsequent evaluation of the significant gene burden associations was done by collapsing rare (MAF ≤ 0.1%) missense variants with Rare Exonic Variant Ensemble Learner (REVEL) scores ≥ 0.5 into corresponding yet distinct gene burdens, as well as interrogation of individual low-frequency to common (MAF > 0.1%) pLOF variants and missense variants with REVEL ≥ 0.5. We replicated our findings using the UK Biobank’s (UKBB) whole exome sequence dataset (N=49,960). Results From the pLOF-based discovery phase, we identified 106 gene burdens with phenotype associations at p<10^-6 from our exome-by-phenome-wide association studies. Positive-control associations included TTN (cardiomyopathy, p=7.83E-13), MYBPC3 (hypertrophic cardiomyopathy, p=3.48E-15), CFTR (cystic fibrosis, p=1.05E-15), CYP2D6 (adverse effects due to opiates/narcotics, p=1.50E-09), and BRCA2 (breast cancer, p=1.36E-07). Of the 106 genes, 12 gene-phenotype relationships were also detected by REVEL-informed missense-based gene burdens and 19 by single-variant analyses, demonstrating the robustness of these gene-phenotype relationships. Three genes showed evidence of association using both additional methods (BRCA1, CFTR, TGM6), leading to a total of 28 robust gene-phenotype associations within PMBB. Furthermore, replication studies in UKBB validated 30 of 106 gene burden associations, of which 12 demonstrated robustness in PMBB. Conclusion Our study presents 12 exome-by-phenome-wide robust gene-phenotype associations, which include three proof-of-concept associations and nine novel findings. We show the value of aggregating rare pLOF variants into gene burdens on an exome-wide scale for unbiased association with EHR phenotypes to identify novel clinical ontologies of human genes. Furthermore, we show the significance of evaluating gene burden associations through complementary, yet non-overlapping genetic association studies from the same dataset. Our results suggest that this approach applied to even larger cohorts of individuals with WES or whole-genome sequencing data linked to EHR phenotype data will yield many new insights into the relationship of genetic variation and disease phenotypes.
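The collapsing step described in the Methods can be illustrated with a minimal sketch. Only the MAF ≤ 0.1% threshold and the pLOF categories come from the abstract; the record layout, the consequence labels, the toy data, and the `gene_burden` function are hypothetical and are not taken from the study's pipeline. Each participant is scored 1 for a gene if they carry at least one qualifying variant in it, and 0 otherwise.

```python
from collections import defaultdict

MAF_THRESHOLD = 0.001  # 0.1%, as stated in the abstract

# Hypothetical consequence labels standing in for the pLOF classes named in
# the abstract (frameshift indels, gain/loss of stop codon, splice disruption).
PLOF_CONSEQUENCES = {"frameshift", "stop_gained", "stop_lost", "splice_site"}


def gene_burden(variants, genotypes):
    """Collapse rare pLOF variants into a per-gene carrier indicator.

    variants:  list of dicts with keys "id", "gene", "maf", "consequence"
    genotypes: dict mapping sample id -> set of variant ids the sample carries
    returns:   dict mapping gene -> {sample id: 0 or 1}
    """
    # Step 1: keep only variants that are both rare and predicted loss-of-function.
    qualifying = defaultdict(set)
    for v in variants:
        if v["maf"] <= MAF_THRESHOLD and v["consequence"] in PLOF_CONSEQUENCES:
            qualifying[v["gene"]].add(v["id"])

    # Step 2: a sample has burden 1 if it carries any qualifying variant in the gene.
    return {
        gene: {sample: int(bool(carried & ids)) for sample, carried in genotypes.items()}
        for gene, ids in qualifying.items()
    }


# Toy data, invented purely for illustration.
variants = [
    {"id": "v1", "gene": "TTN", "maf": 0.0005, "consequence": "frameshift"},
    {"id": "v2", "gene": "TTN", "maf": 0.02, "consequence": "stop_gained"},   # too common
    {"id": "v3", "gene": "CFTR", "maf": 0.0001, "consequence": "missense"},   # not pLOF
    {"id": "v4", "gene": "CFTR", "maf": 0.001, "consequence": "splice_site"},
]
genotypes = {"s1": {"v1"}, "s2": {"v2", "v3"}, "s3": {"v4"}}

burden = gene_burden(variants, genotypes)
print(burden)
```

In a PheWAS setting, each gene's 0/1 vector would then be tested against every EHR phenotype; the choice of regression model and covariates is not specified in the abstract and is left out of this sketch.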

2021 ◽  
Author(s):  
Haicang Zhang ◽  
Michelle S. Xu ◽  
Wendy K. Chung ◽  
Yufeng Shen

Abstract Accurate prediction of damaging missense variants is critically important for interpreting genome sequence. While many methods have been developed, their performance has been limited. Recent progress in machine learning and availability of large-scale population genomic sequencing data provide new opportunities to significantly improve computational predictions. Here we describe gMVP, a new method based on graph attention neural networks. Its main component is a graph with nodes capturing predictive features of amino acids and edges weighted by coevolution strength, which enables effective pooling of information from local protein sequence context and functionally correlated distal positions. Evaluated by deep mutational scan data, gMVP outperforms published methods in identifying damaging variants in TP53, PTEN, BRCA1, and MSH2. Additionally, it achieves the best separation of de novo missense variants in neurodevelopmental disorder cases from the ones in controls. Finally, the model supports transfer learning to optimize gain- and loss-of-function predictions in sodium and calcium channels. In summary, we demonstrate that gMVP can improve interpretation of missense variants in clinical testing and genetic studies.


Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 934
Author(s):  
Donato Gemmati ◽  
Giovanna Longo ◽  
Eugenia Franchini ◽  
Juliana Araujo Silva ◽  
Ines Gallo ◽  
...  

Inherited thrombophilia (e.g., venous thromboembolism, VTE) is due to rare loss-of-function mutations in anticoagulant factor genes (i.e., SERPINC1, PROC, PROS1), common gain-of-function mutations in procoagulant factor genes (i.e., F5, F2), and acquired risk conditions. Genome Wide Association Studies (GWAS) recently recognized several genes associated with VTE, though gene defects may unpredictably remain asymptomatic, so calculating the individual genetic predisposition is a challenging task. We investigated a large family with severe, recurrent, early-onset VTE in which two sisters experienced VTE during pregnancies characterized by a perinatal in-utero thrombosis in the newborn and a life-saving pregnancy interruption because of massive VTE, respectively. A nonsense mutation (CGA > TGA) generating a premature stop codon (c.1171C>T; p.R391*) in exon 6 of the SERPINC1 gene (1q25.1), causing Antithrombin (AT) deficiency, and the common missense mutation (c.1691G>A; p.R506Q) in exon 10 of the F5 gene (1q24.2) (i.e., FV Leiden; rs6025) were coinherited in all the symptomatic members investigated, raising the suspicion of cis-segregation, which was further confirmed by STR-linkage analyses [i.e., SERPINC1 IVS5 (ATT)5–18, F5 IVS2 (AT)6–33 and F5 IVS11 (GT)12–16] and SERPINC1 intragenic variants (i.e., rs5878 and rs677). A multilocus investigation of blood-coagulation balance genes detected the coexistence of FV Leiden (rs6025) in trans with FV HR2-haplotype (p.H1299R; rs1800595) in the aborted fetus, and F11 rs2289252, F12 rs1801020, F13A1 rs5985, and KNG1 rs710446 in the newborn and other members. Common selected gene variants may strongly synergize with less common mutations, tuning potentially life-threatening conditions when combined with the rarest, most severe mutations. Merging classic and newly GWAS-identified gene markers in at-risk families is mandatory for VTE risk estimation in clinical practice, avoiding partial risk score evaluation in unrecognized at-risk patients.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 258
Author(s):  
Karim Karimi ◽  
Duy Ngoc Do ◽  
Mehdi Sargolzaei ◽  
Younes Miar

Characterizing the genetic structure and population history can facilitate the development of genomic breeding strategies for the American mink. In this study, we used the whole genome sequences of 100 mink from the Canadian Centre for Fur Animal Research (CCFAR) at the Dalhousie Faculty of Agriculture (Truro, NS, Canada) and Millbank Fur Farm (Rockwood, ON, Canada) to investigate their population structure, genetic diversity and linkage disequilibrium (LD) patterns. Analysis of molecular variance (AMOVA) indicated that the variation among color-types was significant (p < 0.001) and accounted for 18% of the total variation. The admixture analysis revealed that assuming three ancestral populations (K = 3) provided the lowest cross-validation error (0.49). The effective population size (Ne) at five generations ago was estimated to be 99 and 50 for CCFAR and Millbank Fur Farm, respectively. The LD patterns revealed that the average r² decreased to <0.2 at genomic distances of >20 kb and >100 kb in CCFAR and Millbank Fur Farm, respectively, suggesting that densities of 120,000 and 24,000 single nucleotide polymorphisms (SNPs) would provide adequate accuracy of genomic evaluation in these populations, respectively. These results indicated that accounting for admixture is critical for designing the SNP panels for genotype-phenotype association studies of American mink.
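The link between the LD decay distances and the suggested SNP densities can be checked with a short calculation: one marker per LD-decay distance, spread evenly across the genome. The abstract does not state the genome length, so the 2.4 Gb value below is an assumption, back-calculated from the abstract's own figures; the function and constant names are likewise hypothetical.

```python
# Assumption: ~2.4 Gb genome length, not stated in the abstract but implied
# by its figures (120,000 SNPs x 20 kb = 24,000 SNPs x 100 kb = 2.4 Gb).
ASSUMED_GENOME_LENGTH_BP = 2_400_000_000


def snps_for_spacing(genome_length_bp: int, spacing_bp: int) -> int:
    """Number of evenly spaced markers needed for one SNP per `spacing_bp`."""
    return -(-genome_length_bp // spacing_bp)  # ceiling division on integers


ccfar = snps_for_spacing(ASSUMED_GENOME_LENGTH_BP, 20_000)      # r² < 0.2 beyond 20 kb
millbank = snps_for_spacing(ASSUMED_GENOME_LENGTH_BP, 100_000)  # r² < 0.2 beyond 100 kb
print(ccfar, millbank)  # 120000 24000
```

This treats marker spacing as uniform, which is a simplification; real panels also account for local variation in LD and for admixture.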


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Karin Wallander ◽  
Jessada Thutkawkorapin ◽  
Ellika Sahlin ◽  
Annika Lindblom ◽  
Kristina Lagerstedt-Robinson

Abstract Background We have previously reported a family with a suspected autosomal dominant rectal and gastric cancer syndrome without any obvious causative genetic variant. Here, we focused the study on a potentially isolated rectal cancer syndrome in this family. Methods We included seven family members (six obligate carriers). Whole-exome sequencing and whole-genome sequencing data were analyzed and filtered for shared coding and splicing sequence and structural variants among the affected individuals. Results When considering family members with rectal cancer or advanced adenomas as affected, we found six new potentially cancer-associated variants in the genes CENPB, ZBTB20, CLINK, LRRC26, TRPM1, and NPEPL1. All variants were missense variants and none of the genes have previously been linked to inherited rectal cancer. No structural variant was found. Conclusion By massive parallel sequencing in a family suspected of carrying a highly penetrant rectal cancer predisposing genetic variant, we found six genetic missense variants with a potential connection to the rectal cancer in this family. One of them could be a high-risk genetic variant, or one or more of them could be low risk variants. The p.(Glu438Lys) variant in the CENPB gene was found to be of particular interest. The CENPB protein binds DNA and helps form centromeres during mitosis. It is involved in the WNT signaling pathway, which is critical for colorectal cancer development and its role in inherited rectal cancer needs to be further examined.


Author(s):  
Elisabeth Bosch ◽  
Moritz Hebebrand ◽  
Bernt Popp ◽  
Theresa Penger ◽  
Bettina Behring ◽  
...  

Abstract Context CPE encodes carboxypeptidase E, an enzyme which converts proneuropeptides and propeptide hormones to bioactive forms. It is widely expressed in the endocrine and central nervous system. To date, four individuals from two families with core clinical features including morbid obesity, neurodevelopmental delay and hypogonadotropic hypogonadism, harbouring biallelic loss-of-function CPE variants, were reported. Objective We describe four affected individuals from three unrelated consanguineous families, two siblings of Syrian, one of Egyptian and one of Pakistani descent, all harbouring novel homozygous CPE loss-of-function variants. Methods After excluding Prader-Willi syndrome, exome sequencing was performed in both Syrian siblings. The variants identified in the other two individuals were reported as research variants in a large scale exome study and in ClinVar database. Computational modelling of all possible missense alterations allowed assessing CPE tolerance to missense variants. Results All affected individuals were severely obese with neurodevelopmental delay and other endocrine anomalies. Three individuals from two families shared the same CPE homozygous truncating variant c.361C>T, p.(Arg121*), while the fourth carried the c.994del, p.(Ser333Alafs*22) variant. Comparison of clinical features with previously described cases and standardization according to the Human Phenotype Ontology indicated a recognisable clinical phenotype, which we termed Blakemore-Durmaz-Vasileiou (BDV) syndrome. Computational analysis indicated high conservation of CPE domains and intolerance to missense changes. Conclusions Biallelic truncating CPE variants are associated with BDV syndrome, a clinically recognisable monogenic recessive syndrome with childhood-onset obesity, neurodevelopmental delay, hypogonadotropic hypogonadism and hypothyroidism. BDV syndrome resembles Prader-Willi syndrome. Our findings suggested that missense variants may also be clinically relevant.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guilherme B. Neumann ◽  
Paula Korkuć ◽  
Danny Arends ◽  
Manuel J. Wolf ◽  
Katharina May ◽  
...  

Abstract Background German Black Pied cattle (DSN) are an endangered dual-purpose breed which was largely replaced by Holstein cattle due to their lower milk yield. DSN cattle are kept as a genetic reserve with a current herd size of around 2500 animals. The ability to track sequence variants specific to DSN could help to support the conservation of DSN’s genetic diversity and to provide avenues for genetic improvement. Results Whole-genome sequencing data of 304 DSN cattle were used to design a customized DSN200k SNP chip harboring 182,154 variants (173,569 SNPs and 8585 indels) based on ten selection categories. We included variants of interest to DSN such as DSN unique variants and variants from previous association studies in DSN, but also variants of general interest such as variants with predicted consequences of high, moderate, or low impact on the transcripts and SNPs from the Illumina BovineSNP50 BeadChip. Further, the selection of variants based on haplotype blocks ensured that the whole genome was uniformly covered with an average variant distance of 14.4 kb on autosomes. Using 300 DSN and 162 animals from other cattle breeds including Holstein, endangered local cattle populations, and also a Bos indicus breed, performance of the SNP chip was evaluated. Altogether, 171,978 (94.31%) of the variants were successfully called in at least one of the analyzed breeds. In DSN, the number of successfully called variants was 166,563 (91.44%) while 156,684 (86.02%) were segregating at a minor allele frequency > 1%. The concordance rate between technical replicates was 99.83 ± 0.19%. Conclusion The DSN200k SNP chip proved useful for DSN and other Bos taurus as well as one Bos indicus breed. It is suitable for genetic diversity management and marker-assisted selection of DSN animals. Moreover, variants that were segregating in other breeds can be used for the design of breed-specific customized SNP chips. This will be of great value in the application of conservation programs for endangered local populations in the future.


2018 ◽  
Author(s):  
Paul C. Marcogliese ◽  
Vandana Shashi ◽  
Rebecca C. Spillmann ◽  
Nicholas Stong ◽  
Jill A. Rosenfeld ◽  
...  

Abstract The Interferon Regulatory Factor 2 Binding Protein Like (IRF2BPL) gene encodes a member of the IRF2BP family of transcriptional regulators. Currently the biological function of this gene is obscure, and the gene has not been associated with a Mendelian disease. Here we describe seven individuals affected with neurological symptoms who carry damaging heterozygous variants in IRF2BPL. Five cases carrying nonsense variants in IRF2BPL resulting in a premature stop codon display severe neurodevelopmental regression, hypotonia, progressive ataxia, seizures, and a lack of coordination. Two additional individuals, both with missense variants, display global developmental delay and seizures and a relatively milder phenotype than those with nonsense alleles. The bioinformatics signature for IRF2BPL based on population genomics is consistent with a gene that is intolerant to variation. We show that the IRF2BPL ortholog in the fruit fly, called pits (protein interacting with Ttk69 and Sin3A), is broadly expressed including the nervous system. Complete loss of pits is lethal early in development, whereas partial knock-down with RNA interference in neurons leads to neurodegeneration, revealing requirement for this gene in proper neuronal function and maintenance. The nonsense variants in IRF2BPL identified in patients behave as severe loss-of-function alleles in this model organism, while ectopic expression of the missense variants leads to a range of phenotypes. Taken together, IRF2BPL and pits are required in the nervous system in humans and flies, and their loss leads to a range of neurological phenotypes in both species.


Cells ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 2500
Author(s):  
Marta Garcia-Forn ◽  
Andrea Boitnott ◽  
Zeynep Akpinar ◽  
Silvia De Rubeis

Autism spectrum disorder (ASD) is a prevalent neurodevelopmental disorder characterized by impairments in social communication and social interaction, and the presence of repetitive behaviors and/or restricted interests. In the past few years, large-scale whole-exome sequencing and genome-wide association studies have made enormous progress in our understanding of the genetic risk architecture of ASD. While showing a complex and heterogeneous landscape, these studies have led to the identification of genetic loci associated with ASD risk. The intersection of genetic and transcriptomic analyses has also begun to shed light on functional convergences between risk genes, with the mid-fetal development of the cerebral cortex emerging as a critical nexus for ASD. In this review, we provide a concise summary of the latest genetic discoveries on ASD. We then discuss the studies in postmortem tissues, stem cell models, and rodent models that implicate recently identified ASD risk genes in cortical development.


2019 ◽  
Vol 20 (17) ◽  
pp. 1189-1197 ◽  
Author(s):  
Vincent Gagné ◽  
Anne Aubry-Morin ◽  
Maria Plesa ◽  
Rachid Abaji ◽  
Kateryna Petrykey ◽  
...  

Aim: To evaluate top-ranking genes identified through genome-wide association studies for an association with corticosteroid-related osteonecrosis in children with acute lymphoblastic leukemia (ALL) who received Dana–Farber Cancer Institute treatment protocols. Patients & methods: Lead SNPs from these studies, as well as other variants in the same genes, pooled from whole exome sequencing data, were analyzed for an association with osteonecrosis in childhood ALL patients from the Quebec cohort. Top-ranking variants were verified in the replication patient group. Results: The analyses of variants in the ACP1-SH3YL1 locus derived from whole exome sequencing data showed an association of several correlated SNPs (rs11553746, rs2290911, rs7595075, rs2306060 and rs79716074). The rs79716074 defines the *B haplotype of the ACP1 gene, which is well known for its functional role. Conclusion: This study confirms the implication of the ACP1 gene in treatment-related osteonecrosis in childhood ALL and identifies a novel, potentially causal variant of this complication.

