Identification of pathogenic variant enriched regions across genes and gene families

2019 ◽  
Author(s):  
Eduardo Pérez-Palma ◽  
Patrick May ◽  
Sumaiya Iqbal ◽  
Lisa-Marie Niestroj ◽  
Juanjiangmeng Du ◽  
...  

Abstract Missense variant interpretation is challenging. Regions essential for protein function are conserved among gene family members, and genetic variants within these regions are potentially more likely to confer disease risk. Here, we generated 2,871 gene family protein sequence alignments involving 9,990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population into these alignments and compared their distribution with that of 65,034 missense variants from patients. With this gene family approach, we identified 398 regions enriched for patient variants, spanning 33,887 amino acids in 1,058 genes. By comparison, testing the same genes individually identified fewer patient variant enriched regions, involving only 2,167 amino acids and 180 genes. Next, we selected de novo variants from 6,753 patients with neurodevelopmental disorders and 1,911 unaffected siblings, and observed a 5.56-fold enrichment of patient variants in our identified regions (95% C.I. = 2.76-Inf, p-value = 6.66×10−8). Using an independent ClinVar variant set, we found that missense variants inside the identified regions have 111-fold higher odds of being classified as pathogenic rather than benign (OR = 111.48, 95% C.I. = 68.09-195.58, p-value < 2.2×10−16). All identified patient variant enriched regions (PERs) are available online through a user-friendly platform for interactive data mining, visualization and download at http://per.broadinstitute.org. In summary, our gene family burden analysis approach identified novel patient variant enriched regions in protein sequences. This annotation can empower variant interpretation.
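
The burden comparison above contrasts patient and population variant counts inside versus outside each candidate region. A minimal sketch of testing a single region, assuming a one-sided Fisher's exact test on a 2×2 table of counts, is shown below; the abstract does not state which test, multiple-testing correction, or alignment handling the authors used, and the function names and counts are hypothetical. The upper confidence limit of Inf quoted above is consistent with a one-sided enrichment test of this kind.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment of cell a.

    2x2 table (rows: cohort, columns: position relative to the region):
        [[a, b],   # patient variants: inside region, outside region
         [c, d]]   # population variants: inside region, outside region
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b            # total patient variants
    col1 = a + c            # total variants inside the region
    n = a + b + c + d
    denom = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / denom


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); infinite when b or c is zero."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


# Hypothetical counts for a single candidate region.
a, b, c, d = 12, 88, 30, 970
print(f"OR = {odds_ratio(a, b, c, d):.2f}, one-sided p = {fisher_one_sided(a, b, c, d):.2e}")
```

Python's arbitrary-precision integers keep `comb` exact even for population-scale counts, at some cost in speed.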

2017 ◽  
Author(s):  
Dennis Lal ◽  
Patrick May ◽  
Kaitlin E. Samocha ◽  
Jack A. Kosmicki ◽  
Elise B. Robinson ◽  
...  

Abstract Differentiating risk-conferring from benign missense variants, and therefore optimally calculating gene-variant burden, is a major challenge, particularly for rare and genetically heterogeneous disorders. While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes are paralogs and belong to gene families. How gene family information can be used for disease gene discovery and variant interpretation has not been thoroughly investigated. We developed a paralog conservation score to empirically evaluate whether paralog-conserved or nonconserved sites in human paralogs are important for protein function. Using this score, we demonstrate that disease-associated missense variants are significantly enriched at paralog-conserved sites across all disease groups and disease inheritance models tested. Next, we assessed whether gene family information could assist in discovering novel disease-associated genes. We developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families, including 98 de novo variant carrying genes, in more than 10,000 patients with neurodevelopmental disorders. Of these gene family enriched genes, 33 represent novel brain-expressed, variant-constrained candidate genes for neurodevelopmental disorders.
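
To make the idea of a paralog-conserved site concrete, the sketch below scores each column of a gene family alignment by the fraction of the other paralogs that share the index gene's residue. This is a simplified, hypothetical stand-in and not the authors' paralog conservation score, whose exact definition is not given here; the gene names and sequences are invented for illustration.

```python
from __future__ import annotations


def paralog_conservation(alignment: dict[str, str], index_gene: str) -> list[float | None]:
    """Per-column fraction of the *other* paralogs that carry the index gene's residue.

    alignment maps gene name -> aligned protein sequence (all the same length,
    '-' for gaps). Gaps in other paralogs count as mismatches; columns where the
    index gene itself has a gap get None.
    """
    ref = alignment[index_gene]
    others = [seq for gene, seq in alignment.items() if gene != index_gene]
    if not others:
        raise ValueError("need at least one paralog besides the index gene")
    scores: list[float | None] = []
    for i, residue in enumerate(ref):
        if residue == "-":
            scores.append(None)
            continue
        matches = sum(1 for seq in others if seq[i] == residue)
        scores.append(matches / len(others))
    return scores


# Hypothetical three-member family alignment.
family = {
    "GENE_A": "MKV-L",
    "GENE_B": "MRVAL",
    "GENE_C": "MKIAL",
}
print(paralog_conservation(family, "GENE_A"))  # [1.0, 0.5, 0.5, None, 1.0]
```

Under this toy definition, a column scoring 1.0 would be "paralog-conserved"; a real score would also need to handle alignment quality and family size.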


2021 ◽  
pp. jmedgenet-2020-107462
Author(s):  
Natalie B Tan ◽  
Alistair T Pagnamenta ◽  
Matteo P Ferla ◽  
Jonathan Gadian ◽  
Brian HY Chung ◽  
...  

Purpose Guanine nucleotide-binding proteins (G-proteins) mediate signalling pathways involved in diverse cellular functions and comprise Gα and Gβγ subunits. Human diseases have been reported for all five Gβ proteins. A de novo missense variant in GNB2 was recently reported in one individual with developmental delay/intellectual disability (DD/ID) and dysmorphism. We aimed to confirm GNB2 as a neurodevelopmental disease gene and to delineate the GNB2-associated neurodevelopmental phenotype in a patient cohort. Methods We identified a GNB2 variant in the index case by exome sequencing and sought further individuals with GNB2 variants through international data-sharing initiatives. We performed in silico modelling of the variants and assessed multiple lines of evidence in keeping with American College of Medical Genetics and Genomics guidelines for the interpretation of sequence variants. Results We identified 12 unrelated individuals with five de novo missense variants in GNB2, four of which are recurrent: p.(Ala73Thr), p.(Gly77Arg), p.(Lys89Glu) and p.(Lys89Thr). All individuals have DD/ID with variable dysmorphism and extraneurologic features. The variants are located at the universally conserved interface shared with the Gα subunit, and modelling suggests that they weaken this interaction. Conclusion Missense variants in GNB2 cause a congenital neurodevelopmental disorder with variable syndromic features, broadening the spectrum of multisystem phenotypes associated with variants in genes encoding G-proteins.


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 4151-4151
Author(s):  
Claudia Lorena Buitrago ◽  
Augusto Rendon ◽  
Ernest Turro ◽  
Yupu Liang ◽  
Ilenia Simeoni ◽  
...  

# Authors contributed equally to this work. ~ Currently at Genomics England Ltd, London, United Kingdom.

Abstract Next-generation sequencing is transforming our understanding of human genetic variation and is becoming a routine part of human genetic analysis. The identification of millions of new, mainly rare, variants and the assessment of their implications for human health present new challenges to researchers and clinicians. We analyzed missense variants in the ITGA2B and ITGB3 genes obtained from whole exome and whole genome sequencing (WES and WGS) data from five databases: the Human Gene Mutation Database, the 1000 Genomes Project, the UK10K Whole Exome Sequencing project, the UK10K Whole Genome Sequencing project, and the National Heart, Lung, and Blood Institute Exome Sequencing Project. Together, these encompass variants of the platelet αIIbβ3 integrin receptor from ~32,000 alleles derived from 16,108 individuals. We identified 111 missense variants that have previously been associated with Glanzmann thrombasthenia (GT), 20 variants associated with alloimmune thrombocytopenia, and 5 variants associated with aniso/macrothrombocytopenia. None of the GT variants were found in the last four databases, indicating that they have minor allele frequencies (MAF) below ~0.01%, attesting both to their rarity and to the likelihood that they entered the population within the last ~2,500 years. We also identified 114 novel missense variants in ITGA2B affecting ~11% of its amino acids and 68 novel missense variants in ITGB3 affecting ~9% of its amino acids. Of the novel variants, 96% had an MAF <0.1%, indicating their rarity. Based on sequence conservation, MAF, and/or the location of the substituted residue on a complete model of αIIbβ3 that suggested a possible effect on protein folding, we selected for expression in HEK 293 cells three novel variants (αIIb P943A and P176H, and β3 C547G) that affect amino acids previously associated with GT.
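
One way to see how the absence of a variant from roughly 32,000 alleles translates into an allele frequency bound of about 0.01% is the exact zero-count confidence bound, commonly approximated by the "rule of three" (3/n). This sketch assumes that every GT variant position was covered in all ~32,000 alleles; the abstract does not say how the authors derived their bound, and the function name is hypothetical.

```python
def max_frequency_if_unobserved(n_alleles: int, alpha: float = 0.05) -> float:
    """Upper (1 - alpha) confidence bound on allele frequency p when a variant
    is seen 0 times in n_alleles: solve (1 - p) ** n_alleles = alpha for p."""
    return 1.0 - alpha ** (1.0 / n_alleles)


n = 32_000  # approximate number of alleles quoted in the abstract
exact = max_frequency_if_unobserved(n)
rule_of_three = 3 / n  # classic approximation, since -ln(0.05) is about 3
print(f"exact bound: {exact:.4%}, rule of three: {rule_of_three:.4%}")
```

Both values come out just under 0.01%, which matches the order of magnitude stated above.
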
Both αIIb P176H and β3 C547G severely affected αIIbβ3 expression, whereas αIIb P943A had only a partial effect on expression and no effect on DTT-induced fibrinogen binding. The milder effect of the latter variant was expected, because its MAF (0.46%) is much higher than the MAFs of the other GT-causing variants. To estimate the percentage of the 114 novel variants that are likely to be deleterious, we used three algorithms: CADD, PolyPhen-2 HDIV, and SIFT. The algorithms showed moderate concordance in their rankings of the likelihood that a variant is deleterious. To compare their predictive power, we performed receiver operating characteristic (ROC) analysis of their ability to discriminate confirmed GT missense variants (positive controls) from alloantigens (negative controls); the area under the curve (AUC) values were 0.91, 0.88, and 0.90, respectively. At cutoff values that achieved greater than 95% sensitivity for each algorithm, 1) the specificity values were 75%, 65%, and 60%, respectively, and 2) the percentages of novel αIIb and β3 missense variants predicted to be deleterious were 43%, 56%, and 58%, respectively. PolyPhen-2 HDIV and SIFT identified αIIb P176H and β3 C547G as highly likely to be deleterious and αIIb P943A as much less likely to be deleterious, whereas CADD did not distinguish between them in this way. We conclude that ~1.1% of individuals in the populations studied carry at least one missense variant in αIIb or β3, and that 0.6% carry a variant that might be deleterious and could therefore result in a hemorrhagic GT-like phenotype. The rarity of almost all of the novel missense variants identified indicates that they entered the population recently. Despite detailed knowledge of the structure and function of αIIbβ3, it is difficult to predict with certainty the impact of any single missense variant.
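
The ROC comparison above can be illustrated with a small, hypothetical sketch: AUC computed as the Mann-Whitney probability that a positive control outscores a negative control, plus sensitivity and specificity at the highest cutoff that reaches a target sensitivity. The scores are invented, and higher scores are assumed to mean more deleterious; SIFT, where lower scores are more damaging, would need its scores negated first. This is not the authors' analysis code.

```python
def auc(pos: list[float], neg: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count one half), i.e. Mann-Whitney U / (n_pos * n_neg)."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(pos, neg, cutoff):
    """Sensitivity and specificity when 'deleterious' means score >= cutoff."""
    sensitivity = sum(s >= cutoff for s in pos) / len(pos)
    specificity = sum(s < cutoff for s in neg) / len(neg)
    return sensitivity, specificity


def cutoff_for_sensitivity(pos, target):
    """Highest observed positive score that still gives sensitivity >= target."""
    for c in sorted(set(pos), reverse=True):
        if sum(s >= c for s in pos) / len(pos) >= target:
            return c
    raise ValueError("empty positive set")


# Hypothetical scores (higher = more deleterious).
gt_variants = [0.95, 0.90, 0.85, 0.70, 0.40]    # positive controls
alloantigens = [0.80, 0.50, 0.30, 0.20, 0.10]   # negative controls
cut = cutoff_for_sensitivity(gt_variants, 0.95)
print(auc(gt_variants, alloantigens), cut, sens_spec(gt_variants, alloantigens, cut))
# 0.88 0.4 (1.0, 0.6)
```

The pairwise AUC is quadratic in the number of controls, which is fine for control sets of a few hundred variants.
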
This will pose serious challenges as more individuals undergo WES and WGS; we anticipate that linkage to health record data, as will happen for the UK 100,000 Genomes project, will aid clinical interpretation. Finally, “hypomorphic” gene variants that produce only a partial decrease in expression, such as αIIb P943A, may contribute to the wide variation in αIIbβ3 surface expression observed in the healthy population. Disclosures No relevant conflicts of interest to declare.


2021 ◽  
Author(s):  
Sathiya N. Manivannan ◽  
Jolien Roovers ◽  
Noor Smal ◽  
Candace T. Myers ◽  
Dilsad Turkdogan ◽  
...  

FZR1, which encodes the Cdh1 subunit of the Anaphase Promoting Complex, plays an important role in neurodevelopment, both through the control of the cell cycle and through its multiple functions in post-mitotic neurons. In this study, the evaluation of 250 unrelated patients with developmental epileptic encephalopathies (DEE) and a connection on GeneMatcher led to the identification of three de novo missense variants in FZR1, two of which result in the same amino acid change. All individuals had a DEE with childhood-onset generalized epilepsy, intellectual disability, mild ataxia, and normal head circumference. Two individuals were diagnosed with the DEE subtype Myoclonic Atonic Epilepsy (MAE). Gene burden testing with two independent statistical tests supports the association of FZR1 with DEE. Further, using Drosophila neurodevelopment assays, we provide functional evidence that the missense variants are loss-of-function (LOF) alleles. Using three mutant alleles of the Drosophila homolog fzr and overexpression studies, we show that the patient variants do not support proper neurodevelopment. Together with a recent report of a patient with neonatal-onset DEE with microcephaly who also carries a de novo FZR1 missense variant, our study consolidates the relationship between FZR1 and DEE and expands the associated phenotype. We conclude that heterozygous LOF of FZR1 leads to DEE associated with a spectrum of neonatal to childhood-onset seizure types, developmental delay, and mild ataxia. Microcephaly can be present but is not an essential feature of FZR1-encephalopathy. In summary, our approach of targeted sequencing of novel candidate genes combined with functional testing in Drosophila will help solve undiagnosed MAE/DEE cases.
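
The abstract does not name the two burden tests used. One common approach, shown here purely as an assumption, compares the observed number of de novo variants in a gene with the count expected from a per-gene mutation rate, using a Poisson upper-tail probability. The function names, mutation rate and counts are hypothetical.

```python
from math import exp, lgamma, log


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summed directly over the
    upper tail so that very small p-values keep their precision.
    Intended for small expected counts, as in single-gene de novo tests."""
    if observed <= 0:
        return 1.0
    if expected <= 0:
        return 0.0
    k = observed
    term = exp(-expected + k * log(expected) - lgamma(k + 1))  # P(X = observed)
    total = 0.0
    while term > 0 and (total == 0.0 or term > 1e-17 * total):
        total += term
        k += 1
        term *= expected / k
    return min(total, 1.0)


def denovo_burden(n_observed: int, n_probands: int, mutation_rate: float) -> float:
    """Expected de novo count = 2 haploid genomes x probands x per-gene rate."""
    expected = 2 * n_probands * mutation_rate
    return poisson_upper_tail(n_observed, expected)


# Hypothetical: 3 de novo missense variants in 250 probands,
# assumed rate of 1e-5 per haploid genome per generation.
print(f"{denovo_burden(3, 250, 1e-5):.2e}")
```
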


2020 ◽  
Vol 12 (3) ◽  
pp. 185-202
Author(s):  
Xia Han ◽  
Jindan Guo ◽  
Erli Pang ◽  
Hongtao Song ◽  
Kui Lin

Abstract How have genes evolved within a well-known genome phylogeny? Many protein-coding genes are likely to have evolved as whole units at the gene level, whereas some have evolved partly through fragments at the subgene level. To comprehensively explore such complex homologous relationships and better understand gene family evolution, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. The approach relies on de novo-identified modules, subgene units that together consecutively cover the proteins within a set of closely related species. Compared with two other popular methods for gene family construction, our approach yielded practical gene family classifications with a more reasonable view of homology and a much more complete landscape of gene family evolution at the gene and subgene levels. In a case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than whole-gene duplications, mainly generating single-module genes through partial gene duplication, which suggests that subgene rearrangement may be pervasive in the evolution of protein-coding gene families. The use of a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide a more comprehensive landscape depicting how genes evolve within a well-known genome phylogeny.


2017 ◽  
Author(s):  
Amanda Koire ◽  
Christie Buchovecky ◽  
Panagiotis Katsonis ◽  
Young Won Kim ◽  
Stephen J. Wilson ◽  
...  

Abstract The pathogenicity of individual de novo missense mutations in autism spectrum disorder remains difficult to validate. Here, in 2,384 probands, we asked whether these variants collectively exhibit functional impact biases across pathways. Measuring impact with Evolutionary Action (EA) across 368 gene groupings, we found significant biases in axonogenesis, synaptic transmission, and other neurodevelopmental pathways. Strikingly, the impact of both de novo and inherited missense variants in prioritized genes correlated with patient IQ. This general integrative approach thus detects the missense variants most likely to contribute to autism pathogenesis and is, to our knowledge, the first to link missense variant impact to autism phenotypic severity.


2017 ◽  
Author(s):  
Kaitlin E. Samocha ◽  
Jack A. Kosmicki ◽  
Konrad J. Karczewski ◽  
Anne H. O’Donnell-Luria ◽  
Emma Pierce-Hoffman ◽  
...  

Abstract Given the increasing number of patients undergoing exome or genome sequencing, it is critical to establish tools and methods to interpret the impact of genetic variation. The ability to predict the deleteriousness of any given variant is limited, and missense variants are a particularly challenging class of variation to interpret, since their effects can differ drastically depending on both the precise location of the variant and the specific amino acid substitution. To better evaluate missense variation, we leveraged the exome sequencing data of 60,706 individuals from the Exome Aggregation Consortium (ExAC) dataset to identify sub-genic regions that are depleted of missense variation. We further used this depletion as part of a novel missense deleteriousness metric named MPC. We applied MPC to de novo missense variants and identified a category of de novo missense variants with the same impact on neurodevelopmental disorders as truncating mutations in intolerant genes, supporting the value of incorporating regional missense constraint in variant interpretation.
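
Identifying sub-genic regions depleted of missense variation can be sketched as a likelihood-ratio scan over possible breakpoints, comparing observed with expected missense counts under a Poisson model. This simplified, single-breakpoint version is an illustration under stated assumptions and not the published procedure: in practice the expected counts would come from a mutational model, and calibrating the statistic's significance is left out here. The function names and inputs are hypothetical.

```python
from __future__ import annotations

from math import log


def _poisson_loglik(observed: float, lam: float) -> float:
    """Poisson log-likelihood without the log(observed!) term (it cancels in the LRT)."""
    if lam == 0:
        return 0.0 if observed == 0 else float("-inf")
    return observed * log(lam) - lam


def best_breakpoint(observed: list[int], expected: list[float]):
    """Scan every split of a gene into two contiguous regions and return
    (LRT statistic, split index, (obs/exp left, obs/exp right)) for the best split.
    Null: one obs/exp ratio for the whole gene; alternative: one ratio per region."""
    O, E = sum(observed), sum(expected)
    gamma = O / E
    best = (0.0, None, None)
    for j in range(1, len(observed)):
        o1, e1 = sum(observed[:j]), sum(expected[:j])
        o2, e2 = O - o1, E - e1
        null = _poisson_loglik(o1, gamma * e1) + _poisson_loglik(o2, gamma * e2)
        alt = _poisson_loglik(o1, o1) + _poisson_loglik(o2, o2)  # MLE: lambda = observed
        lrt = 2 * (alt - null)
        if lrt > best[0]:
            best = (lrt, j, (o1 / e1, o2 / e2))
    return best


# Hypothetical per-segment counts: the C-terminal half is depleted of missense variants.
print(best_breakpoint([10, 9, 11, 1, 0, 1], [10.0] * 6))
```
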


2018 ◽  
Vol 35 (13) ◽  
pp. 2199-2207 ◽  
Author(s):  
Carine Rey ◽  
Philippe Veber ◽  
Bastien Boussau ◽  
Marie Sémon

Abstract Motivation RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species to improve RNA-Seq assembly, annotation and gene family reconstruction. Results We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. CAARS then incorporates transcripts into gene families, builds gene alignments and trees, and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as references, a difficult case for this kind of analysis. We showed that CAARS assemblies are more complete and accurate than those produced by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity to a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, that are directly usable for downstream comparative analyses. Availability and implementation CAARS is implemented in Python and OCaml and is freely available at https://github.com/carinerey/caars. Supplementary information Supplementary data are available at Bioinformatics online.
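
CAARS is described as using phylogenetic information to label genes as orthologs or paralogs, but the abstract does not say which rule it applies. The sketch below swaps in the species-overlap rule, a simple and widely used heuristic: a gene-tree node whose child subtrees share a species is treated as a duplication. It is not CAARS's implementation, and the tree, gene and species names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations, product


@dataclass
class Node:
    """Gene-tree node: leaves carry a gene and its species; internal nodes have children."""
    children: list[Node] = field(default_factory=list)
    gene: str | None = None
    species: str | None = None


def leaves(node: Node) -> list[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def species_in(node: Node) -> set[str]:
    return {leaf.species for leaf in leaves(node)}


def classify_pairs(node: Node, out: dict | None = None) -> dict:
    """Label every gene pair at its last common ancestor using the species-overlap
    rule: a node whose child subtrees share a species is a duplication (paralogs),
    otherwise a speciation (orthologs)."""
    if out is None:
        out = {}
    if not node.children:
        return out
    is_duplication = any(
        species_in(a) & species_in(b) for a, b in combinations(node.children, 2)
    )
    label = "paralog" if is_duplication else "ortholog"
    for a, b in combinations(node.children, 2):
        for x, y in product(leaves(a), leaves(b)):
            out[frozenset((x.gene, y.gene))] = label
    for child in node.children:
        classify_pairs(child, out)
    return out


def leaf(gene: str, species: str) -> Node:
    return Node(gene=gene, species=species)


# Hypothetical tree: a duplication before the mouse/rat split.
tree = Node([Node([leaf("mA", "mouse"), leaf("rA", "rat")]),
             Node([leaf("mB", "mouse"), leaf("rB", "rat")])])
for pair, label in sorted(classify_pairs(tree).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), label)
```

Species overlap needs no species tree, which makes it robust but prone to calling spurious duplications when gene trees are poorly resolved.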


Author(s):  
Lot Snijders Blok ◽  
Arianna Vino ◽  
Joery den Hoed ◽  
Hunter R. Underhill ◽  
Danielle Monteil ◽  
...  

Abstract Purpose Heterozygous pathogenic variants in various FOXP genes cause specific developmental disorders. The phenotype associated with heterozygous variants in FOXP4 has not been previously described. Methods We assembled a cohort of eight individuals with heterozygous and mostly de novo variants in FOXP4: seven individuals with six different missense variants and one individual with a frameshift variant. We collected clinical data to delineate the phenotypic spectrum, and used in silico analyses and functional cell-based assays to assess pathogenicity of the variants. Results We collected clinical data for six individuals: five individuals with a missense variant in the forkhead box DNA-binding domain of FOXP4, and one individual with a truncating variant. Overlapping features included speech and language delays, growth abnormalities, congenital diaphragmatic hernia, cervical spine abnormalities, and ptosis. Luciferase assays showed loss-of-function effects for all these variants, and aberrant subcellular localization patterns were seen in a subset. The remaining two missense variants were located outside the functional domains of FOXP4, and showed transcriptional repressor capacities and localization patterns similar to the wild-type protein. Conclusion Collectively, our findings show that heterozygous loss-of-function variants in FOXP4 are associated with an autosomal dominant neurodevelopmental disorder with speech/language delays, growth defects, and variable congenital abnormalities.


2020 ◽  
Author(s):  
Ilaria Mannucci ◽  
Nan Cher Yeo ◽  
Hannes Huber ◽  
Jaclyn Murry ◽  
Jeff Abramson ◽  
...  

Background We aimed to define the clinical and mutational spectrum of DHX30-associated neurodevelopmental disorder and to provide novel molecular insights into it. Methods Clinical and genetic data from affected individuals were collected through a family support group, GeneMatcher and our network of collaborators. Novel missense variants were investigated by in vitro and in vivo assays. These analyses included investigation of stress granule formation, global translation, ATPase and helicase activity, as well as the effect of selected variants on embryonic development in zebrafish. Results In total, we identified 25 previously unreported individuals. All 19 individuals harboring heterozygous missense variants within helicase core motifs (HCMs) have global developmental delay, intellectual disability, severe speech impairment and gait abnormalities. These variants impair the ATPase and helicase activity of DHX30 and global translation, trigger stress granule formation, and cause developmental defects in a zebrafish model. Notably, 4 individuals harboring heterozygous variants resulting either in haploinsufficiency or in truncated proteins presented with a milder clinical course, similar to an individual bearing a de novo mosaic missense variant within an HCM. Late-onset severe ataxia was observed in an individual with a de novo missense variant within the ratchet-like domain, and early-onset lethal epileptic encephalopathy in an individual with a homozygous missense variant within the helicase core region but not within an HCM. We report ten novel variants, two of which are recurrent, and provide evidence of gonadal mosaicism in one family. Functional analyses confirmed the pathogenicity of all missense variants and suggest the existence of clinically distinct subtypes that correlate with the location and nature of the variants. Moreover, we established DHX30 as an ATP-dependent RNA helicase.
Conclusions Our study highlights the usefulness of social media for defining novel Mendelian disorders, and exemplifies how functional analyses accompanied by clinical and genetic findings can define clinically distinct subtypes for ultra-rare disorders. Such approaches require close interdisciplinary collaboration between families or legal representatives of affected individuals, clinicians, molecular genetics diagnostic laboratories and research laboratories.

