Gene family information facilitates variant interpretation and identification of disease-associated genes

2017 ◽  
Author(s):  
Dennis Lal ◽  
Patrick May ◽  
Kaitlin E. Samocha ◽  
Jack A. Kosmicki ◽  
Elise B. Robinson ◽  
...  

Abstract Differentiating risk-conferring from benign missense variants, and therefore optimally calculating gene-variant burden, represents a major challenge, in particular for rare and genetically heterogeneous disorders. While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes are paralogs and belong to gene families. It has not been thoroughly investigated how gene family information can be utilized for disease gene discovery and variant interpretation. We developed a paralog conservation score to empirically evaluate whether paralog conserved or nonconserved sites of in-human paralogs are important for protein function. Using this score, we demonstrate that disease-associated missense variants are significantly enriched at paralog conserved sites across all disease groups and disease inheritance models tested. Next, we assessed whether gene family information could assist in discovering novel disease-associated genes. We subsequently developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families including 98 de novo variant carrying genes in more than 10k neurodevelopmental disorder patients. 33 gene family enriched genes represent novel candidate genes which are brain expressed and variant constrained in neurodevelopmental disorders.
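The abstract above does not define how the paralog conservation score is computed. As a rough illustration of the general idea of per-site conservation across aligned paralogs, the following minimal sketch scores each alignment column by the fraction of sequences sharing the most common residue. The function name `column_conservation`, the scoring rule, and the toy sequences are assumptions made for illustration and are not the published score.

```python
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column by the fraction of sequences that carry
    the most common residue. Gaps ('-') never count as a match.

    This is an illustrative stand-in, not the published paralog
    conservation score.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        if not residues:
            # A column made only of gaps carries no conservation signal.
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical aligned paralog sequences (toy data).
paralogs = ["MKTA", "MKSA", "MR-A"]
scores = column_conservation(paralogs)
for position, score in enumerate(scores, start=1):
    print(position, round(score, 3))
```

Under this toy rule, fully conserved columns score 1.0 and variable or gapped columns score lower, which is the kind of contrast between paralog conserved and nonconserved sites that the abstract describes.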

2019 ◽  
Author(s):  
Eduardo Pérez-Palma ◽  
Patrick May ◽  
Sumaiya Iqbal ◽  
Lisa-Marie Niestroj ◽  
Juanjiangmeng Du ◽  
...  

Abstract Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2,871 gene family protein sequence alignments involving 9,990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population into these alignments and compared their distribution with 65,034 missense variants from patients. With this gene family approach, we identified 398 regions enriched for patient variants spanning 33,887 amino acids in 1,058 genes. By comparison, testing the same genes individually, we identified fewer patient variant enriched regions, involving only 2,167 amino acids and 180 genes. Next, we selected de novo variants from 6,753 patients with neurodevelopmental disorders and 1,911 unaffected siblings, and observed a 5.56-fold enrichment of patient variants in our identified regions (95% C.I. = 2.76-Inf, p-value = 6.66×10⁻⁸). Using an independent ClinVar variant set, we found that missense variants inside the identified regions are 111-fold more likely to be classified as pathogenic than as benign (OR = 111.48, 95% C.I. = 68.09-195.58, p-value < 2.2e−16). All identified patient variant enriched regions (PERs) are available online through a user-friendly platform for interactive data mining, visualization and download at http://per.broadinstitute.org. In summary, our gene family burden analysis approach identified novel patient variant enriched regions in protein sequences. This annotation can empower variant interpretation.
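The odds ratio reported above compares pathogenic and benign classifications inside versus outside the identified regions. A minimal sketch of that arithmetic follows, assuming a 2×2 table and a Wald (Woolf) confidence interval on the log odds ratio. The abstract does not state which interval method was used, and the counts below are hypothetical, so the output will not reproduce the published figures.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% Wald interval for a 2x2 table.

    a: pathogenic variants inside the regions
    b: benign variants inside the regions
    c: pathogenic variants outside the regions
    d: benign variants outside the regions
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_value, ci_low, ci_high = odds_ratio_with_ci(40, 10, 20, 80)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An exact method such as Fisher's exact test is a common alternative when some cells are small; the Wald interval is used here only because it needs nothing beyond the standard library.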


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Ilaria Mannucci ◽  
Nghi D. P. Dang ◽  
Hannes Huber ◽  
Jaclyn B. Murry ◽  
Jeff Abramson ◽  
...  

Abstract Background We aimed to define the clinical and variant spectrum and to provide novel molecular insights into the DHX30-associated neurodevelopmental disorder. Methods Clinical and genetic data from affected individuals were collected through a Facebook-based family support group, GeneMatcher, and our network of collaborators. We investigated the impact of novel missense variants with respect to ATPase and helicase activity, stress granule (SG) formation, global translation, and their effect on embryonic development in zebrafish. SG formation was additionally analyzed in CRISPR/Cas9-mediated DHX30-deficient HEK293T and zebrafish models, along with in vivo behavioral assays. Results We identified 25 previously unreported individuals, ten of whom carry novel variants, two of which are recurrent, and provide evidence of gonadal mosaicism in one family. All 19 individuals harboring heterozygous missense variants within helicase core motifs (HCMs) have global developmental delay, intellectual disability, severe speech impairment, and gait abnormalities. These variants impair the ATPase and helicase activity of DHX30, trigger SG formation, interfere with global translation, and cause developmental defects in a zebrafish model. Notably, 4 individuals harboring heterozygous variants resulting either in haploinsufficiency or truncated proteins presented with a milder clinical course, similar to an individual harboring a de novo mosaic HCM missense variant. Functionally, we established DHX30 as an ATP-dependent RNA helicase and as an evolutionarily conserved factor in SG assembly. Based on the clinical course, the variant location, and the variant type, we establish two distinct clinical subtypes. DHX30 loss-of-function variants cause a milder phenotype, whereas a severe phenotype is caused by HCM missense variants that, in addition to the loss of ATPase and helicase activity, lead to a detrimental gain-of-function with respect to SG formation. Behavioral characterization of dhx30-deficient zebrafish revealed altered sleep-wake activity and social interaction, partially resembling the human phenotype. Conclusions Our study highlights the usefulness of social media to define novel Mendelian disorders and exemplifies how functional analyses accompanied by clinical and genetic findings can define clinically distinct subtypes for ultra-rare disorders. Such approaches require close interdisciplinary collaboration between families/legal representatives of the affected individuals, clinicians, molecular genetics diagnostic laboratories, and research laboratories.


2021 ◽  
pp. jmedgenet-2020-107462
Author(s):  
Natalie B Tan ◽  
Alistair T Pagnamenta ◽  
Matteo P Ferla ◽  
Jonathan Gadian ◽  
Brian HY Chung ◽  
...  

Purpose Binding proteins (G-proteins) mediate signalling pathways involved in diverse cellular functions and comprise Gα and Gβγ units. Human diseases have been reported for all five Gβ proteins. A de novo missense variant in GNB2 was recently reported in one individual with developmental delay/intellectual disability (DD/ID) and dysmorphism. We aim to confirm GNB2 as a neurodevelopmental disease gene, and elucidate the GNB2-associated neurodevelopmental phenotype in a patient cohort. Methods We discovered a GNB2 variant in the index case via exome sequencing and sought individuals with GNB2 variants via international data-sharing initiatives. In silico modelling of the variants was assessed, along with multiple lines of evidence in keeping with American College of Medical Genetics and Genomics guidelines for interpretation of sequence variants. Results We identified 12 unrelated individuals with five de novo missense variants in GNB2, four of which are recurrent: p.(Ala73Thr), p.(Gly77Arg), p.(Lys89Glu) and p.(Lys89Thr). All individuals have DD/ID with variable dysmorphism and extraneurologic features. The variants are located at the universally conserved shared interface with the Gα subunit, and modelling suggests that they weaken this interaction. Conclusion Missense variants in GNB2 cause a congenital neurodevelopmental disorder with variable syndromic features, broadening the spectrum of multisystem phenotypes associated with variants in genes encoding G-proteins.


2021 ◽  
Author(s):  
Konrad Platzer ◽  
Heinrich Sticht ◽  
Caleb Bupp ◽  
Mythily Ganapathi ◽  
Elaine M. Pereira ◽  
...  

We describe four patients with a neurodevelopmental disorder and de novo missense variants in SLC32A1, the gene that encodes the vesicular GABA transporter (VGAT). The main phenotype comprises moderate to severe intellectual disability, early onset epilepsy within the first 18 months of life and a choreatic, dystonic or dyskinetic movement disorder. In silico modeling and functional analyses in cultured neurons reveal that three of these variants, which are located in helices that line the putative GABA transport pathway, result in reduced quantal size, consistent with impaired filling of synaptic vesicles with GABA. The fourth variant, located in the VGAT N-terminus, does not affect quantal size, but increases presynaptic release probability, leading to more severe synaptic depression during high frequency stimulation. Thus, variants in VGAT can impair GABAergic neurotransmission via at least two mechanisms, by affecting synaptic vesicle filling and by altering synaptic short-term plasticity. This work establishes de novo missense variants in SLC32A1 as a novel cause for a neurodevelopmental disorder with epilepsy.


2022 ◽  
Author(s):  
Tinna Reynisdottir ◽  
Kimberley Anderson ◽  
Leandros Boukas ◽  
Hans Bjornsson

Wiedemann-Steiner syndrome (WSS) is a neurodevelopmental disorder caused by de novo variants in KMT2A, which encodes a multi-domain histone methyltransferase. To gain insight into the currently unknown pathogenesis of WSS, we examined the spatial distribution of likely WSS-causing variants across the 15 different domains of KMT2A. Compared to variants in healthy controls, WSS variants exhibit a 64.1-fold overrepresentation within the CXXC domain, which mediates binding to unmethylated CpGs, suggesting a major role for this domain in mediating the phenotype. In contrast, we find no significant overrepresentation within the catalytic SET domain. Corroborating these results, we find that hippocampal neurons from Kmt2a-deficient mice demonstrate disrupted H3K4me1 preferentially at CpG-rich regions, but this has no systematic impact on gene expression. Motivated by these results, we combine accurate prediction of the CXXC domain structure by AlphaFold2 with prior biological knowledge to develop a classification scheme for missense variants in the CXXC domain. Our classifier achieved 96.0% positive and 92.3% negative predictive value on a hold-out test set. This classification performance enabled us to subsequently perform an in silico saturation mutagenesis and classify a total of 445 variants according to their functional effects. Our results yield a novel insight into the mechanistic basis of WSS and provide an example of how AlphaFold2 can contribute to the in silico characterization of variant effects with very high accuracy, establishing a paradigm potentially applicable to many other Mendelian disorders.
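The positive and negative predictive values quoted above are simple ratios over a confusion matrix. The sketch below shows the calculation; the abstract does not give the underlying test-set counts, so the numbers used here are hypothetical and were chosen only so that the arithmetic lands near the reported percentages.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): share of predicted-positive calls that are correct.
    NPV = TN / (TN + FN): share of predicted-negative calls that are correct.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative call")
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Hypothetical counts, not taken from the study.
ppv, npv = predictive_values(tp=24, fp=1, tn=12, fn=1)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Both values depend on the proportion of pathogenic and benign variants in the test set, so they should not be read as fixed properties of a classifier across datasets.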


2020 ◽  
Vol 12 (3) ◽  
pp. 185-202
Author(s):  
Xia Han ◽  
Jindan Guo ◽  
Erli Pang ◽  
Hongtao Song ◽  
Kui Lin

Abstract How have genes evolved within a well-known genome phylogeny? Many protein-coding genes should have evolved as a whole at the gene level, and some should have evolved partly through fragments at the subgene level. To comprehensively explore such complex homologous relationships and better understand gene family evolution, here, with de novo-identified modules, the subgene units which could consecutively cover proteins within a set of closely related species, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. Compared with two other popular methods for gene family construction, our approach improved practical gene family classifications with a more reasonable view of homology and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In the case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications and mainly generated single-module genes through partial gene duplication, suggesting that there might be pervasive subgene rearrangement in the evolution of protein-coding gene families. The use of a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide us with a more comprehensive landscape depicting how genes evolve within a well-known genome phylogeny.


2021 ◽  
Vol 13 (594) ◽  
pp. eabc1739
Author(s):  
Amanda Koire ◽  
Panagiotis Katsonis ◽  
Young Won Kim ◽  
Christie Buchovecky ◽  
Stephen J. Wilson ◽  
...  

Genotype-phenotype relationships shape health and population fitness but remain difficult to predict and interpret. Here, we apply an evolutionary action method to de novo missense variants in whole-exome sequences of individuals with autism spectrum disorder (ASD) to unravel genes and pathways connected to ASD. Evolutionary action predicts the impact of missense variants on protein function by measuring the fitness effect based on phylogenetic distances and substitution odds in homologous gene sequences. By examining de novo missense variants in 2384 individuals with ASD (probands) compared to matched siblings without ASD, we found missense variants in 398 genes representing 23 pathways that were biased toward higher evolutionary action scores than expected by random chance; these pathways were involved in axonogenesis, synaptic transmission, and neurodevelopment. The predicted fitness impact of de novo and inherited missense variants in candidate genes correlated with the IQ of individuals with ASD, even for new gene candidates. Taking an evolutionary action method, we detected those missense variants most likely to contribute to ASD pathogenesis and elucidated their phenotypic impact. This approach could be applied to integrate missense variants across a patient cohort to identify genes contributing to a shared phenotype in other complex diseases.


Genetics ◽  
1997 ◽  
Vol 147 (3) ◽  
pp. 1259-1266 ◽  
Author(s):  
Joseph H Nadeau ◽  
David Sankoff

Duplicated genes are an important source of new protein functions and novel developmental and physiological pathways. Whereas most models for fate of duplicated genes show that they tend to be rapidly lost, models for pathway evolution suggest that many duplicated genes rapidly acquire novel functions. Little empirical evidence is available, however, for the relative rates of gene loss vs. divergence to help resolve these contradictory expectations. Gene families resulting from genome duplications provide an opportunity to address this apparent contradiction. With genome duplication, the number of duplicated genes in a gene family is at most 2^n, where n is the number of duplications. The size of each gene family, e.g., 1, 2, 3, ..., 2^n, reflects the patterns of gene loss vs. functional divergence after duplication. We focused on gene families in humans and mice that arose from genome duplications in early vertebrate evolution and we analyzed the frequency distribution of gene family size, i.e., the number of families with two, three or four members. All the models that we evaluated showed that duplicated genes are almost as likely to acquire a new and essential function as to be lost through acquisition of mutations that compromise protein function. An explanation for the unexpectedly high rate of functional divergence is that duplication allows genes to accumulate more neutral than disadvantageous mutations, thereby providing more opportunities to acquire diversified functions and pathways.
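The bound on family size and the frequency distribution described above can be tabulated in a few lines. This is a minimal sketch of the bookkeeping only: the function names and the example family sizes are hypothetical, and none of the evolutionary models evaluated in the paper are implemented here.

```python
from collections import Counter


def max_family_size(n_duplications):
    """Upper bound on family size after n whole-genome duplications: 2^n."""
    if n_duplications < 0:
        raise ValueError("number of duplications must be non-negative")
    return 2 ** n_duplications


def family_size_distribution(family_sizes, n_duplications):
    """Count how many families have each size from 1 up to 2^n."""
    limit = max_family_size(n_duplications)
    if any(size < 1 or size > limit for size in family_sizes):
        raise ValueError(f"family sizes must lie between 1 and {limit}")
    counts = Counter(family_sizes)
    return {size: counts.get(size, 0) for size in range(1, limit + 1)}


# Hypothetical family sizes after two rounds of genome duplication.
distribution = family_size_distribution([2, 2, 3, 4, 2, 3], n_duplications=2)
print(distribution)
```

With two duplications the bound is four members per family, which matches the abstract's focus on families with two, three or four members.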


Author(s):  
Da Kuang ◽  
Rebecca Truty ◽  
Jochen Weile ◽  
Britt Johnson ◽  
Keith Nykamp ◽  
...  

Abstract Motivation When rare missense variants are clinically interpreted as to their pathogenicity, most are classified as variants of uncertain significance (VUS). Although functional assays can provide strong evidence for variant classification, such results are generally unavailable. Multiplexed assays of variant effect can generate experimental ‘variant effect maps’ that score nearly all possible missense variants in selected protein targets for their impact on protein function. However, these efforts have not always prioritized proteins for which variant effect maps would have the greatest impact on clinical variant interpretation. Results Here, we mined databases of clinically interpreted variants and applied three strategies, each building on the previous, to prioritize genes for systematic functional testing of missense variation. The strategies ranked genes (i) by the number of unique missense VUS that had been reported to ClinVar; (ii) by movability- and reappearance-weighted impact scores, to give extra weight to reappearing, movable VUS and (iii) by difficulty-adjusted impact scores, to account for the more resource-intensive nature of generating variant effect maps for longer genes. Our results could be used to guide systematic functional testing of missense variation toward greater impact on clinical variant interpretation. Availability and implementation Source code available at: https://github.com/rothlab/mave-gene-prioritization Supplementary information Supplementary data are available at Bioinformatics online.
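Strategy (i) above, ranking genes by the number of unique missense VUS, and the length adjustment behind strategy (iii) can be sketched as follows. The record format, gene names, protein lengths, and the choice to divide by protein length are all assumptions for illustration. The abstract does not give the actual scoring formulas, and the weighting in strategy (ii) is not attempted here; the authors' implementation is in the repository linked above.

```python
from collections import defaultdict


def rank_genes_by_unique_vus(records):
    """Rank genes by their number of distinct missense VUS.

    records: iterable of (gene, variant) pairs; repeated pairs count once.
    Ties are broken alphabetically by gene name.
    """
    unique_variants = defaultdict(set)
    for gene, variant in records:
        unique_variants[gene].add(variant)
    counts = [(gene, len(variants)) for gene, variants in unique_variants.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


def length_adjusted(ranking, protein_lengths):
    """Divide each count by protein length (an assumed proxy for difficulty)."""
    adjusted = [(gene, count / protein_lengths[gene]) for gene, count in ranking]
    return sorted(adjusted, key=lambda item: (-item[1], item[0]))


# Hypothetical VUS records and protein lengths.
records = [
    ("GENE_A", "p.Ala10Thr"),
    ("GENE_A", "p.Ala10Thr"),  # duplicate report, counted once
    ("GENE_A", "p.Gly20Arg"),
    ("GENE_B", "p.Lys5Glu"),
    ("GENE_B", "p.Lys7Glu"),
    ("GENE_B", "p.Lys9Glu"),
]
ranking = rank_genes_by_unique_vus(records)
adjusted = length_adjusted(ranking, {"GENE_A": 100, "GENE_B": 600})
print(ranking)
print(adjusted)
```

In this toy input the longer gene leads on raw counts but drops behind once counts are scaled by length, which illustrates why an adjustment for resource cost can reorder the priorities.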

