Regional missense constraint improves variant deleteriousness prediction

2017 ◽  
Author(s):  
Kaitlin E. Samocha ◽  
Jack A. Kosmicki ◽  
Konrad J. Karczewski ◽  
Anne H. O’Donnell-Luria ◽  
Emma Pierce-Hoffman ◽  
...  

Abstract. Given the increasing number of patients undergoing exome or genome sequencing, it is critical to establish tools and methods to interpret the impact of genetic variation. While the ability to predict deleteriousness for any given variant is limited, missense variants remain a particularly challenging class of variation to interpret, since they can have drastically different effects depending on both the precise location and the specific amino acid substitution of the variant. To better evaluate missense variation, we leveraged the exome sequencing data of 60,706 individuals from the Exome Aggregation Consortium (ExAC) dataset to identify sub-genic regions that are depleted of missense variation. We further used this depletion as part of a novel missense deleteriousness metric named MPC. We applied MPC to de novo missense variants and identified a category of de novo missense variants with the same impact on neurodevelopmental disorders as truncating mutations in intolerant genes, supporting the value of incorporating regional missense constraint in variant interpretation.
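One common way to quantify how depleted a region is of missense variation is the ratio of observed to expected missense variant counts. The sketch below illustrates that idea only: the region names, counts, and the 0.6 cutoff are hypothetical, and it is not the MPC metric or the authors' statistical procedure.

```python
def missense_depletion(observed: int, expected: float) -> float:
    """Return the observed/expected missense ratio for one region.

    Values well below 1 suggest fewer missense variants than expected.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Hypothetical regions: (observed missense count, expected missense count).
REGIONS = {
    "region_1": (12, 40.0),
    "region_2": (35, 38.0),
}

# Hypothetical cutoff chosen only for illustration.
THRESHOLD = 0.6

depleted = [
    name
    for name, (obs, exp) in REGIONS.items()
    if missense_depletion(obs, exp) < THRESHOLD
]
print(depleted)
```

In practice a region would be called constrained using a statistical test rather than a fixed cutoff; the cutoff here only keeps the example short.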

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Ilaria Mannucci ◽  
Nghi D. P. Dang ◽  
Hannes Huber ◽  
Jaclyn B. Murry ◽  
Jeff Abramson ◽  
...  

Abstract. Background: We aimed to define the clinical and variant spectrum and to provide novel molecular insights into the DHX30-associated neurodevelopmental disorder. Methods: Clinical and genetic data from affected individuals were collected through a Facebook-based family support group, GeneMatcher, and our network of collaborators. We investigated the impact of novel missense variants with respect to ATPase and helicase activity, stress granule (SG) formation, global translation, and their effect on embryonic development in zebrafish. SG formation was additionally analyzed in CRISPR/Cas9-mediated DHX30-deficient HEK293T and zebrafish models, along with in vivo behavioral assays. Results: We identified 25 previously unreported individuals, ten of whom carry novel variants, two of which are recurrent, and provide evidence of gonadal mosaicism in one family. All 19 individuals harboring heterozygous missense variants within helicase core motifs (HCMs) have global developmental delay, intellectual disability, severe speech impairment, and gait abnormalities. These variants impair the ATPase and helicase activity of DHX30, trigger SG formation, interfere with global translation, and cause developmental defects in a zebrafish model. Notably, 4 individuals harboring heterozygous variants resulting either in haploinsufficiency or truncated proteins presented with a milder clinical course, similar to an individual harboring a de novo mosaic HCM missense variant. Functionally, we established DHX30 as an ATP-dependent RNA helicase and as an evolutionarily conserved factor in SG assembly. Based on the clinical course, the variant location, and the variant type, we establish two distinct clinical subtypes. DHX30 loss-of-function variants cause a milder phenotype, whereas a severe phenotype is caused by HCM missense variants that, in addition to the loss of ATPase and helicase activity, lead to a detrimental gain-of-function with respect to SG formation. Behavioral characterization of dhx30-deficient zebrafish revealed altered sleep-wake activity and social interaction, partially resembling the human phenotype. Conclusions: Our study highlights the usefulness of social media to define novel Mendelian disorders and exemplifies how functional analyses accompanied by clinical and genetic findings can define clinically distinct subtypes for ultra-rare disorders. Such approaches require close interdisciplinary collaboration between families/legal representatives of the affected individuals, clinicians, molecular genetics diagnostic laboratories, and research laboratories.


2021 ◽  
Vol 13 (594) ◽  
pp. eabc1739
Author(s):  
Amanda Koire ◽  
Panagiotis Katsonis ◽  
Young Won Kim ◽  
Christie Buchovecky ◽  
Stephen J. Wilson ◽  
...  

Genotype-phenotype relationships shape health and population fitness but remain difficult to predict and interpret. Here, we apply an evolutionary action method to de novo missense variants in whole-exome sequences of individuals with autism spectrum disorder (ASD) to unravel genes and pathways connected to ASD. Evolutionary action predicts the impact of missense variants on protein function by measuring the fitness effect based on phylogenetic distances and substitution odds in homologous gene sequences. By examining de novo missense variants in 2384 individuals with ASD (probands) compared to matched siblings without ASD, we found missense variants in 398 genes representing 23 pathways that were biased toward higher evolutionary action scores than expected by random chance; these pathways were involved in axonogenesis, synaptic transmission, and neurodevelopment. The predicted fitness impact of de novo and inherited missense variants in candidate genes correlated with the IQ of individuals with ASD, even for new gene candidates. Taking an evolutionary action method, we detected those missense variants most likely to contribute to ASD pathogenesis and elucidated their phenotypic impact. This approach could be applied to integrate missense variants across a patient cohort to identify genes contributing to a shared phenotype in other complex diseases.


Author(s):  
Da Kuang ◽  
Rebecca Truty ◽  
Jochen Weile ◽  
Britt Johnson ◽  
Keith Nykamp ◽  
...  

Abstract. Motivation: When rare missense variants are clinically interpreted as to their pathogenicity, most are classified as variants of uncertain significance (VUS). Although functional assays can provide strong evidence for variant classification, such results are generally unavailable. Multiplexed assays of variant effect can generate experimental ‘variant effect maps’ that score nearly all possible missense variants in selected protein targets for their impact on protein function. However, these efforts have not always prioritized proteins for which variant effect maps would have the greatest impact on clinical variant interpretation. Results: Here, we mined databases of clinically interpreted variants and applied three strategies, each building on the previous, to prioritize genes for systematic functional testing of missense variation. The strategies ranked genes (i) by the number of unique missense VUS that had been reported to ClinVar; (ii) by movability- and reappearance-weighted impact scores, to give extra weight to reappearing, movable VUS; and (iii) by difficulty-adjusted impact scores, to account for the more resource-intensive nature of generating variant effect maps for longer genes. Our results could be used to guide systematic functional testing of missense variation toward greater impact on clinical variant interpretation. Availability and implementation: Source code available at: https://github.com/rothlab/mave-gene-prioritization. Supplementary information: Supplementary data are available at Bioinformatics online.
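The three ranking strategies in the abstract above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the gene names, the counts, the precomputed `weighted_impact` values, and the choice to adjust for difficulty by dividing by protein length. The authors' actual scoring lives in their repository and may differ.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    gene: str               # hypothetical gene label
    unique_vus: int         # number of unique missense VUS (made-up counts)
    weighted_impact: float  # stand-in for a movability/reappearance-weighted score
    protein_length: int     # amino acids; proxy for mapping difficulty


def rank_by_vus_count(genes):
    """Strategy (i): more unique missense VUS ranks higher."""
    return [g.gene for g in sorted(genes, key=lambda g: g.unique_vus, reverse=True)]


def rank_by_weighted_impact(genes):
    """Strategy (ii): rank by the weighted impact score."""
    return [g.gene for g in sorted(genes, key=lambda g: g.weighted_impact, reverse=True)]


def rank_by_difficulty_adjusted(genes):
    """Strategy (iii): penalize longer genes (assumed: score / protein length)."""
    return [
        g.gene
        for g in sorted(
            genes,
            key=lambda g: g.weighted_impact / g.protein_length,
            reverse=True,
        )
    ]


EXAMPLE = [
    GeneRecord("GENE_A", unique_vus=900, weighted_impact=300.0, protein_length=3000),
    GeneRecord("GENE_B", unique_vus=500, weighted_impact=400.0, protein_length=1000),
    GeneRecord("GENE_C", unique_vus=200, weighted_impact=150.0, protein_length=300),
]

for ranker in (rank_by_vus_count, rank_by_weighted_impact, rank_by_difficulty_adjusted):
    print(ranker.__name__, ranker(EXAMPLE))
```

With these made-up inputs the three strategies produce three different orderings, which shows why the choice of strategy matters when deciding which genes to map first.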


2021 ◽  
Author(s):  
Haicang Zhang ◽  
Michelle S. Xu ◽  
Wendy K. Chung ◽  
Yufeng Shen

Abstract. Accurate prediction of damaging missense variants is critically important for interpreting genome sequence. While many methods have been developed, their performance has been limited. Recent progress in machine learning and the availability of large-scale population genomic sequencing data provide new opportunities to significantly improve computational predictions. Here we describe gMVP, a new method based on graph attention neural networks. Its main component is a graph with nodes capturing predictive features of amino acids and edges weighted by coevolution strength, which enables effective pooling of information from local protein sequence context and functionally correlated distal positions. Evaluated by deep mutational scan data, gMVP outperforms published methods in identifying damaging variants in TP53, PTEN, BRCA1, and MSH2. Additionally, it achieves the best separation of de novo missense variants in neurodevelopmental disorder cases from the ones in controls. Finally, the model supports transfer learning to optimize gain- and loss-of-function predictions in sodium and calcium channels. In summary, we demonstrate that gMVP can improve interpretation of missense variants in clinical testing and genetic studies.


2022 ◽  
Vol 14 ◽  
Author(s):  
Li Shu ◽  
Neng Xiao ◽  
Jiong Qin ◽  
Qi Tian ◽  
Yanghui Zhang ◽  
...  

Objective: To prove that the microtubule associated serine/threonine kinase 3 (MAST3) gene is associated with neurodevelopmental diseases (NDD) and to describe the genotype-phenotype correlation. Methods: Trio exome sequencing (trio ES) was performed on four NDD trios. Bioinformatic analysis was conducted based on large-scale genome sequencing data and human brain transcriptomic data. Further in vivo zebrafish studies were performed. Results: In our study, we identified four de novo MAST3 variants (NM_015016.1: c.302C>T:p.Ser101Phe; c.311C>T:p.Ser104Leu; c.1543G>A:p.Gly515Ser; and c.1547T>C:p.Leu516Pro) in four separate patients with developmental and epileptic encephalopathy (DEE). Clinical heterogeneity was observed between patients carrying variants in the domain of unknown function (DUF) and those carrying variants in the serine-threonine kinase (STK) domain. Using published large-scale exome sequencing data, we found higher CADD scores for missense variants in the DUF domain in the NDD cohort than in the gnomAD database. In addition, we observed an excess of missense variants in the DUF domain when comparing the autistic spectrum disorder (ASD) cohort with the gnomAD database, and similarly an excess of missense variants in the STK domain when comparing the DEE cohort with the gnomAD database. Based on BrainSpan datasets, we showed that MAST3 expression was significantly upregulated in ASD and DEE-related brain regions and was functionally linked with DEE genes. In a zebrafish model, abnormal morphology of the central nervous system was observed in mast3a/b crispants. Conclusion: Our results support the possibility that MAST3 is a novel gene associated with NDD, which could expand the genetic spectrum for NDD. The genotype-phenotype correlation may contribute to future genetic counseling.


2021 ◽  
Author(s):  
Elke de Boer ◽  
Charlotte W. Ockeloen ◽  
Rosalie A. Kampen ◽  
Juliet E. Hampstead ◽  
Alexander J.M. Dingemans ◽  
...  

Purpose: Although haploinsufficiency of ANKRD11 is among the most common genetic causes of neurodevelopmental disorders, the role of rare ANKRD11 missense variation remains unclear. We characterized the clinical, molecular and functional spectra of ANKRD11 missense variants. Methods: We collected clinical information on individuals with ANKRD11 missense variants and evaluated phenotypic fit to KBG syndrome. We assessed pathogenicity of variants by in silico analyses and cell-based experiments. Results: We identified 29 individuals with (mostly de novo) ANKRD11 missense variants, who presented with syndromic neurodevelopmental disorders and were phenotypically similar to individuals with KBG syndrome caused by ANKRD11 protein truncating variants or 16q24.3 microdeletions. Missense variants significantly clustered in Repression Domain 2. At the cellular level, most variants caused reduced ANKRD11 stability. One variant resulted in decreased proteasome degradation and loss of ANKRD11 transcriptional activity. Conclusion: Our study indicates that pathogenic heterozygous missense variants in ANKRD11 cause the clinically recognizable KBG syndrome. Disrupted transrepression capacity and reduced protein stability each independently lead to ANKRD11 loss-of-function, consistent with haploinsufficiency. This highlights the diagnostic relevance of ANKRD11 missense variants, but also poses diagnostic challenges, as the KBG-associated phenotype may be mild and inherited pathogenic ANKRD11 (missense) variants are increasingly observed, warranting stringent variant classification and careful phenotyping.


2019 ◽  
Author(s):  
Eduardo Pérez-Palma ◽  
Patrick May ◽  
Sumaiya Iqbal ◽  
Lisa-Marie Niestroj ◽  
Juanjiangmeng Du ◽  
...  

Abstract. Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2,871 gene family protein sequence alignments involving 9,990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population into these alignments and compared their distribution with 65,034 missense variants from patients. With this gene family approach, we identified 398 regions enriched for patient variants spanning 33,887 amino acids in 1,058 genes. As a comparison, testing the same genes individually, we identified fewer patient variant enriched regions, involving only 2,167 amino acids and 180 genes. Next, we selected de novo variants from 6,753 patients with neurodevelopmental disorders and 1,911 unaffected siblings, and observed a 5.56-fold enrichment of patient variants in our identified regions (95% CI = 2.76-Inf, p-value = 6.66×10⁻⁸). Using an independent ClinVar variant set, we found that missense variants inside the identified regions are 111-fold more likely to be classified as pathogenic than as benign (OR = 111.48, 95% CI = 68.09-195.58, p-value < 2.2×10⁻¹⁶). All patient variant enriched regions identified (PERs) are available online through a user-friendly platform for interactive data mining, visualization and download at http://per.broadinstitute.org. In summary, our gene family burden analysis approach identified novel patient variant enriched regions in protein sequences. This annotation can empower variant interpretation.
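The odds ratio reported above compares pathogenic and benign classifications inside versus outside the identified regions. The abstract does not give the underlying counts, so the sketch below uses made-up numbers purely to show how an odds ratio is computed from a 2×2 table. The confidence interval uses the simple Wald (log-normal) approximation, which is an assumption of this sketch and not necessarily the method the authors used.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a: pathogenic variants inside the regions
    b: benign variants inside the regions
    c: pathogenic variants outside the regions
    d: benign variants outside the regions
    """
    return (a * d) / (b * c)


def wald_confidence_interval(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Approximate 95% CI on the odds ratio via the log-scale standard error."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, not taken from the study.
a, b, c, d = 40, 10, 20, 50
estimate = odds_ratio(a, b, c, d)
low, high = wald_confidence_interval(a, b, c, d)
print(f"OR = {estimate:.2f}, approx. 95% CI = {low:.2f}-{high:.2f}")
```

The Wald interval is unreliable when any cell is small or zero; an exact test is the usual choice in that situation.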


2017 ◽  
Author(s):  
Dennis Lal ◽  
Patrick May ◽  
Kaitlin E. Samocha ◽  
Jack A. Kosmicki ◽  
Elise B. Robinson ◽  
...  

Abstract. Differentiating risk-conferring from benign missense variants, and therefore optimal calculation of gene-variant burden, represents a major challenge, in particular for rare and genetically heterogeneous disorders. While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes are paralogs and belong to gene families. It has not been thoroughly investigated how gene family information can be utilized for disease gene discovery and variant interpretation. We developed a paralog conservation score to empirically evaluate whether paralog conserved or nonconserved sites of in-human paralogs are important for protein function. Using this score, we demonstrate that disease-associated missense variants are significantly enriched at paralog conserved sites across all disease groups and disease inheritance models tested. Next, we assessed whether gene family information could assist in discovering novel disease-associated genes. We subsequently developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families including 98 de novo variant carrying genes in more than 10,000 neurodevelopmental disorder patients. Thirty-three gene family enriched genes represent novel candidate genes which are brain expressed and variant constrained in neurodevelopmental disorders.


2021 ◽  
Author(s):  
Kenneth Douglas Doig ◽  
Christopher G. Love ◽  
Thomas Conway ◽  
Andrei Seleznev ◽  
David Ma ◽  
...  

Abstract. Background: Next generation sequencing for oncology patient management is now routine in clinical pathology laboratories. Although wet lab, sequencing and pipeline tasks are largely automated, the analysis of variants for clinical reporting remains largely a manual task. The increasing volume of sequencing data and the limited availability of genetic experts to analyse and report on variants in the data is a key scalability limit for molecular diagnostics. Method: To determine the impact and size of the issue, we examined the longitudinally compiled genetic variants from 48,036 cancer patients over a six-year period in a large cancer hospital from ten targeted cancer panel tests in germline, solid tumour and haematology contexts using hybridization capture and amplicon assays. This testing generated 24,168,398 sequenced variants, of which 23,255 (8,214 unique) were clinically reported. Results: Of the reported variants, 17,240 (74.1%) were identified in more than one assay, which allowed curated variant data to be reused in later reports. The remainder, 6,015 (25.9%), were not subsequently seen in later assays and did not provide any reuse benefit. The number of new variants requiring curation has significantly increased over time from 1.72 to 3.73 variants per sample (292 curated variants per month). Analysis of the 23,255 reported variants showed that 28.6% (n=2,356) were not present in common public variant resources and therefore required de novo curation. These in-house-only variants were enriched for indels, tumour suppressor genes and solid tumour assays. Conclusion: This analysis highlights the significant percentage of variants not present within common public variant resources and the level of non-recurrent variants that consequently require greater curation effort. Many of these variants are unique to a single patient and unlikely to appear in other patients, reflecting the personalised nature of cancer genomics. This study depicts the real-world situation for pathology laboratories faced with curating increasing numbers of low-recurrence variants while needing to expedite the process of manual variant curation. In the absence of suitably accurate automated methods, new approaches are needed to scale oncology diagnostics for future genetic testing volumes.
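The reuse percentages quoted in the abstract above follow directly from the reported counts. The short check below recomputes them; the counts are taken from the abstract, while the variable and function names are only illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


reported_variants = 23_255   # clinically reported variants
seen_again = 17_240          # identified in more than one assay
not_seen_again = 6_015       # not seen in later assays

# The two groups should partition the reported variants.
assert seen_again + not_seen_again == reported_variants

reused_share = percent(seen_again, reported_variants)
single_use_share = percent(not_seen_again, reported_variants)
print(f"reused: {reused_share}%, not reused: {single_use_share}%")
```

The two shares sum to 100%, matching the 74.1% and 25.9% given in the text.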

