Robust Classification of Protein Variation Using Structural Modeling and Large-Scale Data Integration

2015
Author(s):  
Evan H. Baugh ◽  
Riley Simmons-Edler ◽  
Christian L. Mueller ◽  
Rebecca F. Alford ◽  
Natalia Volfovsky ◽  
...  

Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than on detailed interpretation of variant deleteriousness, and they frequently use only sequence-based or only structure-based information. We present VIPUR, a computational framework that integrates sequence analysis and structural modeling (using the Rosetta protein modeling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9,477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome, with improved generalized accuracy (AUROC 0.83) and interpretability (AUPR 0.87) compared with other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate genes associated with human diseases by applying it to de novo variants associated with autism spectrum disorders.

2018
Author(s):  
Yanhui Hu ◽  
Richelle Sopko ◽  
Verena Chung ◽  
Romain A. Studer ◽  
Sean D. Landry ◽  
...  

Post-translational modification (PTM) serves as a regulatory mechanism for protein function, influencing stability, protein interactions, activity, and localization, and is critical in many signaling pathways. The best-characterized PTM is phosphorylation, whereby a phosphate group is added to an acceptor residue, most commonly serine, threonine, or tyrosine. Because proteins are often phosphorylated at multiple sites, identifying the sites that are important for function is a challenging problem. Given that many phosphorylation sites may be non-functional, prioritizing evolutionarily conserved phosphosites provides a general strategy for identifying putative functional sites with regard to regulation and function. To facilitate the identification of conserved phosphosites, we generated a large-scale phosphoproteomics dataset from embryos of six closely related Drosophila species. We built iProteinDB (https://www.flyrnai.org/tools/iproteindb/), a resource integrating these data with other high-throughput PTM datasets, including vertebrate datasets, and with manually curated information for Drosophila. At iProteinDB, scientists can view the PTM landscape for any Drosophila protein and identify predicted functional phosphosites based on a comparative analysis of data from closely related Drosophila species. Further, iProteinDB enables comparison of PTM data from Drosophila with those of orthologous proteins from other model organisms, including human, mouse, rat, Xenopus laevis, Danio rerio, and Caenorhabditis elegans.


2016
Vol 113 (52)
pp. 15054-15059
Author(s):  
Xiao Ji ◽  
Rachel L. Kember ◽  
Christopher D. Brown ◽  
Maja Bućan

Autism spectrum disorder (ASD) is a heterogeneous, highly heritable neurodevelopmental syndrome characterized by impaired social interaction and communication and by repetitive behavior. It is estimated that hundreds of genes contribute to ASD. We asked whether genes with a strong effect on survival and fitness contribute to ASD risk. Human orthologs of genes with an essential role in pre- and postnatal development in the mouse [essential genes (EGs)] are enriched for disease genes and are under strong purifying selection relative to human orthologs of mouse genes with a known nonlethal phenotype [nonessential genes (NEGs)]. This intolerance to deleterious mutations, the commonly observed haploinsufficiency, and the importance of EGs in development suggest a possible cumulative effect of deleterious variants in EGs on complex neurodevelopmental disorders. With a comprehensive catalog of 3,915 mammalian EGs, we provide compelling evidence for a stronger contribution of EGs to ASD risk compared with NEGs. By examining the exonic de novo and inherited variants from 1,781 ASD quartet families, we show a significantly higher burden of damaging mutations in EGs in ASD probands compared with their non-ASD siblings. The analysis of EGs in the developing brain identified clusters of coexpressed EGs implicated in ASD. Finally, we propose a high-priority list of 29 EGs with potential ASD risk as targets for future functional and behavioral studies. Overall, we show that large-scale studies of gene function in model organisms provide a powerful approach for prioritization of genes and pathogenic variants identified by sequencing studies of human disease.


GigaScience
2020
Vol 9 (7)
Author(s):  
Morteza Roodgar ◽  
Afshin Babveyh ◽  
Lan H Nguyen ◽  
Wenyu Zhou ◽  
Rahul Sinha ◽  
...  

Background: Macaque species share >93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind the human genome effort. Results: To close this gap and enhance functional genomics approaches, we used a combination of de novo linked-read assembly and scaffolding using a proximity ligation assay (Hi-C) to assemble the pig-tailed macaque (Macaca nemestrina) genome. This combinatorial method yielded large, chromosome-level scaffolds with a scaffold N50 of 127.5 Mb; the 23 largest scaffolds covered 90% of the entire genome. This assembly revealed large-scale rearrangements between pig-tailed macaque chromosomes 7, 12, and 13 and human chromosomes 2, 14, and 15. We subsequently annotated the genome using transcriptome and proteomics data from personalized induced pluripotent stem cells derived from the same animal. Reconstruction of the evolutionary tree using whole-genome annotation and orthologous comparisons among 3 macaque species, human, and mouse genomes revealed extensive homology between human and pig-tailed macaques with regard to both pluripotent stem cell genes and innate immune gene pathways. Our results confirm that rhesus and cynomolgus macaques are evolutionarily closer to each other than either species is to humans or pig-tailed macaques. Conclusions: These findings demonstrate that pig-tailed macaques can serve as an excellent animal model for the study of many human diseases, particularly with regard to pluripotency and innate immune pathways.
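The scaffold N50 reported for this assembly is a standard contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the total assembled sequence. A minimal Python sketch of that usual definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not code or data from the study.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence where cumulative length reaches half the total.
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths (in Mb), not taken from the study.
toy = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy))  # prints 70
```

Here the total is 300 Mb; the two longest scaffolds (80 + 70 = 150 Mb) reach half of it, so the N50 is 70 Mb.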


2018
Author(s):  
Eric Deneault ◽  
Muhammad Faheem ◽  
Sean H. White ◽  
Deivid C. Rodrigues ◽  
Song Sun ◽  
...  

Induced pluripotent stem cell (iPSC)-derived cortical neurons are increasingly used as a model to study developmental aspects of autism spectrum disorder (ASD), which is clinically and genetically heterogeneous. To study the complex relationship of rare (penetrant) variants and common (weaker) polygenic risk variants to ASD, modeling with “isogenic” iPSC-derived neurons from probands and family-based controls is critical. We developed a standardized set of procedures designed to control for heterogeneity in reprogramming and differentiation, and generated 53 different iPSC-derived glutamatergic neuronal lines from 25 participants from 12 unrelated families with ASD (14 ASD-affected individuals, 3 unaffected siblings, 8 unaffected parents). Heterozygous de novo (7 families; 16p11.2, NRXN1, DLGAP2, CAPRIN1, VIP, ANOS1, THRA) and rare inherited (2 families; CNTN5, AGBL4) presumed-damaging variants were characterized in ASD risk genes/loci. In three additional families, functional candidates for ASD (SET) and combinations of putative etiologic variants (GLI3/KIF21A and EHMT2/UBE2I combinations in separate families) were modeled. We used a large-scale multi-electrode array (MEA) as our primary high-throughput phenotyping assay, followed by patch-clamp recordings. Our most compelling new result was a consistent spontaneous network hyperactivity in neurons deficient for CNTN5 or EHMT2. Our biobank of iPSC-derived neurons and accompanying genomic data are available to accelerate ASD research.


2021
Vol 15 (1)
Author(s):  
Nasna Nassir ◽  
Asma Bankapur ◽  
Bisan Samara ◽  
Abdulrahman Ali ◽  
Awab Ahmed ◽  
...  

Background: In recent years, several hundred autism spectrum disorder (ASD)-implicated genes have been discovered, impacting a wide range of molecular pathways. However, the molecular underpinnings of ASD, particularly from the point of view of ‘brain to behaviour’ pathogenic mechanisms, remain largely unknown. Methods: We investigated patterns of spatiotemporal and cell-type expression of ASD-implicated genes by integrating large-scale brain single-cell transcriptomes (>1 million cells) and de novo loss-of-function (LOF) ASD variants (impacting 852 genes from 40,122 cases). Results: We identified multiple single-cell clusters from three distinct developmental human brain regions (anterior cingulate cortex, middle temporal gyrus, and primary visual cortex) that showed high evolutionary constraint through enrichment for brain-critical exons and genes with high pLI. These clusters also showed significant enrichment for genes carrying ASD LOF variants (p < 5.23 × 10⁻¹¹) that are transcriptionally highly active in prenatal brain regions (visual cortex and dorsolateral prefrontal cortex). Mapping genes with ASD de novo LOF variants onto large-scale human and mouse brain single-cell transcriptomes demonstrated enrichment of these genes in neuronal subtypes and also in subtypes of non-neuronal glial cells (astrocytes, p < 6.40 × 10⁻¹¹; oligodendrocytes, p < 1.31 × 10⁻⁹). Conclusion: Among the ASD genes enriched with pathogenic de novo LOF variants (e.g., KANK1, PLXNB1), a subgroup has restricted transcriptional regulation in non-neuronal cell types that are evolutionarily conserved. This association strongly suggests the involvement of subtypes of non-neuronal glial cells in the pathogenesis of ASD and the need to explore other biological pathways for this disorder.


2021
Vol 13 (594)
pp. eabc1739
Author(s):  
Amanda Koire ◽  
Panagiotis Katsonis ◽  
Young Won Kim ◽  
Christie Buchovecky ◽  
Stephen J. Wilson ◽  
...  

Genotype-phenotype relationships shape health and population fitness but remain difficult to predict and interpret. Here, we apply an evolutionary action method to de novo missense variants in whole-exome sequences of individuals with autism spectrum disorder (ASD) to unravel genes and pathways connected to ASD. Evolutionary action predicts the impact of missense variants on protein function by measuring the fitness effect based on phylogenetic distances and substitution odds in homologous gene sequences. By examining de novo missense variants in 2,384 individuals with ASD (probands) compared with matched siblings without ASD, we found missense variants in 398 genes representing 23 pathways that were biased toward higher evolutionary action scores than expected by random chance; these pathways were involved in axonogenesis, synaptic transmission, and neurodevelopment. The predicted fitness impact of de novo and inherited missense variants in candidate genes correlated with the IQ of individuals with ASD, even for new gene candidates. Using the evolutionary action method, we detected the missense variants most likely to contribute to ASD pathogenesis and elucidated their phenotypic impact. This approach could be applied to integrate missense variants across a patient cohort to identify genes contributing to a shared phenotype in other complex diseases.


2021
Author(s):  
Nasna Nassir ◽  
Asma Bankapur ◽  
Bisan Samara ◽  
Abdulrahman Ali ◽  
Awab Ahmed ◽  
...  

Background: In recent years, several hundred autism spectrum disorder (ASD)-implicated genes have been discovered, impacting a wide range of molecular pathways. However, the molecular underpinnings of ASD, particularly from the point of view of ‘brain to behaviour’ pathogenic mechanisms, remain largely unknown. Methods: We investigated patterns of spatiotemporal and cell-type expression of ASD-implicated genes by integrating large-scale brain single-cell transcriptomes (>1 million cells) and de novo loss-of-function (LOF) ASD mutations (impacting 852 genes from 40,122 cases). Results: We identified multiple single-cell clusters from three distinct developmental human brain regions (anterior cingulate cortex, middle temporal gyrus, and primary visual cortex) that showed high evolutionary constraint through enrichment for brain-critical exons and genes with high pLI. These clusters also showed significant enrichment for genes carrying ASD LOF mutations (p < 5.23 × 10⁻¹¹) that are transcriptionally highly active in prenatal brain regions (visual cortex and dorsolateral prefrontal cortex). Mapping genes with ASD de novo LOF mutations onto large-scale human and mouse brain single-cell transcriptomes demonstrated enrichment of these genes in neuronal subtypes and also in subtypes of non-neuronal glial cells (astrocytes, p < 6.40 × 10⁻¹¹; oligodendrocytes, p < 1.31 × 10⁻⁹). Conclusion: Among the ASD genes enriched with pathogenic de novo LOF mutations (e.g., KANK1, PLXNB1), a subgroup has restricted transcriptional regulation in non-neuronal cell types that are evolutionarily conserved. This association strongly suggests the involvement of subtypes of non-neuronal glial cells in the pathogenesis of ASD and the need to explore other biological pathways for this disorder.


2020
Vol 46 (1)
pp. 55-69
Author(s):  
Veronica B. Searles Quick ◽  
Belinda Wang ◽  
Matthew W. State

“Big data” approaches in the form of large-scale human genomic studies have led to striking advances in autism spectrum disorder (ASD) genetics. As in many other psychiatric syndromes, advances in genotyping technology allowing inexpensive genome-wide assays have confirmed the contribution of polygenic inheritance involving common alleles of small effect, a handful of which have now been definitively identified. However, the past decade of gene discovery in ASD has been most notable for the application, in large family-based cohorts, of high-density microarray studies of submicroscopic chromosomal structure and of high-throughput DNA sequencing, leading to the identification of an increasingly long list of risk regions and genes disrupted by rare, de novo germline mutations of large effect. This genomic architecture offers particular advantages for illuminating biological mechanisms but also presents distinctive challenges. While the tremendous locus heterogeneity and functional pleiotropy associated with the more than 100 identified ASD-risk genes and regions are daunting, a growing armamentarium of comprehensive, large, foundational -omics databases, spanning species and capturing developmental trajectories, is increasingly contributing to a deeper understanding of ASD pathology.


2021
Vol 17 (3)
pp. e1008708
Author(s):  
Su Datt Lam ◽  
M. Madan Babu ◽  
Jonathan Lees ◽  
Christine A. Orengo

Alternative splicing can expand the diversity of proteomes. Homologous mutually exclusive exons (MXEs) originate from the same ancestral exon and result in polypeptides with similar structural properties but altered sequence. Why would some genes switch homologous exons, and what is the biological impact? Here, we analyse the extent of sequence, structural, and functional variability in MXEs and report the first large-scale, structure-based analysis of the biological impact of MXE events from different genomes. MXE-specific residues tend to map to single domains, are highly enriched in surface-exposed residues, and cluster at or near protein functional sites. Thus, MXE events are likely to maintain the protein fold but alter the specificity and selectivity of protein function. This comprehensive resource of MXE events and their annotations is available at: http://gene3d.biochem.ucl.ac.uk/mxemod/. These findings highlight how small but significant changes at critical positions on a protein surface are exploited in evolution to alter function.


2018
Author(s):  
F. Kyle Satterstrom ◽  
Jack A. Kosmicki ◽  
Jiebiao Wang ◽  
Michael S. Breen ◽  
Silvia De Rubeis ◽  
...  

We present the largest exome sequencing study of autism spectrum disorder (ASD) to date (n = 35,584 total samples, 11,986 with ASD). Using an enhanced Bayesian framework to integrate de novo and case-control rare variation, we identify 102 risk genes at a false discovery rate ≤ 0.1. Of these genes, 49 show higher frequencies of disruptive de novo variants in individuals ascertained for severe neurodevelopmental delay, while 53 show higher frequencies in individuals ascertained for ASD; comparing ASD cases with mutations in these two groups reveals phenotypic differences. Expressed early in brain development, most of the risk genes have roles in regulation of gene expression or neuronal communication (i.e., mutations effect neurodevelopmental and neurophysiological changes), and 13 fall within loci recurrently hit by copy number variants. In human cortex single-cell gene expression data, expression of risk genes is enriched in both excitatory and inhibitory neuronal lineages, consistent with multiple paths to an excitatory/inhibitory imbalance underlying ASD.

