DECO: a framework for jointly analyzing de novo and rare case/control variants, and biological pathways

Author(s):  
Tan-Hoang Nguyen ◽  
Xin He ◽  
Ruth C Brown ◽  
Bradley T Webb ◽  
Kenneth S Kendler ◽  
...  

Abstract Motivation: Rare-variant-based analyses are beginning to identify risk genes for neuropsychiatric disorders and other diseases. However, the genes identified so far account for only a fraction of the predicted causal genes. Recent studies have shown that rare damaging variants are significantly enriched in specific gene-sets. Methods that jointly model rare variants and gene-sets, identifying enriched gene-sets and using them to prioritize additional risk genes, could improve our understanding of the genetic architecture of diseases. Results: We propose DECO (Integrated analysis of de novo mutations, rare case/control variants and omics information via gene-sets), an integrated method for rare-variant and gene-set analysis. The method can (i) test the enrichment of gene-sets directly within the statistical model, and (ii) use enriched gene-sets to rank existing genes and prioritize additional risk genes for the tested disorders. In simulations, DECO performs better than an analogous method that uses only variant data. To demonstrate the application of the proposed approach, we applied it to rare-variant datasets of schizophrenia. Compared with a method that uses only variant information, DECO prioritizes additional risk genes. Availability: DECO can be used to analyze rare variants together with biological pathways or cell types for any disease. The package is available on GitHub at https://github.com/hoangtn/DECO.
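As a rough illustration of how gene-set membership can sharpen gene prioritization, the Python sketch below lets membership in an enriched set raise a gene's prior probability of being a risk gene, then combines that prior with a gene-level Bayes factor from variant data. The logistic link, the parameter names `alpha0` and `alphas`, and all numbers are assumptions chosen for illustration; this is not DECO's documented parameterisation or code.

```python
import math

def gene_prior(memberships, alpha0, alphas):
    """Prior probability that a gene is a risk gene, modelled (hypothetically)
    as a logistic function of its binary gene-set memberships."""
    eta = alpha0 + sum(a * m for a, m in zip(alphas, memberships))
    return 1.0 / (1.0 + math.exp(-eta))

def posterior_risk(bayes_factor, prior):
    """Posterior probability of being a risk gene, combining the prior with a
    gene-level Bayes factor (risk vs. non-risk) computed from variant counts."""
    numerator = prior * bayes_factor
    return numerator / (numerator + (1.0 - prior))

# Hypothetical parameters: baseline prior of 2%, log-odds boost of 2.0 for
# genes inside one enriched gene set.
alpha0 = math.log(0.02 / 0.98)
alphas = [2.0]

# Two genes with identical variant evidence (BF = 20).
in_set = posterior_risk(20.0, gene_prior([1], alpha0, alphas))
out_set = posterior_risk(20.0, gene_prior([0], alpha0, alphas))
print(f"inside enriched set: {in_set:.3f}, outside: {out_set:.3f}")
```

With the variant evidence held fixed, membership in the enriched set lifts the posterior from about 0.29 to about 0.75 under these made-up parameters, which is the kind of re-ranking a joint gene-set model is meant to provide.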

2018 ◽  
Author(s):  
Hoang T. Nguyen ◽  
Amanda Dobbyn ◽  
Alexander W. Charney ◽  
Julien Bryois ◽  
April Kim ◽  
...  

Abstract Trio family and case-control studies of next-generation sequencing data have proven integral to understanding the contribution of rare inherited and de novo single-nucleotide variants to the genetic architecture of complex disease. Ideally, such studies should identify individual risk genes of moderate to large effect size to generate novel treatment hypotheses for further follow-up. However, because of insufficient power, such studies have come to rely on gene set enrichment analyses to detect differences between cases and controls, implicating sets of hundreds of genes rather than specific targets for further investigation. Here, we present a Bayesian statistical framework, termed gTADA, that integrates gene-set membership information with gene-level de novo and rare inherited case-control counts to prioritize risk genes with excess rare variant burden within enriched gene sets. Applying gTADA to available whole-exome sequencing datasets for several neuropsychiatric conditions, we replicated previously reported gene set enrichments and identified novel risk genes. For epilepsy (EPI), gTADA prioritized 40 risk genes (posterior probabilities > 0.95), 6 of which replicate in an independent whole-genome sequencing study; 30 of the 40 genes are novel. We found that epilepsy genes had high protein-protein interaction (PPI) network connectivity and show specific expression during human brain development. Some of the top prioritized EPI genes were connected to a PPI subnetwork of immune genes and show specific expression in prenatal microglia. We also identified multiple enriched drug-target gene sets for EPI, which included immunostimulants as well as known antiepileptics. Immune biology was supported specifically by case-control variants from familial epilepsies rather than by de novo mutations in generalized encephalitic epilepsy.
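When genes are prioritized by a posterior-probability cutoff (such as the > 0.95 threshold above), a common companion quantity is the Bayesian false discovery rate of the selected set: the average posterior probability of *not* being a risk gene among the genes that pass. The sketch below computes it for hypothetical posteriors; whether gTADA reports exactly this statistic is not stated in the abstract, so treat it as an illustration of the general technique.

```python
def bayesian_fdr(posteriors, threshold):
    """Number of genes whose posterior probability of being a risk gene exceeds
    `threshold`, and the Bayesian FDR of that set: the mean posterior
    probability of *not* being a risk gene among the selected genes."""
    selected = [p for p in posteriors if p > threshold]
    if not selected:
        return 0, 0.0
    expected_false = sum(1.0 - p for p in selected)
    return len(selected), expected_false / len(selected)

# Hypothetical posterior probabilities for six genes.
pps = [0.99, 0.97, 0.96, 0.80, 0.40, 0.05]
n_selected, fdr = bayesian_fdr(pps, 0.95)
print(n_selected, round(fdr, 4))
```

Here three genes pass the cutoff, with about 0.08 expected false positives among them, giving a Bayesian FDR near 0.027.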


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Na Zhu ◽  
Emilia M. Swietlik ◽  
Carrie L. Welch ◽  
Michael W. Pauciulo ◽  
...  

Abstract Background Pulmonary arterial hypertension (PAH) is a lethal vasculopathy characterized by pathogenic remodeling of pulmonary arterioles leading to increased pulmonary pressures, right ventricular hypertrophy, and heart failure. PAH can be associated with other diseases (APAH: connective tissue diseases, congenital heart disease, and others), but often the etiology is idiopathic (IPAH). Mutations in bone morphogenetic protein receptor 2 (BMPR2) cause most heritable cases, but the vast majority of other cases are genetically undefined. Methods To identify new risk genes, we utilized an international consortium of 4241 PAH cases with exome or genome sequencing data from the National Biological Sample and Data Repository for PAH, Columbia University Irving Medical Center, and the UK NIHR BioResource – Rare Diseases Study. The strength of this combined cohort is a doubling of the number of IPAH cases compared to either national cohort alone. We identified protein-coding variants and performed rare variant association analyses in unrelated participants of European ancestry, including 1647 IPAH cases and 18,819 controls. We also analyzed de novo variants in 124 pediatric trios enriched for IPAH and APAH-CHD. Results Seven genes with rare deleterious variants were associated with IPAH at a false discovery rate below 0.1: three known genes (BMPR2, GDF2, and TBX4), two recently identified candidate genes (SOX17, KDR), and two new candidate genes (fibulin 2, FBLN2; platelet-derived growth factor D, PDGFD). The new genes were identified based solely on rare deleterious missense variants, a variant type that could not be adequately assessed in either cohort alone. The candidate genes exhibit expression patterns in lung and heart similar to those of known PAH risk genes, and most variants occur in conserved protein domains. 
For pediatric PAH, predicted deleterious de novo variants exhibited a significant burden compared to the background mutation rate (2.45×, p = 2.5e−5). At least eight novel pediatric candidate genes carrying de novo variants have plausible roles in lung/heart development. Conclusions Rare variant analysis of a large international consortium identified two new candidate genes—FBLN2 and PDGFD. The new genes have known functions in vasculogenesis and remodeling. Trio analysis predicted that ~ 15% of pediatric IPAH may be explained by de novo variants.
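A de novo burden such as "2.45× compared to the background mutation rate" is typically obtained by comparing the observed count of damaging de novo variants with the count expected from per-gene mutation rates, using a one-sided Poisson test. The sketch below assumes that setup; the function names and counts are hypothetical, and the study's exact mutation-rate model is not reproduced.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0. Terms are summed upward
    from k so that very small tail probabilities keep their precision."""
    if k <= 0:
        return 1.0
    total, i = 0.0, k
    while True:
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        # Stop once past the mean (terms are then decreasing) and the
        # remaining terms are negligible; also stops if terms underflow to 0.
        if i > lam and term <= total * 1e-17:
            break
        i += 1
    return min(total, 1.0)

def de_novo_burden(observed, expected):
    """Fold enrichment and one-sided Poisson p-value for an observed count of
    damaging de novo variants versus the count expected from background
    mutation rates summed over all trios."""
    return observed / expected, poisson_upper_tail(observed, expected)

# Hypothetical counts, not taken from the study above.
ratio, p = de_novo_burden(observed=49, expected=20.0)
print(f"{ratio:.2f}x enrichment, one-sided p = {p:.2e}")
```

Summing the upper tail directly, rather than computing one minus the lower CDF, avoids losing precision when the p-value is very small.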


2021 ◽  
Author(s):  
Lu Qiao ◽  
Le Xu ◽  
Lan Yu ◽  
Julia Wynn ◽  
Rebecca Hernan ◽  
...  

Congenital diaphragmatic hernia (CDH) is a severe congenital anomaly that is often accompanied by other anomalies. Although the role of genetics in the pathogenesis of CDH has been established, only a small number of disease genes have been identified. To further investigate the genetics of CDH, we analyzed de novo coding variants in 827 proband-parent trios and confirmed an overall significant enrichment of damaging de novo variants, especially in constrained genes. We identified LONP1 (Lon Peptidase 1, Mitochondrial) and ALYREF (Aly/REF Export Factor) as novel candidate CDH genes based on de novo variants at a false discovery rate below 0.05. We also performed ultra-rare variant association analyses in 748 cases and 11,220 ancestry-matched population controls and identified LONP1 as a risk gene contributing to CDH through both de novo and ultra-rare inherited variants; these variants are largely heterozygous, cluster in the core of the domains, and segregate with CDH in familial cases. Approximately 3% of our CDH cohort carried heterozygous ultra-rare predicted damaging variants in LONP1; these individuals have a range of clinical phenotypes, including other anomalies in some, as well as higher mortality and a greater requirement for extracorporeal membrane oxygenation. Mice with lung epithelium-specific deletion of Lonp1 die immediately after birth and show reduced lung growth and branching, which may at least partially explain the high mortality in humans. Our findings of both de novo and inherited rare variants in the same gene may have implications for the design and analysis of other genetic studies of congenital anomalies.


2020 ◽  
Author(s):  
Roozbeh Manshaei ◽  
Daniele Merico ◽  
Miriam S. Reuter ◽  
Worrawat Engchuan ◽  
Bahareh A. Mojarad ◽  
...  

Abstract Recent genome-wide studies of rare genetic variants have begun to implicate novel mechanisms for tetralogy of Fallot (TOF), a severe congenital heart defect (CHD). To provide statistical support for case-only data without parental genomes, we re-analyzed genome sequences of 231 individuals with TOF or related CHD. We adapted a burden test originally developed for de novo variants to assess singleton variant burden in individual genes, and in gene-sets corresponding to functional pathways and mouse phenotypes, accounting for highly correlated gene-sets and for multiple testing. The gene burden test identified a significant burden of deleterious missense variants in NOTCH1 (Bonferroni-corrected p-value <0.01). These NOTCH1 variants showed significant enrichment for those affecting the extracellular domain, and especially for disruption of cysteine residues forming disulfide bonds (OR 39.8 vs gnomAD). Individuals with NOTCH1 variants, all with TOF, were enriched for positive family history of CHD. Other genes not previously implicated in TOF had more modest statistical support, and singleton missense variants showed no significant gene-set burden. For singleton truncating variants, the gene burden test confirmed significant burden in FLT4. Gene-set burden tests identified a cluster of pathways corresponding to VEGF signaling (FDR=0%), and of mouse phenotypes corresponding to abnormal vasculature (FDR=0.8%), that suggested additional candidate genes not previously identified (e.g., WNT5A and ZFAND5). Analyses using unrelated sequencing datasets supported the specificity of the findings for CHD. The findings support the importance of ultra-rare variants disrupting genes involved in VEGF and NOTCH signaling in the genetic architecture of TOF. 
These proof-of-principle data indicate that this statistical methodology could assist in analyzing case-only sequencing data in which ultra-rare variants, whether de novo or inherited, contribute to the genetic etiopathogenesis of a complex disorder.
Author summary: We analyzed the ultra-rare nonsynonymous variant burden for genome sequencing data from 231 individuals with congenital heart defects, most with tetralogy of Fallot. We adapted a burden test originally developed for de novo variants. In line with other studies, we identified a significant truncating variant burden for FLT4 and deleterious missense burden for NOTCH1, both passing a stringent Bonferroni multiple-test correction. For NOTCH1, we observed frequent disruption of cysteine residues establishing disulfide bonds in the extracellular domain. We also identified genes with BH-FDR <10% that were not previously implicated. To overcome limited power for individual genes, we tested gene-sets corresponding to functional pathways and mouse phenotypes. Gene-set burden of truncating variants was significant for vascular endothelial growth factor signaling and abnormal vasculature phenotypes. These results confirmed previous findings and suggested additional candidate genes for experimental validation in future studies. This methodology can be extended to other case-only sequencing data in which ultra-rare variants make a substantial contribution to genetic etiology.
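The two multiple-testing corrections named in this abstract, Bonferroni and Benjamini-Hochberg (BH-FDR), can be sketched in a few lines of Python. The per-gene p-values below are hypothetical, and the adapted burden test that would produce them is not reproduced here.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: each p-value times the number of tests,
    capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum of
    # p * m / rank over its own rank and all larger ranks (keeps monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [1e-6, 0.004, 0.012, 0.03, 0.2]  # hypothetical per-gene p-values
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

With these made-up values, two genes survive Bonferroni at 0.05 while four pass BH-FDR < 10%, which mirrors the abstract's distinction between a stringent Bonferroni tier and a more permissive FDR tier of candidate genes.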


2019 ◽  
Author(s):  
Shengtong Han ◽  
Nicholas Knoblauch ◽  
Gao Wang ◽  
Siming Zhao ◽  
Yuwen Liu ◽  
...  

Abstract Rare genetic variants make significant contributions to human diseases. Compared to common variants, rare variants have larger effect sizes and are generally free of linkage disequilibrium (LD), which makes it easier to identify causal variants. Numerous methods have been developed to analyze rare variants in a gene or region in association studies, with the goal of finding risk genes by aggregating information from all variants of a gene. These methods, however, often make unrealistic assumptions, e.g., that all rare variants in a risk gene have non-zero effects. In practice, current methods for gene-based analysis often fail to show any advantage over simple single-variant analysis. In this work, we develop a Bayesian method: MIxture model based Rare variant Analysis on GEnes (MIRAGE). MIRAGE captures the heterogeneity of variant effects by treating all variants of a gene as a mixture of risk and non-risk variants, and models the prior probability of each variant being a risk variant as a function of external information about the variant, such as allele frequency and predicted deleterious effect. MIRAGE uses an empirical Bayes approach to estimate these prior probabilities by combining information across genes. We demonstrate in both simulations and analysis of an exome-sequencing dataset of autism that MIRAGE significantly outperforms current methods for rare variant analysis. In particular, the top genes identified by MIRAGE are highly enriched for known or plausible autism risk genes. Our results highlight several novel autism genes with high Bayesian posterior probabilities and functional connections with autism. MIRAGE is available at https://xinhe-lab.github.io/mirage.
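The mixture idea can be made concrete with a small sketch. If each variant j in a gene is a risk variant with prior probability π that depends on its annotation group, and variant-level Bayes factors are treated as independent, the gene-level Bayes factor is the product over variants of (1 − π + π·BF_j). The Python below assumes that form; the group labels, proportions and Bayes factors are hypothetical, and the empirical Bayes step that estimates π across genes is omitted. See the MIRAGE repository linked above for the actual implementation.

```python
import math

def gene_bayes_factor(variant_bfs, variant_groups, group_risk_props):
    """Gene-level Bayes factor under a mixture model in which each variant is a
    risk variant with a prior probability that depends on its annotation group:
        BF_gene = prod_j (1 - pi_g(j) + pi_g(j) * BF_j)
    Variant-level Bayes factors are assumed independent given the gene's status."""
    log_bf = 0.0
    for bf, group in zip(variant_bfs, variant_groups):
        pi = group_risk_props[group]
        log_bf += math.log(1.0 - pi + pi * bf)
    return math.exp(log_bf)

def gene_posterior(bf_gene, delta):
    """Posterior probability that a gene is a risk gene, given prior delta."""
    return delta * bf_gene / (delta * bf_gene + 1.0 - delta)

# Hypothetical group-level risk proportions and variant-level Bayes factors.
props = {"LoF": 0.5, "damaging_missense": 0.2, "other": 0.01}
bf = gene_bayes_factor([8.0, 3.0, 0.5], ["LoF", "damaging_missense", "other"], props)
print(round(bf, 3), round(gene_posterior(bf, 0.05), 3))
```

Because a variant in a low-prior group contributes a factor close to 1, a benign-looking variant barely dilutes the evidence from damaging ones, which is how the mixture avoids assuming that every rare variant in a risk gene has a non-zero effect.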


2020 ◽  
Author(s):  
Todd Lencz ◽  
Jin Yu ◽  
Raiyan Rashid Khan ◽  
Shai Carmi ◽  
Max Lam ◽  
...  

Abstract IMPORTANCE: Schizophrenia is a serious mental illness with high heritability. While common genetic variants account for a portion of the heritability, identification of rare variants associated with the disorder has proven challenging. OBJECTIVE: To identify genes and gene sets associated with schizophrenia in a founder population (Ashkenazi Jewish, AJ), and to determine the relative power of this population for rare variant discovery. DESIGN, SETTING, AND PARTICIPANTS: Data on exonic variants were extracted from whole-genome sequences of 786 patients with schizophrenia and 463 healthy control subjects, all drawn from the Ashkenazi Jewish population. Variants observed in two large publicly available datasets (total n≈153,000, excluding neuropsychiatric patients) were filtered out, and the remaining novel ultra-rare variants (URVs) were compared between cases and controls. MAIN OUTCOMES AND MEASURES: The numbers of novel URVs, and of genes carrying them, were compared between cases and controls. Genes in which only cases or only controls carried novel, functional URVs were examined using gene set analyses. RESULTS: Cases had a higher frequency of novel missense or loss-of-function (MisLoF) variants than controls, as well as a greater number of genes impacted by MisLoF variants. Characterizing 141 “case-only” genes (in which ≥ 3 AJ cases in our dataset had MisLoF URVs and none were found in our AJ controls), we replicated prior findings of enrichment for synaptic gene sets, as well as specific genes such as SETD1A and TRIO. Additionally, we identified cadherins as a novel gene set associated with schizophrenia, including a recurrent mutation in PCDHA3. Several genes associated with autism and other neurodevelopmental disorders, including CACNA1E, ASXL3, SETBP1, and WDFY3, were also identified in our case-only gene list, as was TSC2, which is linked to tuberous sclerosis. 
Modeling the effects of purifying selection demonstrated that deleterious rare variants are greatly over-represented in a founder population with a tight bottleneck and rapidly expanding census, resulting in enhanced power for rare variant association studies. CONCLUSIONS AND RELEVANCE: Identification of cell adhesion genes in the cadherin/protocadherin family is consistent with evidence from large-scale GWAS in schizophrenia, helps specify the synaptic abnormalities that may be central to the disorder, and suggests novel potential treatment strategies (e.g., inhibition of protein kinase C). Study of founder populations may serve as a cost-effective way to rapidly increase gene discovery in schizophrenia and other complex disorders.
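The "case-only" gene definition above (at least 3 cases carrying a MisLoF ultra-rare variant and no control carrying one) amounts to a simple filter over carrier calls. The sketch below assumes a hypothetical input of `(gene, sample_id, is_case)` tuples, one per qualifying variant call after frequency filtering; the gene and sample names are placeholders, not data from the study.

```python
from collections import defaultdict

def case_only_genes(carriers, min_cases=3):
    """Genes in which at least `min_cases` distinct cases, and no controls,
    carry a qualifying (e.g. missense or LoF) ultra-rare variant.

    `carriers` is an iterable of (gene, sample_id, is_case) tuples, one per
    qualifying variant call (a hypothetical input layout)."""
    cases, controls = defaultdict(set), defaultdict(set)
    for gene, sample, is_case in carriers:
        # Sets count each carrier once, even with several variants per gene.
        (cases if is_case else controls)[gene].add(sample)
    return sorted(g for g, samples in cases.items()
                  if len(samples) >= min_cases and not controls[g])

# Hypothetical carrier calls.
calls = [
    ("GENE_A", "case1", True), ("GENE_A", "case2", True), ("GENE_A", "case3", True),
    ("GENE_B", "case1", True), ("GENE_B", "case4", True), ("GENE_B", "case5", True),
    ("GENE_B", "ctrl1", False),
    ("GENE_C", "case2", True), ("GENE_C", "case2", True), ("GENE_C", "case6", True),
]
print(case_only_genes(calls))
```

Only GENE_A qualifies: GENE_B has a control carrier, and GENE_C has two distinct case carriers despite three calls, which is why carriers rather than variant calls are counted.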


Author(s):  
Na Zhu ◽  
Emilia M. Swietlik ◽  
Carrie L. Welch ◽  
Michael W. Pauciulo ◽  
Jacob J. Hagen ◽  
...  

Abstract Background: Group 1 pulmonary arterial hypertension (PAH) is a lethal vasculopathy characterized by pathogenic remodeling of pulmonary arterioles leading to increased pulmonary pressures, right ventricular hypertrophy and heart failure. Recent high-throughput sequencing studies have identified additional PAH risk genes and suggested differences in genetic causes by age of onset. However, known risk genes explain only 15-20% of non-familial idiopathic PAH cases. Methods: To identify new risk genes, we utilized an international consortium of 4,241 PAH cases with 4,175 sequenced exomes (n=2,572 National Biological Sample and Data Repository for PAH; n=469 Columbia University Irving Medical Center, enriched for pediatric trios) and 1,134 sequenced genomes (UK NIHR Bioresource – Rare Diseases Study). Most cases had adult-onset disease (93%); 55% were idiopathic (IPAH) and 35% were associated with other diseases (APAH). We identified protein-coding variants and performed rare variant association analyses in unrelated participants of European ancestry, including 2,789 cases and 18,819 controls (11,101 unaffected parents from the Simons Powering Autism Research for Knowledge study and 7,718 gnomAD individuals). We analyzed de novo variants in 124 pediatric trios. Results: Seven genes with rare deleterious variants were significantly associated (false discovery rate <0.1) with IPAH, including three known genes (BMPR2, GDF2, and TBX4), two recently identified candidate genes (SOX17, KDR), and two new candidate genes (FBLN2, fibulin 2; PDGFD, platelet-derived growth factor D). The candidate genes exhibit expression patterns in lung and heart similar to those of known PAH risk genes, and most of the variants occur in conserved protein domains. Variants in the known PAH gene ACVRL1 showed association with APAH. Predicted deleterious de novo variants in pediatric cases exhibited a significant burden compared to the background mutation rate (2.5x, p=7.0E-6). 
At least eight novel candidate genes carrying de novo variants have plausible roles in lung/heart development. Conclusions: Rare variant analysis of a large international consortium identifies two new candidate genes: FBLN2 and PDGFD. The new genes have known functions in vasculogenesis and remodeling but have not been previously implicated in PAH. Trio analysis predicts that ~15% of pediatric IPAH may be explained by de novo variants.
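A statement such as "~15% of pediatric IPAH may be explained by de novo variants" is commonly derived from the excess of observed over expected damaging de novo variants, divided by the number of trios. The sketch below assumes that simple estimator, in which each excess variant is taken to explain a different proband; the counts are hypothetical, and the study's exact calculation is not given in the abstract.

```python
def fraction_explained_by_de_novo(observed, expected, n_trios):
    """Rough fraction of probands whose condition is attributable to de novo
    variants: the excess of observed over expected damaging de novo variants,
    divided by the number of trios. Assumes each excess variant explains a
    different proband, and clamps a deficit to zero."""
    excess = max(0.0, observed - expected)
    return excess / n_trios

# Hypothetical counts for 124 trios (illustrative only).
print(round(fraction_explained_by_de_novo(observed=35, expected=20.0, n_trios=124), 3))
```

The one-variant-per-proband assumption makes this an approximation: if some probands carry more than one excess variant, the true attributable fraction is lower.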


2018 ◽  
Vol 138 (12) ◽  
pp. 2674-2677 ◽  
Author(s):  
Manuela Pigors ◽  
John E.A. Common ◽  
Xuan Fei Colin C. Wong ◽  
Sajid Malik ◽  
Claire A. Scott ◽  
...  

2016 ◽  
Author(s):  
Varun Warrier ◽  
Richard AI Bethlehem ◽  
Daniel H Geschwind ◽  
Simon Baron-Cohen

Abstract Importance: The genetic relationship between cognition, autism, and schizophrenia is complex. It is unclear how genes that contribute to cognition also contribute to risk for autism and schizophrenia. Objective: To investigate the interaction between genes related to cognition (measured via proxy through educational attainment, which we call ‘edu genes’) and genes/biological pathways that are atypical in autism and schizophrenia. Design: Genetic correlation and enrichment analyses were conducted to identify the interaction between edu genes and risk genes and biological pathways for autism or schizophrenia. Results: First, edu genes are enriched in a specific developmental co-expression module that is also enriched for high-confidence autism risk genes. Second, modules enriched for genes that are dysregulated in autism and schizophrenia are also enriched for edu genes. Finally, genes that overlap between the two above modules and educational attainment are significantly enriched for genes that flank human accelerated regions, suggesting increased positive selection for the overlapping gene sets. Conclusion: Our results identify distinct co-expression modules where risk genes for the two psychiatric conditions interact with edu genes. This suggests specific pathways that contribute to both cognitive deficits and cognitive talents in individuals with schizophrenia or autism. Key Points. Question: How do genes for educational attainment interact with risk genes for autism and schizophrenia? Findings: We show that genes for educational attainment (edu genes) are significantly more likely than other genes to be mutated in autism and intellectual disability. We further show that edu genes also interact with co-expression modules that are associated with autism or schizophrenia and are enriched for differentially expressed genes in autism or schizophrenia. 
Finally, we show that the enrichment of autism and schizophrenia risk genes near human accelerated regions is driven, in part, by their overlap with edu genes. Meaning: Edu genes interact with schizophrenia and autism risk genes in specific pathways, contributing to both cognitive deficits and talents.
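Enrichment of one gene list (for example, edu genes) within a co-expression module is often assessed with a one-sided hypergeometric test on the overlap, relative to a background universe of genes. The abstract does not specify which enrichment test was used, so the sketch below illustrates the general technique only; all counts are hypothetical.

```python
import math

def hypergeom_enrichment(n_universe, n_module, n_genes, n_overlap):
    """Fold enrichment and one-sided hypergeometric p-value P(overlap >= n_overlap)
    when `n_genes` genes are drawn at random from a universe of `n_universe`
    genes, `n_module` of which belong to the module."""
    denom = math.comb(n_universe, n_genes)
    upper = min(n_module, n_genes)
    # Exact integer arithmetic for the tail, converted to a float at the end.
    tail = sum(math.comb(n_module, k) * math.comb(n_universe - n_module, n_genes - k)
               for k in range(n_overlap, upper + 1))
    expected = n_genes * n_module / n_universe
    return n_overlap / expected, tail / denom

# Hypothetical: 20,000-gene universe, 800-gene module, 500 edu genes, 45 shared.
fold, p = hypergeom_enrichment(20000, 800, 500, 45)
print(f"fold enrichment {fold:.2f}, one-sided p = {p:.2e}")
```

Using `math.comb` (Python 3.8+) keeps the tail sum exact; the background universe should be the set of genes that could have appeared in both lists, since an inflated universe overstates enrichment.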


2021 ◽  
pp. 1-14
Author(s):  
A. Havdahl ◽  
M. Niarchou ◽  
A. Starnawska ◽  
M. Uddin ◽  
C. van der Merwe ◽  
...  

Abstract Autism spectrum disorder (autism) is a heterogeneous group of neurodevelopmental conditions characterized by early childhood-onset impairments in communication and social interaction alongside restricted and repetitive behaviors and interests. This review summarizes recent developments in human genetics research in autism, complemented by epigenetic and transcriptomic findings. The clinical heterogeneity of autism is mirrored by a complex genetic architecture involving several types of common and rare variants, ranging from point mutations to large copy number variants, and either inherited or spontaneous (de novo). More than 100 risk genes have been implicated by rare, often de novo, potentially damaging mutations in highly constrained genes. These account for substantial individual risk but a small proportion of the population risk. In contrast, most of the genetic risk is attributable to common inherited variants acting en masse, each individually with small effects. Studies have identified a handful of robustly associated common variants. Different risk genes converge on the same mechanisms, such as gene regulation and synaptic connectivity. These mechanisms are also implicated by genes that are epigenetically and transcriptionally dysregulated in autism. Major challenges to understanding the biological mechanisms include substantial phenotypic heterogeneity, large locus heterogeneity, variable penetrance, and widespread pleiotropy. Considerable increases in sample sizes are needed to better understand the hundreds or thousands of common and rare genetic variants involved. Future research should integrate common and rare variant research, multi-omics data including genomics, epigenomics, and transcriptomics, and refined phenotype assessment with multidimensional and longitudinal measures.

