Identifying Crohn’s disease signal from variome analysis

2017 ◽  
Author(s):  
Yanran Wang ◽  
Yuri Astrakhan ◽  
Britt-Sabina Petersen ◽  
Stefan Schreiber ◽  
Andre Franke ◽  
...  

Abstract Background After many years of concentrated research efforts, the exact cause of Crohn’s disease remains unknown. Its accurate diagnosis, however, helps in management and even preventing the onset of disease. Genome-wide association studies have identified 140 loci associated with CD, but these carry very small log odds ratios and are uninformative for diagnoses. Results Here we describe a machine learning method – AVA,Dx (Analysis of Variation for Association with Disease) – that uses whole exome sequencing data to make predictions of CD status. Using the person-specific variation in these genes from a panel of only 111 individuals, we built disease-prediction models informative of previously undiscovered disease genes. In this panel, our models differentiate CD patients from healthy controls with 71% precision and 73% recall at the default cutoff. By additionally accounting for batch effects, we are also able to predict individual CD status for previously unseen individuals from a separate CD study (84% precision, 73% recall). Conclusions Larger training panels and additional features, including regulatory variants and environmental factors, e.g. human-associated microbiota, are expected to improve model performance. However, current results already position AVA,Dx as both an effective method for highlighting pathogenesis pathways and as a simple Crohn’s disease risk analysis tool, which can improve clinical diagnostic time and accuracy.

2019 ◽  
Vol 11 (1) ◽  
Author(s):  
Yanran Wang ◽  
Maximilian Miller ◽  
Yuri Astrakhan ◽  
Britt-Sabina Petersen ◽  
Stefan Schreiber ◽  
...  

Abstract Background After years of concentrated research efforts, the exact cause of Crohn’s disease (CD) remains unknown. Its accurate diagnosis, however, helps in management and preventing the onset of disease. Genome-wide association studies have identified 241 CD loci, but these carry small log odds ratios and are thus diagnostically uninformative. Methods Here, we describe a machine learning method—AVA,Dx (Analysis of Variation for Association with Disease)—that uses exonic variants from whole exome or genome sequencing data to extract CD signal and predict CD status. Using the person-specific coding variation in genes from a panel of only 111 individuals, we built disease-prediction models informative of previously undiscovered disease genes. By additionally accounting for batch effects, we were able to accurately predict CD status for thousands of previously unseen individuals from other panels. Results AVA,Dx highlighted known CD genes including NOD2 and new potential CD genes. AVA,Dx identified 16% (at strict cutoff) of CD patients at 99% precision and 58% of the patients (at default cutoff) with 82% precision in over 3000 individuals from separately sequenced panels. Conclusions Larger training panels and additional features, including other types of genetic variants and environmental factors, e.g., human-associated microbiota, may improve model performance. However, the results presented here already position AVA,Dx as both an effective method for revealing pathogenesis pathways and as a CD risk analysis tool, which can improve clinical diagnostic time and accuracy. Links to the AVA,Dx Docker image and the BitBucket source code are at https://bromberglab.org/project/avadx/.
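
The abstract above reports precision and recall at a "strict" and a "default" cutoff. As a minimal sketch of how such figures relate to a score threshold, the following computes both metrics from per-sample scores. The function name, the scores, the labels, and the cutoff values 0.5 and 0.9 are all hypothetical and are not taken from AVA,Dx or its data.

```python
from typing import Sequence, Tuple


def precision_recall(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (precision, recall) when samples scoring >= cutoff are called positive.

    labels use 1 for cases and 0 for controls (an assumption of this sketch).
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up prediction scores and true labels, for illustration only.
scores = [0.95, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

default_result = precision_recall(scores, labels, cutoff=0.5)  # hypothetical "default"
strict_result = precision_recall(scores, labels, cutoff=0.9)   # hypothetical "strict"
print(default_result, strict_result)
```

Raising the cutoff typically trades recall for precision, which is the pattern the abstract describes: fewer patients identified, but with fewer false calls among them.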


2018 ◽  
Vol 13 (5) ◽  
pp. 648-658 ◽  
Author(s):  
Yoichi Kakuta ◽  
Yosuke Kawai ◽  
Takeo Naito ◽  
Atsushi Hirano ◽  
Junji Umeno ◽  
...  

Abstract Background and Aims Genome-wide association studies [GWASs] of European populations have identified numerous susceptibility loci for Crohn’s disease [CD]. Susceptibility genes differ by ethnicity, however, so GWASs specific for Asian populations are required. This study aimed to clarify the Japanese-specific genetic background for CD by a GWAS using the Japonica array [JPA] and subsequent imputation with the 1KJPN reference panel. Methods Two independent Japanese case/control sets (Tohoku region [379 CD patients, 1621 controls] and Kyushu region [334 CD patients, 462 controls]) were included. GWASs were performed separately for each population, followed by a meta-analysis. Two additional replication sets [254 + 516 CD patients and 287 + 565 controls] were analysed for top hit single nucleotide polymorphisms [SNPs] from novel genomic regions. Results Genotype data of 4 335 144 SNPs from 713 Japanese CD patients and 2083 controls were analysed. SNPs located in TNFSF15 (rs78898421, pmeta = 2.59 × 10⁻²⁶, odds ratio [OR] = 2.10), HLA-DQB1 [rs184950714, pmeta = 3.56 × 10⁻¹⁹, OR = 2.05], ZNF365, and 4p14 loci were significantly associated with CD in Japanese individuals. Replication analyses were performed for four novel candidate loci [p < 1 × 10⁻⁶], and rs488200 located upstream of RAP1A was significantly associated with CD [pcombined = 4.36 × 10⁻⁸, OR = 1.31]. Transcriptome analysis of CD4+ effector memory T cells from lamina propria mononuclear cells of CD patients revealed a significant association of rs488200 with RAP1A expression. Conclusions RAP1A is a novel susceptibility locus for CD in the Japanese population.
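
For readers unfamiliar with the odds ratios quoted above, here is a minimal sketch of an allelic odds ratio computed from a 2×2 count table, with a Woolf (log-based) 95% confidence interval. This is a textbook calculation, not the GWAS or meta-analysis procedure used in the study, and the allele counts are invented for illustration.

```python
import math
from typing import Tuple


def odds_ratio(
    case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) for a 2x2 allele-count table.

    The interval uses the Woolf standard error of log(OR); all four
    counts must be positive for the formula to be defined.
    """
    or_value = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper


# Hypothetical allele counts (not from the study): risk allele vs reference allele.
or_value, lower, upper = odds_ratio(case_alt=200, case_ref=100, ctrl_alt=100, ctrl_ref=100)
print(f"OR = {or_value:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

An odds ratio above 1 means the allele is more common among cases than controls; an interval that excludes 1 corresponds to a nominally significant association at roughly the 5% level.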


2015 ◽  
Vol 22 (4) ◽  
pp. 545-559 ◽  
Author(s):  
Rafael Ríos ◽  
Carmen Belén Lupiañez ◽  
Daniele Campa ◽  
Alessandro Martino ◽  
Joaquin Martínez-López ◽  
...  

Type 2 diabetes (T2D) has been suggested to be a risk factor for multiple myeloma (MM), but the relationship between the two traits is still not well understood. The aims of this study were to evaluate whether 58 genome-wide-association-studies (GWAS)-identified common variants for T2D influence the risk of developing MM and to determine whether predictive models built with these variants might help to predict the disease risk. We conducted a case–control study including 1420 MM patients and 1858 controls ascertained through the International Multiple Myeloma (IMMEnSE) consortium. Subjects carrying the KCNQ1rs2237892T allele or the CDKN2A-2Brs2383208G/G, IGF1rs35767T/T and MADDrs7944584T/T genotypes had a significantly increased risk of MM (odds ratio (OR)=1.32–2.13) whereas those carrying the KCNJ11rs5215C, KCNJ11rs5219T and THADArs7578597C alleles or the FTOrs8050136A/A and LTArs1041981C/C genotypes showed a significantly decreased risk of developing the disease (OR=0.76–0.85). Interestingly, a prediction model including those T2D-related variants associated with the risk of MM showed a significantly improved discriminatory ability to predict the disease when compared to a model without genetic information (area under the curve (AUC)=0.645 vs AUC=0.629; P=4.05×10⁻⁶). A gender-stratified analysis also revealed a significant gender effect modification for ADAM30rs2641348 and NOTCH2rs10923931 variants (Pinteraction=0.001 and 0.0004, respectively). Men carrying the ADAM30rs2641348C and NOTCH2rs10923931T alleles had a significantly decreased risk of MM whereas an opposite but not significant effect was observed in women (ORM=0.71 and ORM=0.66 vs ORW=1.22 and ORW=1.15, respectively). These results suggest that T2D-related variants may influence the risk of developing MM and their genotyping might help to improve MM risk prediction models.


2010 ◽  
Vol 128 (2) ◽  
pp. 131-135 ◽  
Author(s):  
Devendra K. Amre ◽  
David R. Mack ◽  
Kenneth Morgan ◽  
David Israel ◽  
Colette Deslandres ◽  
...  

2015 ◽  
Author(s):  
Oriol Canela-Xandri ◽  
Konrad Rawlik ◽  
John A. Woolliams ◽  
Albert Tenesa

Genome-wide association studies (GWAS) promised to translate their findings into clinically beneficial improvements of patient management by tailoring disease management to the individual through the prediction of disease risk. However, the ability to translate genetic findings from GWAS into predictive tools that are of clinical utility and which may inform clinical practice has, so far, been encouraging but limited. Here we propose to use a more powerful statistical approach that enables the prediction of multiple medically relevant phenotypes without the costs associated with developing a genetic test for each of them. As a proof of principle, we used a common panel of 319,038 SNPs to train the prediction models in 114,264 unrelated White-British for height and four obesity related traits (body mass index, basal metabolic rate, body fat percentage, and waist-to-hip ratio). We obtained prediction accuracies that ranged between 46% and 75% of the maximum achievable given their explained heritable component. This represents an improvement of up to 75% over the phenotypic variance explained by the predictors developed through large collaborations, which used more than twice as many training samples. Across-population predictions in White non-British individuals were similar to those of White-British whilst those in Asian and Black individuals were informative but less accurate. The genotyping of circa 500,000 UK Biobank participants will yield predictions ranging between 66% and 83% of the maximum. We anticipate that our models and a common panel of genetic markers, which can be used across multiple traits and diseases, will be the starting point to tailor disease management to the individual. Ultimately, we will be able to capitalise on whole-genome sequence and environmental risk factors to realise the full potential of genomic medicine.


2021 ◽  
Author(s):  
Steven Gazal ◽  
Omer Weissbrod ◽  
Farhad Hormozdiari ◽  
Kushal Dey ◽  
Joseph Nasser ◽  
...  

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. 
Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies. 


2021 ◽  
Author(s):  
Matt Kanke ◽  
Meaghan M. Kennedy ◽  
Sean Connelly ◽  
Matthew Schaner ◽  
Michael T. Shanahan ◽  
...  

Abstract The intestinal epithelial barrier is comprised of a monolayer of specialized intestinal epithelial cells (IECs) that are critical in maintaining gut mucosal homeostasis. Dysfunction within various IEC fractions can increase intestinal permeability, resulting in a chronic and debilitating condition known as Crohn’s disease (CD). Defining the molecular changes in each IEC type in CD will contribute to an improved understanding of the pathogenic processes and the identification of potential therapeutic targets. Here we performed, for the first time at single-cell resolution, a direct comparison of the colonic epithelial cellular and molecular landscape between treatment-naïve adult CD and non-IBD control patients. Our analysis revealed that in CD patients there is a significant skew in the colonic epithelial cellular distribution away from canonical LGR5+ stem cells, located at the crypt-bottom, and toward one specific subtype of mature colonocytes, located at the crypt-top. Further analysis revealed unique changes to gene expression programs in every major cell type, including a previously undescribed suppression in CD of most enteroendocrine driver genes as well as L-cell markers including GCG. We also dissect a previously poorly understood SPIB+ cell cluster, revealing at least four sub-clusters that exhibit unique features. One of these SPIB+ sub-clusters expresses crypt-top colonocyte markers and is significantly up-regulated in CD, whereas another sub-cluster strongly expresses and stains positive for lysozyme (albeit no other canonical Paneth cell marker), which surprisingly is greatly reduced in expression in CD. Finally, through integration with data from genome-wide association studies, we show that genes implicated in CD risk exhibit heretofore unknown cell-type specific patterns of aberrant expression in CD, providing unprecedented insight into the potential biological functions of these genes.

