Retrospective Definition of Clostridioides difficile PCR Ribotypes on the Basis of Whole Genome Polymorphisms: A Proof of Principle Study

Manisha Goyal; Lysiane Hauben; Hannes Pouseele; Magali Jaillard; Katrien De Bruyne; Alex van Belkum; Richard Goering

doi:10.3390/diagnostics10121078

Diagnostics ◽

10.3390/diagnostics10121078 ◽

2020 ◽

Vol 10 (12) ◽

pp. 1078

Author(s):

Manisha Goyal ◽

Lysiane Hauben ◽

Hannes Pouseele ◽

Magali Jaillard ◽

Katrien De Bruyne ◽

...

Keyword(s):

In Silico ◽

Association Studies ◽

De Bruijn Graph ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Clostridioides Difficile ◽

Specificity And Sensitivity ◽

Intergenic Regions ◽

Pcr Ribotyping ◽

Proof Of Principle

Clostridioides difficile is a cause of health care-associated infections. The epidemiological study of C. difficile infection (CDI) traditionally involves PCR ribotyping. However, ribotyping will be increasingly replaced by whole genome sequencing (WGS). This implies that WGS types need correlation with classical ribotypes (RTs) in order to perform retrospective clinical studies. Here, we selected genomes of hyper-virulent C. difficile strains of RT001, RT017, RT027, RT078, and RT106 to identify new discriminatory markers using in silico ribotyping PCR and De Bruijn graph-based Genome Wide Association Studies (DBGWAS). First, in silico ribotyping PCR was performed using reference primer sequences and 30 C. difficile genomes of the five different RTs identified above. Second, discriminatory genomic markers were sought with DBGWAS using a set of 160 independent C. difficile genomes (14 ribotypes). RT-specific genetic polymorphisms were annotated and validated for their specificity and sensitivity against a larger dataset of 2425 C. difficile genomes covering 132 different RTs. In silico PCR ribotyping was unsuccessful due to non-specific or missing theoretical RT PCR fragments. More successfully, DBGWAS discovered a total of 47 new markers (13 in RT017, 12 in RT078, 9 in RT106, 7 in RT027, and 6 in RT001) with minimum q-values of 0 to 7.40 × 10⁻⁵, indicating excellent marker selectivity. The specificity and sensitivity of individual markers ranged between 0.92 and 1.0 but increased to 1 by combining two markers, hence providing undisputed RT identification based on a single genome sequence. Markers were scattered throughout the C. difficile genome in intra- and intergenic regions. We propose here a set of new genomic polymorphisms that efficiently identify five hyper-virulent RTs utilizing WGS data only. Further studies need to show whether this initial proof-of-principle observation can be extended to all 600 existing RTs.
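To make the sensitivity and specificity figures in this abstract concrete, the following minimal Python sketch shows how such values could be computed for presence/absence markers, and how requiring two markers together can remove false positives. The toy genome labels, the marker calls, and the "both markers present" combination rule are all illustrative assumptions; the abstract does not state how the authors combined markers.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    calls: Sequence[bool], truth: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of marker calls against known labels."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 10 genomes, the first 4 belong to the target ribotype.
is_target_rt = [True] * 4 + [False] * 6

# Hypothetical presence/absence calls; each marker has one false positive.
marker_a = [True, True, True, True, True, False, False, False, False, False]
marker_b = [True, True, True, True, False, True, False, False, False, False]

# Assumed combination rule: call the ribotype only when both markers are present.
combined = [a and b for a, b in zip(marker_a, marker_b)]

sens_a, spec_a = sensitivity_specificity(marker_a, is_target_rt)
sens_b, spec_b = sensitivity_specificity(marker_b, is_target_rt)
sens_ab, spec_ab = sensitivity_specificity(combined, is_target_rt)

print(f"marker A:  sensitivity={sens_a:.2f} specificity={spec_a:.2f}")
print(f"marker B:  sensitivity={sens_b:.2f} specificity={spec_b:.2f}")
print(f"A and B:   sensitivity={sens_ab:.2f} specificity={spec_ab:.2f}")
```

Note that an "and" rule can only raise specificity; it may lower sensitivity if either marker is missing from some genomes of the target ribotype, so the choice of rule would need to be checked against real data.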

Family-based gene-environment interaction using sequence kernel association test (FGE-SKAT) for complex quantitative traits

Scientific Reports ◽

10.1038/s41598-021-86871-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chao-Yu Guo ◽

Reng-Hong Wang ◽

Hsin-Chou Yang

Keyword(s):

Complex Traits ◽

Association Studies ◽

Association Test ◽

Whole Genome Sequence ◽

Environment Interaction ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Sequence Kernel Association Test ◽

Gene Environment ◽

Family Based

After the genome-wide association studies (GWAS) era, whole-genome sequencing is widely used to identify associations between complex traits and rare variants. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such a kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve the effects of genetic and environmental factors and the complex interactions between genes and the environment. Therefore, in this research, a novel method is proposed to detect gene and gene-environment interactions in a complex family-based association study with various correlated structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), which is freely available as supplementary material for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and its superior statistical power. The FGE-SKAT was applied to the whole genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and discovered concordant and discordant regions compared to methods that do not consider gene-by-environment interactions.

Integration of genome wide association studies and whole genome sequencing provides novel insights into fat deposition in chicken

Scientific Reports ◽

10.1038/s41598-018-34364-0 ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 8

Author(s):

Gabriel Costa Monteiro Moreira ◽

Clarissa Boschiero ◽

Aline Silva Mello Cesar ◽

James M. Reecy ◽

Thaís Fernanda Godoy ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Fat Deposition ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Wide

High Resolution Ancestry Deconvolution for Next Generation Genomic Data

10.1101/2021.09.19.460980 ◽

2021 ◽

Author(s):

Helgi Hilmarsson ◽

Arvind S. Kumar ◽

Richa Rastogi ◽

Carlos D. Bustamante ◽

Daniel Mas Montserrat ◽

...

Keyword(s):

High Resolution ◽

Prediction Models ◽

Association Studies ◽

Training Data ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Next Generation ◽

Genome Data ◽

Computational Performance ◽

Genetic Risk Prediction

As genome-wide association studies and genetic risk prediction models are extended to globally diverse and admixed cohorts, ancestry deconvolution has become an increasingly important tool. Also known as local ancestry inference (LAI), this technique identifies the ancestry of each region of an individual’s genome, thus permitting downstream analyses to account for genetic effects that vary between ancestries. Since existing LAI methods were developed before the rise of massive, whole genome biobanks, they are computationally burdened by these large next generation datasets. Current LAI algorithms also fail to harness the potential of whole genome sequences, falling well short of the accuracy that such high variant densities can enable. Here we introduce Gnomix, a set of algorithms that address each of these points, achieving higher accuracy and swifter computational performance than any existing LAI method, while also enabling portable models that are particularly useful when training data are not shareable due to privacy or other restrictions. We demonstrate Gnomix (and its swift phase correction counterpart Gnofix) on worldwide whole-genome data from both humans and canids and utilize its high resolution accuracy to identify the location of ancient New World haplotypes in the Xoloitzcuintle, dating back over 100 generations. Code is available at https://github.com/AI-sandbox/gnomix.

CONQUER: an interactive toolbox to understand functional consequences of GWAS hits

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa085 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Gerard A Bouland ◽

Joline W J Beulens ◽

Joey Nap ◽

Arno R van der Slik ◽

Arnaud Zaldumbide ◽

...

Keyword(s):

Risk Factors ◽

Association Studies ◽

R Package ◽

Integrative Approach ◽

Individual Risk ◽

Genome Wide Association Studies ◽

Genome Database ◽

Intergenic Regions ◽

Functional Consequences ◽

Individual Risk Factors

Numerous large genome-wide association studies have been performed to understand the influence of genetics on traits. Many identified risk loci are in non-coding and intergenic regions, which complicates understanding how genes and their downstream pathways are influenced. An integrative data approach is required to understand the mechanism and consequences of identified risk loci. Here, we developed the R package CONQUER. Data for SNPs of interest are acquired from static and dynamic repositories (build GRCh38/hg38), including GTExPortal, Epigenomics Project, 4D genome database and genome browsers. All visualizations are fully interactive so that the user can immediately access the underlying data. CONQUER is a user-friendly tool to perform an integrative approach on multiple SNPs where risk loci are not seen as individual risk factors but rather as a network of risk factors.

Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa068 ◽

2020 ◽

Vol 27 (9) ◽

pp. 1425-1430

Author(s):

Inès Krissaane ◽

Carlos De Niz ◽

Alba Gutiérrez-Sacristán ◽

Gabor Korodi ◽

Nneka Ede ◽

...

Keyword(s):

Web Services ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Cloud Platform ◽

Human Genomics ◽

Genome Wide ◽

Innovative Methodology ◽

Amazon Web Services

Objective: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. Methods: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680) for analysis and exploration of genomic variant datasets. Results: Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics. Conclusions: We present a timely piece for one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?

Frameshift Variant in Novel Adenosine-A1-Receptor Homolog Associated With Bovine Spastic Syndrome/Late-Onset Bovine Spastic Paresis in Holstein Sires

Frontiers in Genetics ◽

10.3389/fgene.2020.591794 ◽

2020 ◽

Vol 11 ◽

Author(s):

Frederik Krull ◽

Marc Hirschfeld ◽

Wilhelm Ewald Wemheuer ◽

Bertram Brenig

Keyword(s):

Late Onset ◽

Association Studies ◽

Adenosine A1 Receptor ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Resequencing ◽

Adenosine A1 ◽

Spastic Paresis ◽

A1 Receptor ◽

Whole Genome Resequencing

Since their first description almost 100 years ago, bovine spastic paresis (BSP) and bovine spastic syndrome (BSS) have been assumed to be inherited neuronal-progressive diseases in cattle. Affected animals are characterized by (frequent) spasms primarily located in the hind limbs, accompanied by severe pain symptoms and reduced vigor, thus initiating premature slaughter or euthanasia. Due to the late onset of BSP and BSS and the massively decreased lifespan of modern cattle, the importance of these diseases is underestimated. In the present study, BSP/BSS-affected German Holstein breeding sires from artificial insemination centers were collected and pedigree analysis, genome-wide association studies, whole genome resequencing, protein–protein interaction network analysis, and protein-homology modeling were performed to elucidate the genetic background. The analysis of 46 affected and 213 control cattle revealed four significantly associated positions on chromosome 15 (BTA15), i.e., AC_000172.1:g.83465449A>G (–log10P = 19.17), AC_000172.1:g.81871849C>T (–log10P = 8.31), AC_000172.1:g.81872621A>T (–log10P = 6.81), and AC_000172.1:g.81872661G>C (–log10P = 6.42). Two additional significantly associated loci were located on BTA8 and BTA19, i.e., AC_000165.1:g.71177788T>C and AC_000176.1:g.30140977T>G, respectively. Whole genome resequencing of five affected individuals and six unaffected relatives (two fathers, two mothers, a half sibling, and a full sibling) belonging to three different, not directly related families was performed. After filtering, a homozygous loss-of-function variant was identified in the affected cattle, causing a frameshift in the so far unknown gene locus LOC100848076 encoding an adenosine-A1-receptor homolog. An allele frequency of the variant of 0.74 was determined in 3,093 samples of the 1000 Bull Genomes Project.

Quantifying the mapping precision of genome-wide association studies using whole-genome sequencing data

Genome Biology ◽

10.1186/s13059-017-1216-0 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 46

Author(s):

Yang Wu ◽

Zhili Zheng ◽

Peter M. Visscher ◽

Jian Yang

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Genome Wide Association ◽

Whole Genome Sequencing Data ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Sequencing Data ◽

Genome Wide

TAGOOS: genome-wide supervised learning of non-coding loci associated to complex phenotypes

Nucleic Acids Research ◽

10.1093/nar/gkz320 ◽

2019 ◽

Vol 47 (14) ◽

pp. e79-e79

Author(s):

Aitor González ◽

Marie Artufel ◽

Pascal Rihet

Keyword(s):

Cleft Lip ◽

Association Studies ◽

Area Under The Curve ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Functional Snps ◽

Functional Regions ◽

Complex Phenotypes ◽

Genome Wide ◽

Intergenic Regions

Genome-wide association studies (GWAS) associate single nucleotide polymorphisms (SNPs) with complex phenotypes. Most human SNPs fall in non-coding regions and are likely regulatory SNPs, but linkage disequilibrium (LD) blocks make it difficult to distinguish functional SNPs. Therefore, putative functional SNPs are usually annotated with molecular markers of gene regulatory regions and prioritized with dedicated prediction tools. We integrated associated SNPs, LD blocks and regulatory features into a supervised model called TAGOOS (TAG SNP bOOSting) and computed scores genome-wide. The TAGOOS scores enriched and prioritized unseen associated SNPs with an odds ratio of 4.3 and 3.5 and an area under the curve (AUC) of 0.65 and 0.6 for intronic and intergenic regions, respectively. The TAGOOS score was correlated with the maximal significance of associated SNPs and expression quantitative trait loci (eQTLs) and with the number of biological samples annotated for key regulatory features. Analysis of loci and regions associated with cleft lip and human adult height phenotypes recovered known functional loci and predicted new functional loci enriched in transcription factors related to the phenotypes. In conclusion, we trained a supervised model based on associated SNPs to prioritize putative functional regions. The TAGOOS scores, annotations and UCSC genome tracks are available here: https://tagoos.readthedocs.io.
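For readers unfamiliar with the two evaluation metrics quoted in this abstract, the sketch below shows one common way to compute an odds ratio from a 2×2 table and an AUC from raw scores. It is not the TAGOOS evaluation code; the function names, the toy counts, and the toy scores are all hypothetical.

```python
from typing import Sequence


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of a 2x2 table.

    a: high-score and associated      b: high-score and not associated
    c: low-score and associated       d: low-score and not associated
    """
    return (a * d) / (b * c)


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
toy_or = odds_ratio(a=30, b=10, c=20, d=40)
toy_auc = auc(pos_scores=[0.9, 0.6, 0.4], neg_scores=[0.7, 0.3, 0.2])

print(f"odds ratio = {toy_or:.2f}")
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported values of 0.65 and 0.6.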

DOP54 Integrated network analysis using patient-specific single-nucleotide polymorphism profiles uncovers new pathways involved in ulcerative colitis pathogenesis

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjz203.093 ◽

2020 ◽

Vol 14 (Supplement_1) ◽

pp. S092-S092

Author(s):

D Modos ◽

J Brooks ◽

P Sudhakar ◽

B Verstockt ◽

B Alexander-Dann ◽

...

Keyword(s):

Ulcerative Colitis ◽

Network Analysis ◽

Tight Junctions ◽

In Silico ◽

Target Genes ◽

Association Studies ◽

Patient Specific ◽

Genome Wide Association Studies ◽

Single Nucleotide ◽

Network Propagation

Background: Genome-wide association studies have deciphered the single nucleotide polymorphisms (SNPs) which are responsible for ulcerative colitis (UC) susceptibility. However, to understand how these SNPs are involved in UC, additional methods are necessary. One such approach is in silico network propagation modelling, which can discover how the effects of SNPs in UC can affect the whole cell. A complementary approach is weighted gene co-expression network analysis (WGCNA), where co-regulated genes are identified using transcriptomic data. Integrating these two methods can shed light on how SNPs are affecting the transcriptome of UC patients. Methods: We used immunochip profiles of 941 UC patients and focussed on UC-associated SNPs altering regulatory regions. Based on these regions, we identified affected genes. To understand how their corresponding proteins rewire transcriptional regulation, we predicted the path between these proteins and relevant transcription factors (TFs) using the OmniPath signalling network (http://omnipathdb.org). From the TFs, we propagated the signal further to target genes using TFlink (https://tflink.net) and GTRD (http://gtrd.biouml.org). To evaluate the predicted network propagation signal, we conducted WGCNA with transcriptomics data from 46 matching patients (GEO ID: GSE48959). To interpret the results, we used Gene Ontology Biological Process annotations of the target genes, and we compared the function and regulation of affected genes and the determined WGCNA modules. Results: We found 9 predominant signalling pathways, some already known from other studies to be involved in UC pathogenesis, including NFkB signalling, chemokine signalling, Notch pathway, and JAK/STAT signalling. Downstream of these pathways we identified potential key TFs that regulate the UC phenotype, for example NFKB1, GATA3, and GTF2I. The targets of these TFs were enriched in the WGCNA modules of the patients. The WGCNA modules and the transcriptionally affected genes had enriched processes including cell migration, TGF-β signalling, exocytosis, adaptive T- and B-cell-specific immune responses and tight junctions. We also found myogenetic development-specific TFs affected transcriptionally, such as MyoD, MEF2A, and MEF2D. We are currently validating these results through patient-specific biopsies. Conclusion: In silico methods bring us closer to understanding UC pathogenesis. Our results suggest that in a well-defined set of patients, weakened tight junctions and insufficient immune response can lead to a dysfunctional epithelial barrier, resulting in poor wound healing in UC. We hope the developed workflow will provide novel diagnostic and therapeutic options in UC.

On the Threshold from Genome-Wide Association Studies to Whole-Genome Sequencing. Looking for Signal in All the Right Places

American Journal of Respiratory and Critical Care Medicine ◽

10.1164/rccm.201401-0048ed ◽

2014 ◽

Vol 189 (4) ◽

pp. 381-383 ◽

Cited By ~ 1

Author(s):

Nadia N. Hansel ◽

Rasika A. Mathias

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Wide ◽

The Right
