A fast and agnostic method for bacterial genome-wide association studies: bridging the gap between kmers and genetic events

2018 ◽  
Author(s):  
Magali Jaillard ◽  
Leandro Lima ◽  
Maud Tournoud ◽  
Pierre Mahé ◽  
Alex van Belkum ◽  
...  

Abstract Motivation: Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or fine assessment of marker effect. Recently, alignment-free methods based on kmer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results which are hard to interpret. Methods: Here, we introduce DBGWAS, an extended kmer-based GWAS method producing interpretable genetic variants associated with phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes identified by the association model into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is fast, alignment-free and only requires a set of contigs and phenotypes. It produces annotated subgraphs representing local polymorphisms as well as mobile genetic elements (MGE) and offers a graphical framework to interpret GWAS results. Results: We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in Mycobacterium tuberculosis and genes acquired by horizontal transfer in Staphylococcus aureus and Pseudomonas aeruginosa, along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature. Conclusion: Our novel method proved its efficiency to retrieve any type of phenotype-associated genetic variant without prior knowledge. All experiments were computed in less than two hours and produced a compact set of meaningful subgraphs, thereby outperforming other GWAS approaches and facilitating the interpretation of the results. Availability: Open-source tool available at https://gitlab.com/leoisl/dbgwas
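The redundancy of kmer-based descriptions mentioned in the abstract above can be illustrated with a small sketch. It is not the DBGWAS implementation: the function names (`canonical_kmers`, `presence_patterns`) and the toy input format (a dict mapping a strain name to a list of contig strings) are assumptions made for illustration. The sketch only shows that many kmers share the same presence/absence pattern across genomes, which is one reason a raw kmer list is hard to interpret.

```python
from collections import defaultdict


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def canonical_kmers(contigs, k):
    """Return the set of canonical kmers found in a list of contig strings.

    A kmer and its reverse complement are treated as the same kmer;
    windows containing characters other than A/C/G/T are skipped.
    """
    kmers = set()
    for contig in contigs:
        for i in range(len(contig) - k + 1):
            kmer = contig[i:i + k]
            if set(kmer) <= set("ACGT"):
                kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def presence_patterns(genomes, k):
    """Group kmers by their presence/absence pattern across genomes.

    `genomes` is a hypothetical toy input: {strain_name: [contig, ...]}.
    Returns the sorted strain names and a dict mapping each 0/1 pattern
    (one entry per strain) to the kmers that show that pattern.
    """
    names = sorted(genomes)
    per_genome = {name: canonical_kmers(genomes[name], k) for name in names}
    all_kmers = set().union(*per_genome.values())
    patterns = defaultdict(list)
    for kmer in sorted(all_kmers):
        pattern = tuple(int(kmer in per_genome[name]) for name in names)
        patterns[pattern].append(kmer)
    return names, dict(patterns)
```

In an association test, every kmer within one pattern group would receive the same statistic, so testing each distinct pattern once avoids repeated, redundant hits.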

2021 ◽  
Vol 12 ◽  
Author(s):  
Robert E. Weber ◽  
Stephan Fuchs ◽  
Franziska Layer ◽  
Anna Sommer ◽  
Jennifer K. Bender ◽  
...  

Background: As next generation sequencing (NGS) technologies have developed rapidly over the last decade, the investigation of bacterial genetic architecture shows high potential for dissecting causal loci of antibiotic resistance phenotypes. Although genome-wide association studies (GWAS) have been successfully applied for investigating the basis of resistance traits, complex resistance phenotypes have been omitted so far. For S. aureus this especially refers to antibiotics of last resort like daptomycin (DAP) and ceftaroline (CPT). Therefore, we aimed to perform GWAS for the identification of genetic variants associated with DAP and CPT resistance in clinical S. aureus isolates. Materials/methods: To conduct microbial GWAS, we selected cases and controls according to their clonal background, date of isolation, and geographical origin. Association testing was performed with PLINK and SEER analysis. By using in silico analysis, we also searched for rare genetic variants in candidate loci that have previously been described to be involved in the development of corresponding resistance phenotypes. Results: GWAS revealed MprF P314L and L826F to be significantly associated with DAP resistance. These mutations were found to be homogeneously distributed among clonal lineages, suggesting convergent evolution. Additionally, rare and as yet undescribed single nucleotide polymorphisms could be identified within mprF and putative candidate genes. Finally, we could show that each DAP resistant isolate exhibited at least one amino acid substitution within the open reading frame of mprF. Due to the presence of strong population stratification, no genetic variants could be associated with CPT resistance. However, the investigation of the staphylococcal cassette chromosome mec (SCCmec) revealed various mecA SNPs to be putatively linked with CPT resistance. Additionally, some CPT resistant isolates revealed no mecA mutations, supporting the hypothesis that further and still unknown resistance determinants are crucial for the development of CPT resistance in S. aureus. Conclusion: We hereby confirmed the potential of GWAS to identify genetic variants that are associated with antibiotic resistance traits in S. aureus. However, precautions need to be taken to prevent the detection of spurious associations. In addition, the implementation of different approaches is still essential to detect multiple forms of variations and mutations that occur with a low frequency.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Shuquan Rao ◽  
Yao Yao ◽  
Daniel E. Bauer

Abstract Genome-wide association studies (GWAS) have uncovered thousands of genetic variants that influence risk for human diseases and traits. Yet understanding the mechanisms by which these genetic variants, mainly noncoding, have an impact on associated diseases and traits remains a significant hurdle. In this review, we discuss emerging experimental approaches that are being applied for functional studies of causal variants and translational advances from GWAS findings to disease prevention and treatment. We highlight the use of genome editing technologies in GWAS functional studies to modify genomic sequences, with proof-of-principle examples. We discuss the challenges in interrogating causal variants, points for consideration in experimental design and interpretation of GWAS locus mechanisms, and the potential for novel therapeutic opportunities. With the accumulation of knowledge of functional genetics, therapeutic genome editing based on GWAS discoveries will become increasingly feasible.


Author(s):  
Jianhua Wang ◽  
Dandan Huang ◽  
Yao Zhou ◽  
Hongcheng Yao ◽  
Huanhuan Liu ◽  
...  

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.


2011 ◽  
Vol 40 (D1) ◽  
pp. D1047-D1054 ◽  
Author(s):  
Mulin Jun Li ◽  
Panwen Wang ◽  
Xiaorong Liu ◽  
Ee Lyn Lim ◽  
Zhangyong Wang ◽  
...  

2015 ◽  
Vol 44 (D1) ◽  
pp. D869-D876 ◽  
Author(s):  
Mulin Jun Li ◽  
Zipeng Liu ◽  
Panwen Wang ◽  
Maria P. Wong ◽  
Matthew R. Nelson ◽  
...  

Author(s):  
Yun Li ◽  
George T. O’Connor ◽  
Josée Dupuis ◽  
Eric Kolaczyk

Abstract In genome-wide association studies (GWAS), it is of interest to identify genetic variants associated with phenotypes. For a given phenotype, the associated genetic variants are usually a sparse subset of all possible variants. Traditional Lasso-type estimation methods can therefore be used to detect important genes. But the relationship between genotypes at one variant and a phenotype may be influenced by other variables, such as sex and lifestyle. Hence it is important to be able to incorporate gene-covariate interactions into the sparse regression model. In addition, because there is biological knowledge on the manner in which genes work together in structured groups, it is desirable to incorporate this information as well. In this paper, we present a novel sparse regression methodology for gene-covariate models in association studies that not only allows such interactions but also considers biological group structure. Simulation results show that our method substantially outperforms another method in which interaction is considered but group structure is ignored. Application to data on total plasma immunoglobulin E (IgE) concentrations in the Framingham Heart Study (FHS), using sex and smoking status as covariates, yields several potentially interesting gene-covariate interactions.


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
M Oguri ◽  
K Kato ◽  
H Horibe ◽  
T Fujimaki ◽  
J Sakuma ◽  
...  

Abstract Background: Early-onset coronary artery disease (CAD) has a strong genetic component. Although genome-wide association studies have identified various genes and loci significantly associated with CAD, mainly in European ancestry populations, genetic variants that contribute to susceptibility to this condition in Japanese individuals remain to be identified definitively. Purpose: The purpose of the study was to identify genetic variants that confer susceptibility to early-onset CAD in Japanese. We have now performed exome-wide association studies (EWASs) in subjects with early-onset CAD and controls. Methods: A total of 7256 individuals aged ≤65 years were enrolled in the study. The EWAS was conducted with 1482 subjects with CAD and 5774 controls. Genotyping of single nucleotide polymorphisms (SNPs) was performed with Illumina Human Exome-12 DNA Analysis BeadChip or Infinium Exome-24 BeadChip arrays. The relation of allele frequencies for 31,465 SNPs that passed quality control to CAD was examined with Fisher's exact test. To compensate for multiple comparisons of allele frequencies with CAD, we applied a false discovery rate (FDR) of <0.05 for statistical significance of association. Results: The relation of allele frequencies for 31,465 SNPs to CAD with the use of Fisher's exact test showed that 170 SNPs were significantly (FDR <0.05) associated with CAD. Multivariable logistic regression analysis with adjustment for age, sex, and the prevalence of hypertension, diabetes mellitus, and dyslipidemia revealed that 162 SNPs were significantly (P<0.05) related to CAD. A stepwise forward selection procedure was performed to examine the effects of genotypes for the 162 SNPs on CAD. Of these, 54 SNPs were significant (P<0.05) and independent [coefficient of determination (R²), 0.0008 to 0.0297] determinants of CAD. These SNPs together accounted for 15.5% of the cause of CAD. After examination of results from previous genome-wide association studies and linkage disequilibrium of the identified SNPs, we newly identified 21 genes (RNF2, YEATS2, USP45, ITGB8, TNS3, FAM170B-AS1, PRKG1, BTRC, MKI67, STIM1, OR52E4, KIAA1551, MON2, PLUT, LINC00354, TRPM1, ADAT1, KRT27, LIPE, GFY, EIF3L) and five chromosomal regions (2p13, 4q31.2, 5q12, 13q34, 20q13.2) that were significantly associated with CAD. Gene ontology analysis showed that various biological functions were predicted in the 18 genes identified in the present study. The network analysis revealed that the 18 genes had potential direct or indirect interactions with the 30 genes previously shown to be associated with CAD or with the 228 genes identified in previous genome-wide association studies of CAD. Conclusion: We have newly identified 26 loci that confer susceptibility to CAD. Determination of genotypes for the SNPs at these loci may prove informative for assessment of the genetic risk for CAD in Japanese.
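The statistical screen described in the abstract above (Fisher's exact test on allele counts, followed by a false discovery rate threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure is assumed here, and the function names and the 2x2 table layout (allele counts in cases versus controls) are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are cases/controls, columns are the two alleles.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_snps(tables, fdr=0.05):
    """Return indices of 2x2 allele-count tables with adjusted p-value < fdr."""
    pvals = [fisher_exact_two_sided(*t) for t in tables]
    qvals = benjamini_hochberg(pvals)
    return [i for i, q in enumerate(qvals) if q < fdr]
```

Calling `significant_snps` on a list of `(a, b, c, d)` tuples, one per SNP, returns the positions of SNPs that pass the FDR threshold; with tens of thousands of SNPs, a vectorised library routine would be the more practical choice.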


2019 ◽  
Vol 2019 ◽  
pp. 1-6 ◽  
Author(s):  
Kuo-Hsuan Chang ◽  
Chiung-Mei Chen ◽  
Yi-Chun Chen ◽  
Hon-Chung Fung ◽  
Yih-Ru Wu

Previous genome-wide association studies in Caucasian populations suggest that genetic loci in amino acid catabolism may be associated with Parkinson’s disease (PD). However, these genetic associations have rarely been reported in Asian populations. Herein, we investigated the effects of the top three PD-associated genetic variants related to amino acid catabolism in Caucasians, taken from the top risk loci identified by meta-analysis of genome-wide association studies in the PDGene database: aminocarboxymuconate-semialdehyde decarboxylase- (ACMSD-) transmembrane protein 163 (TMEM163) rs6430538, methylcrotonyl-CoA carboxylase 1 (MCCC1) rs12637471, and branched-chain ketoacid dehydrogenase kinase- (BCKDK-) syntaxin 1B (STX1B) rs14235. We genotyped 599 Taiwanese patients with PD and 598 age-matched control subjects. PD patients and controls showed similar allelic and genotypic frequencies for all tested genetic variants. These ethnic discrepancies in genetic variants suggest a distinct genetic background of amino acid catabolism between Taiwanese and Caucasian PD patients.


2017 ◽  
Vol 56 ◽  
pp. 92-98 ◽  
Author(s):  
Jagadesan Sankarasubramanian ◽  
Udayakumar S. Vishnu ◽  
Paramasamy Gunasekaran ◽  
Jeyaprakash Rajendhran

2011 ◽  
Vol 64 (6) ◽  
pp. 509-514
Author(s):  
Osmel Companioni ◽  
Francisco Rodríguez Esparragón ◽  
Alfonso Medina Fernández-Aceituno ◽  
José Carlos Rodríguez Pérez
