pyseer: a comprehensive tool for microbial pangenome-wide association studies

Mapping Intimacies ◽

10.1101/266312 ◽

2018 ◽

Cited By ~ 1

Author(s):

John A Lees ◽

Marco Galardini ◽

Stephen D Bentley ◽

Jeffrey N Weiser ◽

Jukka Corander

Keyword(s):

Input Data ◽

Association Studies ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Supplementary Data ◽

New Methods ◽

Link Type ◽

Genome Wide

Abstract Summary Genome-wide association studies (GWAS) in microbes face different challenges from those in eukaryotes and have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility in the input data used, and adds new methods to interpret the association results. Availability and implementation pyseer is written in Python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://[email protected] and [email protected] Supplementary information Supplementary data are available online.
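As a rough illustration of the simplest kind of per-variant test that microbial GWAS tools build on, the sketch below scores one presence/absence variant (for example a gene or k-mer) against a binary phenotype with a Pearson chi-squared test on the 2×2 table. It is not pyseer's association model, which adds regression and population-structure correction; the function name and example data are hypothetical.

```python
import math

def chi2_presence_test(presence, phenotype):
    """Pearson chi-squared test (1 df, no continuity correction) for a
    binary variant versus a binary phenotype. Returns (statistic, p_value)."""
    if len(presence) != len(phenotype):
        raise ValueError("presence and phenotype must have the same length")
    # 2x2 contingency table indexed as table[variant][phenotype]
    table = [[0, 0], [0, 0]]
    for v, y in zip(presence, phenotype):
        table[int(v)][int(y)] += 1
    n = len(presence)
    row = [sum(table[i]) for i in range(2)]
    col = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row or 0 in col:
        # variant or phenotype is constant across samples: no information
        return 0.0, 1.0
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # upper tail of a chi-squared distribution with 1 df
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For eight samples where the variant is present in exactly the four cases, the statistic is 8.0 and the p-value is about 0.0047.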

EpiGEN: an epistasis simulation pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa245 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4957-4959

Author(s):

David B Blumenthal ◽

Lorenzo Viola ◽

Markus List ◽

Jan Baumbach ◽

Paolo Tieri ◽

...

Keyword(s):

Arbitrary Order ◽

Association Studies ◽

Simulated Data ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Supplementary Data ◽

Single Nucleotide ◽

Genome Wide

Abstract Summary Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited: they do not account for linkage disequilibrium (LD), support only limited models of interaction between single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes, or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns and generates both categorical and quantitative phenotypes. Availability and implementation EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen. Supplementary information Supplementary data are available at Bioinformatics online.
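To make "interactions of arbitrary order" concrete, here is a toy sketch assuming genotypes coded as 0/1/2 allele counts and a hypothetical multiplicative model: the quantitative phenotype depends on the product of allele counts at the k interacting SNPs plus Gaussian noise, and a categorical (case/control) phenotype is obtained by thresholding. This illustrative model and its parameter names are assumptions; it does not reproduce EpiGEN's interaction models or its LD simulation.

```python
import random

def simulate_epistasis(genotypes, snp_set, effect=1.0, noise_sd=1.0,
                       threshold=None, seed=0):
    """Simulate phenotypes driven by an interaction of arbitrary order.

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    snp_set:   indices of the interacting SNPs (its size is the order).
    Returns quantitative phenotypes, or 0/1 labels if threshold is given.
    """
    rng = random.Random(seed)
    phenotypes = []
    for g in genotypes:
        interaction = 1
        for j in snp_set:
            interaction *= g[j]  # product of allele counts at interacting SNPs
        value = effect * interaction + rng.gauss(0.0, noise_sd)
        phenotypes.append(value)
    if threshold is None:
        return phenotypes
    # dichotomize into a categorical phenotype
    return [1 if v > threshold else 0 for v in phenotypes]
```

With noise_sd=0 the output is deterministic, which makes the interaction term easy to inspect before adding noise.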

GWASpro: a high-performance genome-wide association analysis server

Bioinformatics ◽

10.1093/bioinformatics/bty989 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2512-2514 ◽

Cited By ~ 4

Author(s):

Bongsong Kim ◽

Xinbin Dai ◽

Wenchao Zhang ◽

Zhaohong Zhuang ◽

Darlene L Sanchez ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Linear Mixed Model ◽

Association Studies ◽

Learning Curves ◽

Experimental Designs ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract Summary We present GWASpro, a high-performance web server for the analysis of large-scale genome-wide association studies (GWAS). GWASpro was developed to analyze large-scale molecular genetic data coupled with complex replicated experimental designs, such as those found in plant science investigations, and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, so that experimental designs that include replications, treatments, locations and times can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data with up to 10 million markers and 10 000 samples from replicable lines or hybrids. Its interface significantly reduces the learning curve for new GWAS investigators. Availability and implementation GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information Supplementary data are available at Bioinformatics online.
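The idea of accounting for replications, treatments, locations and times through the design matrix can be illustrated with a small helper that dummy-codes categorical factors into a fixed-effect matrix: an intercept column plus one indicator column per non-reference level, with the first sorted level as reference. This is a generic sketch of treatment coding, not GWASpro's implementation; the function and factor names are hypothetical.

```python
def design_matrix(records, factors):
    """Build an intercept + treatment-coded (reference-level) design matrix.

    records: list of dicts, one per observation.
    factors: names of categorical factors to encode (e.g. 'location').
    Returns (column_names, rows).
    """
    columns = ["intercept"]
    levels = {}
    for f in factors:
        # sorted levels; the first one is the reference and gets no column
        levels[f] = sorted({r[f] for r in records})
        columns += [f"{f}={lvl}" for lvl in levels[f][1:]]
    rows = []
    for r in records:
        row = [1]
        for f in factors:
            row += [1 if r[f] == lvl else 0 for lvl in levels[f][1:]]
        rows.append(row)
    return columns, rows
```

Dropping the reference level keeps the matrix full rank alongside the intercept, which a mixed-model solver needs for the fixed effects to be estimable.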

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btaa549 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4374-4376

Author(s):

Ninon Mounier ◽

Zoltán Kutalik

Keyword(s):

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Biological Mechanisms ◽

Genome Wide ◽

Related Risk

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs); here we propose an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait, obtained using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes factors, posterior effects and direct effects. The approach not only increases power but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation The bGWAS package is freely available under a GPL-2 License and can be accessed, alongside user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
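A minimal sketch of how a prior effect can be combined with an observed effect, assuming both are on the z-score scale, the observed z is distributed as N(θ, 1) and the prior on the true effect θ is N(μ, τ²). The Bayes factor compares the marginal likelihood of z under that prior, N(μ, 1 + τ²), with its likelihood under θ = 0, and the posterior follows from the normal-normal conjugate update. This textbook conjugate model is chosen for illustration; the bGWAS package's exact formulation may differ, and the function name is hypothetical.

```python
from statistics import NormalDist

def combine_prior(z_obs, prior_mean, prior_sd):
    """Normal-normal combination of an observed z-score with a prior effect.

    Assumes z_obs ~ N(theta, 1) and theta ~ N(prior_mean, prior_sd**2).
    Returns (bayes_factor, posterior_mean, posterior_sd).
    """
    tau2 = prior_sd ** 2
    # marginal likelihood of z_obs under the prior vs under theta = 0
    alt = NormalDist(prior_mean, (1.0 + tau2) ** 0.5).pdf(z_obs)
    null = NormalDist(0.0, 1.0).pdf(z_obs)
    bayes_factor = alt / null
    # conjugate update: precision-weighted average of data and prior
    post_var = tau2 / (1.0 + tau2)
    post_mean = (z_obs * tau2 + prior_mean) / (1.0 + tau2)
    return bayes_factor, post_mean, post_var ** 0.5
```

When the observed z agrees with a strong prior (for example z = 3 with prior mean 3), the Bayes factor is well above 1; when both are 0, it falls below 1 because the prior spreads probability away from the observation.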

Genome-wide association study identifies novel type II diabetes risk loci in Jordan subpopulations

PeerJ ◽

10.7717/peerj.3618 ◽

2017 ◽

Vol 5 ◽

pp. e3618 ◽

Cited By ~ 4

Author(s):

Rana Dajani ◽

Jin Li ◽

Zhi Wei ◽

Michael E. March ◽

Qianghua Xia ◽

...

Keyword(s):

Type II Diabetes ◽

Genome Wide Association Study ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Type II ◽

Arab Population ◽

Public Health Burden ◽

Link Type ◽

Genome Wide

The prevalence of Type II Diabetes (T2D) has been increasing, and T2D has become a disease of significant public health burden in Jordan. None of the previous genome-wide association studies (GWAS) have specifically investigated Middle Eastern populations. The Circassian and Chechen communities in Jordan represent unique populations that are genetically distinct from the Arab population and other populations in the Caucasus. The prevalence of T2D is very high in both the Circassian and Chechen communities in Jordan despite low obesity prevalence. We conducted GWAS on T2D in these two populations and then performed a meta-analysis of the results. We identified a novel T2D locus at chr20p12.2 at genome-wide significance (rs6134031, P = 1.12 × 10⁻⁸) and replicated the result in the Wellcome Trust Case Control Consortium (WTCCC) dataset. Another locus, at chr12q24.31, is associated with T2D at a suggestive significance level (top SNP rs4758690, P = 4.20 × 10⁻⁵); it is a robust eQTL for the gene MLXIP (P = 1.10 × 10⁻¹⁴) and is significantly associated with methylation levels in MLXIP, whose functions involve the cellular glucose response. Therefore, in this first GWAS of T2D in Jordan subpopulations, we identified novel and unique susceptibility loci that may help inform the genetic underpinnings of T2D in other populations.

Integration of genome-wide association studies and gene coexpression networks unveils promising soybean resistance genes against five common fungal pathogens

Scientific Reports ◽

10.1038/s41598-021-03864-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Fabricio Almeida-Silva ◽

Thiago M. Venancio

Keyword(s):

Candidate Genes ◽

Resistance Genes ◽

Association Studies ◽

Fungal Species ◽

Genome Wide Association ◽

Economic Losses ◽

Genome Wide Association Studies ◽

Link Type ◽

Physical Defense ◽

Genome Wide

Abstract Soybean is one of the most important legume crops worldwide. However, soybean yield is dramatically affected by fungal diseases, leading to economic losses of billions of dollars yearly. Here, we integrated publicly available genome-wide association studies and transcriptomic data to prioritize candidate genes associated with resistance to Cadophora gregata, Fusarium graminearum, Fusarium virguliforme, Macrophomina phaseolina, and Phakopsora pachyrhizi. We identified 188, 56, 11, 8, and 3 high-confidence candidates for resistance to F. virguliforme, F. graminearum, C. gregata, M. phaseolina and P. pachyrhizi, respectively. The prioritized candidate genes are highly conserved in the pangenome of cultivated soybeans and are heavily biased towards fungal species-specific defense responses. The vast majority of the prioritized candidate resistance genes are related to plant immunity processes, such as recognition, signaling, oxidative stress, systemic acquired resistance, and physical defense. Based on the number of resistance alleles, we selected the five most resistant accessions against each fungal species in the soybean USDA germplasm. Interestingly, the most resistant accessions do not reach the maximum theoretical resistance potential. Hence, they can be further improved to increase resistance in breeding programs or through genetic engineering. Finally, the coexpression network generated here is available in a user-friendly web application (https://soyfungigcn.venanciogroup.uenf.br/) and an R/Shiny package (https://github.com/almeidasilvaf/SoyFungiGCN) that serve as a public resource to explore soybean-pathogenic fungi interactions at the transcriptional level.
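The accession-selection step can be sketched as counting resistance alleles per accession, ranking, and comparing the best count with the theoretical maximum (every candidate locus carrying the resistance allele). The sketch assumes a hypothetical encoding of one call per candidate locus, 1 for the resistance allele and 0 otherwise; the function name and accession IDs are placeholders, not data or code from the study.

```python
def top_resistant(accessions, k=5):
    """Rank accessions by their number of resistance alleles.

    accessions: non-empty dict mapping accession ID -> list of 0/1 calls,
    one per candidate locus (1 = resistance allele); lists equally long.
    Returns (top_k, max_possible) where top_k is [(id, count), ...].
    """
    counts = {acc: sum(calls) for acc, calls in accessions.items()}
    # sort by count (descending), then by ID so ties are deterministic
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    max_possible = len(next(iter(accessions.values())))
    return ranked[:k], max_possible
```

A gap between the top count and max_possible corresponds to the abstract's observation that even the most resistant accessions leave room for improvement.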

PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects

Bioinformatics ◽

10.1093/bioinformatics/btz017 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3046-3054 ◽

Cited By ~ 2

Author(s):

Anastasia Gurinovich ◽

Harold Bae ◽

John J Farrell ◽

Stacy L Andersen ◽

Stefano Monti ◽

...

Keyword(s):

Genetic Variants ◽

Association Studies ◽

False Positive Rate ◽

Principal Component ◽

True Positive Rate ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Positive Rate

Abstract Motivation Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. Results In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects’ ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype. Availability and implementation PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage. Supplementary information Supplementary data are available at Bioinformatics online.
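PopCluster's full procedure (logistic regression, principal component analysis, hierarchical clustering and recursive tree parsing) is beyond a short example, but the underlying question, whether a variant's effect differs between two groups of individuals, can be illustrated with a standard two-sided z-test on two independent effect estimates such as log odds ratios. This generic heterogeneity test is used here for illustration only and is not PopCluster's own statistic; the function name is hypothetical.

```python
from statistics import NormalDist

def effect_difference_test(beta1, se1, beta2, se2):
    """Two-sided z-test for a difference between two independent effect
    estimates (e.g. log odds ratios from logistic regression in two groups).
    Returns (z, p_value)."""
    z = (beta1 - beta2) / (se1 ** 2 + se2 ** 2) ** 0.5
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

For example, log odds ratios of 0.5 and 0.0, each with standard error 0.1, give z ≈ 3.54 and p ≈ 0.0004.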

Efficient multivariate analysis algorithms for longitudinal genome-wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btz304 ◽

2019 ◽

Vol 35 (23) ◽

pp. 4879-4885 ◽

Cited By ~ 4

Author(s):

Chao Ning ◽

Dan Wang ◽

Lei Zhou ◽

Julong Wei ◽

Yuanxin Liu ◽

...

Keyword(s):

Longitudinal Data ◽

Software Package ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Computational Speed

Abstract Motivation Current dynamic phenotyping systems introduce time as an extra dimension to genome-wide association studies (GWAS), which helps to explore the mechanisms of dynamic genetic control of complex longitudinal traits. However, existing methods for longitudinal GWAS either ignore the covariance among observations at different time points or suffer from poor computational efficiency. Results We developed efficient genome-wide multivariate association algorithms for longitudinal data. Compared with existing univariate linear mixed model analyses, the proposed method has greater statistical power for association detection and higher computational speed. In addition, the new method can analyze unbalanced longitudinal data with thousands of individuals and more than ten thousand records within a few hours; the corresponding time for balanced longitudinal data is just a few minutes. Availability and implementation A software package implementing the efficient algorithm, named GMA (https://github.com/chaoning/GMA), is freely available. Supplementary information Supplementary data are available at Bioinformatics online.

MultiMeta: an R package for meta-analysing multi-phenotype genome-wide association studies

10.1101/013920 ◽

2015 ◽

Author(s):

Dragana Vuckovic ◽

Paolo Gasparini ◽

Nicole Soranzo ◽

Valentina Iotchkova

Keyword(s):

Multivariate Analysis ◽

Association Studies ◽

Meta Analysis ◽

R Package ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

New Methods ◽

Genome Wide ◽

Inverse Variance

Summary: As new methods for multivariate analysis of Genome Wide Association Studies (GWAS) become available, it is important to be able to combine results from different cohorts in a meta-analysis. The R package MultiMeta provides an implementation of the inverse-variance based method for meta-analysis, generalized to an n-dimensional setting. Availability: The R package MultiMeta can be downloaded from CRAN. Contact: [email protected]
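A minimal sketch of the multivariate inverse-variance formula, assuming each cohort i supplies a vector of effects βᵢ for one variant across n phenotypes and its n×n covariance matrix Vᵢ: the pooled estimate is β = (Σ Vᵢ⁻¹)⁻¹ Σ Vᵢ⁻¹ βᵢ with covariance (Σ Vᵢ⁻¹)⁻¹, and βᵀ(Σ Vᵢ⁻¹)β is a chi-squared statistic with n degrees of freedom under the null. It uses NumPy and is a generic illustration of the method, not the MultiMeta package's R code; the function name is hypothetical.

```python
import numpy as np

def inverse_variance_meta(betas, covs):
    """Fixed-effect, inverse-variance meta-analysis in n dimensions.

    betas: list of length-n effect vectors, one per cohort.
    covs:  list of n x n covariance matrices, one per cohort.
    Returns (pooled_beta, pooled_cov, chi2_stat) with chi2 on n df.
    """
    precision_sum = np.zeros_like(np.asarray(covs[0], dtype=float))
    weighted_sum = np.zeros(len(betas[0]))
    for b, v in zip(betas, covs):
        w = np.linalg.inv(np.asarray(v, dtype=float))  # cohort precision matrix
        precision_sum += w
        weighted_sum += w @ np.asarray(b, dtype=float)
    pooled_cov = np.linalg.inv(precision_sum)
    pooled_beta = pooled_cov @ weighted_sum
    chi2_stat = float(pooled_beta @ precision_sum @ pooled_beta)
    return pooled_beta, pooled_cov, chi2_stat
```

With n = 1 this reduces to the familiar univariate rule of weighting each cohort by the inverse of its squared standard error.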

Assessing the performance of genome-wide association studies for predicting disease risk

10.1101/701086 ◽

2019 ◽

Author(s):

Jonas Patron ◽

Arnau Serra-Cayuela ◽

Beomsoo Han ◽

Carin Li ◽

David Scott Wishart

Keyword(s):

Disease Risk ◽

Association Studies ◽

ROC Curves ◽

GWAS Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Link Type ◽

Genome Wide ◽

Risk Predictors ◽

GWA Studies

Abstract To date, more than 3700 genome-wide association studies (GWAS) have been published that look at the genetic contributions of single nucleotide polymorphisms (SNPs) to human conditions or human phenotypes. Through these studies, many highly significant SNPs have been identified for hundreds of diseases or medical conditions. However, the extent to which GWAS-identified SNPs or combinations of SNP biomarkers can predict disease risk is not well known. One of the most commonly used approaches to assess the performance of predictive biomarkers is to determine the area under the receiver operating characteristic curve (AUROC). We have developed an R package called G-WIZ to generate ROC curves and calculate the AUROC using summary-level GWAS data. We first tested the performance of G-WIZ by using AUROC values derived from patient-level SNP data, as well as literature-reported AUROC values. We found that G-WIZ predicts the AUROC with <3% error. Next, we used the summary-level GWAS data from GWAS Central to determine the ROC curves and AUROC values for 569 different GWA studies spanning 219 different conditions. Using these data, we found a small number of GWA studies with SNP-derived risk predictors that have very high AUROCs (>0.75). On the other hand, the average GWA study produces a multi-SNP risk predictor with an AUROC of 0.55. Detailed AUROC comparisons indicate that most SNP-derived risk predictions are not as good as clinically based disease risk predictors. All our calculations (ROC curves, AUROCs, explained heritability) are in a publicly accessible database called GWAS-ROCS (http://gwasrocs.ca). The G-WIZ code is freely available for download at https://github.com/jonaspatronjp/GWIZ-Rscript/.
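The AUROC equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half; this is the Mann-Whitney statistic scaled to [0, 1]. The sketch below computes it directly from individual-level scores. It does not reproduce G-WIZ, which works from summary-level GWAS data, and the function name is hypothetical.

```python
def auroc(case_scores, control_scores):
    """AUROC as the probability that a case outscores a control (ties = 0.5).

    Simple O(n*m) pairwise version, fine for small illustrative inputs.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

An AUROC of 0.5 corresponds to a predictor no better than chance, which puts the average of 0.55 reported above in context.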

Mixed Logistic Regression in Genome-Wide Association Studies

10.1101/2020.01.17.910109 ◽

2020 ◽

Author(s):

Jacqueline Milet ◽

Hervé Perdry

Keyword(s):

Logistic Regression ◽

Linear Models ◽

Association Studies ◽

Score Test ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mixed Linear Models ◽

Genome Wide

Abstract Motivation Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, with disease status analyzed as a quantitative phenotype. Chen et al. proved that this method is inappropriate and proposed a score test for the mixed logistic regression (MLR). However, this test does not allow estimation of the variants’ effects. Results We propose two computationally efficient methods to estimate the variants’ effects. Their properties are evaluated on two simulation sets and compared with other methods (MLM, logistic regression). MLR performs best in all circumstances. The variants’ effects are well estimated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot that improves the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Availability All methods are implemented in the R package milorGWAS available at https://github.com/genostats/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.
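The p-value inflation or deflation that QQ-plots diagnose is often summarised by the genomic inflation factor λ: the median of the observed 1-df chi-squared statistics divided by its expected median under the null (about 0.455). A minimal standard-library sketch follows; computing it separately within each stratum gives a numerical counterpart to a stratified QQ-plot, though the paper's own stratification procedure is not reproduced here and the function name is hypothetical.

```python
from statistics import NormalDist, median

# median of a chi-squared(1 df) variable: the squared 75th normal percentile
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """Genomic inflation factor lambda from two-sided p-values."""
    chi2 = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # |z| matching a two-sided p-value; using p/2 avoids 1 - p/2 rounding to 1
        z = NormalDist().inv_cdf(p / 2.0)
        chi2.append(z * z)
    return median(chi2) / CHI2_1DF_MEDIAN
```

Values close to 1 indicate well-calibrated p-values; values clearly above 1 suggest inflation (for example from uncorrected population structure) and values below 1 suggest deflation.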
