XWAS: a software toolset for genetic data analysis and association studies of the X chromosome

2014 ◽  
Author(s):  
Feng Gao ◽  
Diana Chang ◽  
Arjun Biddanda ◽  
Li Ma ◽  
Yingjie Guo ◽  
...  

XWAS is a new software suite for the analysis of the X chromosome in association studies and similar studies. The X chromosome plays an important role in human diseases, especially those with sexually dimorphic characteristics. Its analysis requires special attention because its unique inheritance pattern leads to analytical complications; as a result, the majority of genome-wide association studies (GWAS) have either not considered X or mishandled it with toolsets designed for non-sex chromosomes. We hence developed XWAS to fill the need for tools that are specially designed for analysis of X. Following extensive, stringent, and X-specific quality control, XWAS offers an array of statistical tests of association, including: (1) the standard test between a SNP (single nucleotide polymorphism) and disease risk, including after first stratifying individuals by sex, (2) a test for a differential effect of a SNP on disease between males and females, (3) motivated by X-inactivation, a test for higher variance of a trait in heterozygous females as compared to homozygous females, and (4) for all tests, a version that allows for combining evidence from all SNPs across a gene. We applied the toolset analysis pipeline to 16 GWAS datasets of immune-related disorders and 7 risk factors of coronary artery disease, and discovered several new X-linked genetic associations. XWAS will provide the tools and incentive for others to incorporate the X chromosome into GWAS, hence enabling discoveries of novel loci implicated in many diseases and in their sexual dimorphism.
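To illustrate test (2), one common way to test for a differential effect between the sexes is to compare sex-stratified effect estimates with a z-statistic. The sketch below assumes independent per-sex estimates (for example, log odds ratios with their standard errors). The function name and inputs are hypothetical, and the statistic XWAS actually implements may differ.

```python
import math

def sex_difference_test(beta_male, se_male, beta_female, se_female):
    """Two-sided z-test for a difference between male and female effect sizes.

    Assumes the two estimates come from independent (sex-stratified) samples,
    so the variance of their difference is the sum of the two variances.
    """
    z = (beta_male - beta_female) / math.sqrt(se_male**2 + se_female**2)
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Toy, made-up inputs purely for illustration.
z, p = sex_difference_test(beta_male=0.3, se_male=0.1,
                           beta_female=0.1, se_female=0.1)
print(f"z = {z:.3f}, p = {p:.3f}")
```

A large |z| indicates that the SNP effect differs between males and females; it says nothing about whether either sex-specific effect is itself non-zero.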

2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S221-S221
Author(s):  
Luke C Pilling ◽  
Luigi Ferrucci ◽  
David Melzer

Thousands of loci across the genome have been identified for specific diseases in genome-wide association studies (GWAS), yet very few are associated with lifespan itself. We hypothesized that specific biological pathways transcend individual diseases and affect health and lifespan more broadly. Using the published results for the most recent GWAS for 10 key age-related diseases (including coronary artery disease, type 2 diabetes, and several cancers), we identified 22 loci with a strong genetic association with at least three of the diseases. These multi-trait aging loci include known genes affecting multiple diverse health end points, such as CDKN2A/B (9p21.3) and APOE. There are also novel multi-trait genes including SH2B3 and CASC8, likely involved in hallmark pathways of aging biology, including telomere shortening and inflammation. Several of these loci involve trade-offs between chronic disease risk and cancer.
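The selection step described above (loci associated with at least three diseases) can be sketched as a simple count across per-disease hit lists. This is a minimal illustration only: the function name, the placeholder locus and disease labels, and the assumption that each disease already has a list of significant loci are all hypothetical, and the abstract does not state how loci were matched across studies.

```python
from collections import Counter

def multi_trait_loci(hits_by_disease, min_diseases=3):
    """Return loci that appear in the hit lists of at least `min_diseases` diseases.

    `hits_by_disease` maps a disease label to an iterable of locus labels that
    passed that disease's significance threshold (threshold chosen upstream).
    """
    counts = Counter()
    for loci in hits_by_disease.values():
        # Use a set so a locus listed twice for one disease is counted once.
        counts.update(set(loci))
    return sorted(locus for locus, n in counts.items() if n >= min_diseases)

# Placeholder labels, not real results.
example = {
    "disease_1": ["locus_A", "locus_B"],
    "disease_2": ["locus_A"],
    "disease_3": ["locus_A", "locus_B"],
    "disease_4": ["locus_C"],
}
print(multi_trait_loci(example))
```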


2017 ◽  
Vol 242 (13) ◽  
pp. 1325-1334 ◽  
Author(s):  
Yizhou Zhu ◽  
Cagdas Tazearslan ◽  
Yousin Suh

Genome-wide association studies have shown that the vast majority of disease-associated variants reside in the non-coding regions of the genome, suggesting that gene regulatory changes contribute to disease risk. Identifying truly causal non-coding variants and their affected target genes remains challenging but is a critical step in translating genetic associations into molecular mechanisms and ultimately clinical applications. Here we review genomic/epigenomic resources and in silico tools that can be used to identify causal non-coding variants and experimental strategies to validate their functionalities. Impact statement: Most signals from genome-wide association studies (GWASs) map to the non-coding genome, and functional interpretation of these associations remains challenging. We review recent progress in methodologies for studying the non-coding genome and argue that no single approach allows one to effectively identify the causal regulatory variants from GWAS results. By illustrating the advantages and limitations of each method, our review potentially provides a guideline for taking a combinatorial approach to accurately predict, prioritize, and eventually experimentally validate the causal variants.


2014 ◽  
Vol 26 (2) ◽  
pp. 567-582 ◽  
Author(s):  
Zhongxue Chen ◽  
Hon Keung Tony Ng ◽  
Jing Li ◽  
Qingzhong Liu ◽  
Hanwen Huang

In the past decade, hundreds of genome-wide association studies have been conducted to detect the significant single-nucleotide polymorphisms that are associated with certain diseases. However, most of the data from the X chromosome were not analyzed, and only a few significantly associated single-nucleotide polymorphisms on the X chromosome have been identified in genome-wide association studies. This is mainly due to the lack of powerful statistical tests. In this paper, we propose a novel statistical approach that efficiently combines the information on X-chromosome single-nucleotide polymorphisms from both males and females. The proposed approach avoids the need to make strong assumptions about the underlying genetic models. Our proposed statistical test is a robust method that only assumes that the risk allele is the same for females and males if the single-nucleotide polymorphism is associated with the disease in both sexes. Through a simulation study and a real data application, we show that the proposed procedure is robust and has excellent performance compared to existing methods. We expect that many more associated single-nucleotide polymorphisms on the X chromosome will be identified if the proposed approach is applied to currently available genome-wide association study data.


2020 ◽  
Vol 5 (3) ◽  
pp. 192-201 ◽  
Author(s):  
Yuki Ishikawa ◽  
Chikashi Terao

Systemic sclerosis is an autoimmune disease characterized by generalized fibrosis in connective tissues and internal organs as a consequence of microvascular and immune dysfunction, which leads to premature death in affected individuals. The etiology of systemic sclerosis is complex and poorly understood, but as with most autoimmune diseases, it is widely accepted that both environmental and genetic factors contribute to disease risk. During the last decade, the number of genetic markers convincingly associated with systemic sclerosis has increased exponentially. In this article, we briefly mention the genetic components of systemic sclerosis. Then, we review the classical and novel genetic associations with systemic sclerosis, analyzing the firmest and replicated signals within non–human leukocyte antigen genes, identified by both the candidate gene approach and genome-wide association studies. We also provide an insight into the future perspectives that will shed more light on the complex genetic background of the disease. Despite the remarkable advances in systemic sclerosis genetics during the last decade, new genetic technologies such as next-generation sequencing, together with deep phenotyping of the study cohorts, are imperative to fully characterize the genetic component of this disease and identify causal variants, which should lead to more targeted and effective treatment of systemic sclerosis.


2019 ◽  
Vol 39 (10) ◽  
pp. 1925-1937 ◽  
Author(s):  
Ruth McPherson

Recent studies have led to a broader understanding of the genetic architecture of coronary artery disease and demonstrate that it largely derives from the cumulative effect of multiple common risk alleles individually of small effect size rather than rare variants with large effects on coronary artery disease risk. The tools applied include genome-wide association studies encompassing over 200 000 individuals complemented by bioinformatic approaches including imputation from whole-genome data sets, expression quantitative trait loci analyses, and interrogation of ENCODE (Encyclopedia of DNA Elements), Roadmap Epigenetic Project, and other data sets. Over 160 genome-wide significant loci associated with coronary artery disease risk have been identified using the genome-wide association studies approach, 90% of which are situated in intergenic regions. Here, I will describe, in part, our research over the last decade performed in collaboration with a series of bright trainees and an extensive number of groups and individuals around the world as it applies to our understanding of the genetic basis of this complex disease. These studies include computational approaches to better understand missing heritability and identify causal pathways, experimental approaches, and progress in understanding at the molecular level the function of the multiple risk loci identified and potential applications of these genomic data in clinical medicine and drug discovery.


BMC Medicine ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Maxime M. Bos ◽  
Neil J. Goulding ◽  
Matthew A. Lee ◽  
Amy Hofman ◽  
Mariska Bot ◽  
...  

Background: Sleep traits are associated with cardiometabolic disease risk, with evidence from Mendelian randomization (MR) suggesting that insomnia symptoms and shorter sleep duration increase coronary artery disease risk. We combined adjusted multivariable regression (AMV) and MR analyses of phenotypes of unfavourable sleep on 113 metabolomic traits to investigate possible biochemical mechanisms linking sleep to cardiovascular disease. Methods: We used AMV (N = 17,368) combined with two-sample MR (N = 38,618) to examine effects of self-reported insomnia symptoms, total habitual sleep duration, and chronotype on 113 metabolomic traits. The AMV analyses were conducted on data from 10 cohorts of mostly Europeans, adjusted for age, sex, and body mass index. For the MR analyses, we used summary results from published European-ancestry genome-wide association studies of self-reported sleep traits and of nuclear magnetic resonance (NMR) serum metabolites. We used the inverse-variance weighted (IVW) method and complemented this with sensitivity analyses to assess MR assumptions. Results: We found consistent evidence from AMV and MR analyses for associations of usual vs. sometimes/rare/never insomnia symptoms with lower citrate (−0.08 standard deviation (SD) [95% confidence interval (CI) −0.12, −0.03] in AMV and −0.03SD [−0.07, −0.003] in MR), higher glycoprotein acetyls (0.08SD [95% CI 0.03, 0.12] in AMV and 0.06SD [0.03, 0.10] in MR), lower total very large HDL particles (−0.04SD [−0.08, 0.00] in AMV and −0.05SD [−0.09, −0.02] in MR), and lower phospholipids in very large HDL particles (−0.04SD [−0.08, 0.002] in AMV and −0.05SD [−0.08, −0.02] in MR). Longer total sleep duration was associated with higher creatinine concentrations using both methods (0.02SD per 1 h [0.01, 0.03] in AMV and 0.15SD [0.02, 0.29] in MR) and with isoleucine in MR analyses (0.22SD [0.08, 0.35]). No consistent evidence was observed for effects of chronotype on metabolomic measures. Conclusions: Whilst our results suggested that unfavourable sleep traits may not cause widespread metabolic disruption, some notable effects were observed. The evidence for possible effects of insomnia symptoms on glycoprotein acetyls and citrate and longer total sleep duration on creatinine and isoleucine might explain some of the effects found in MR analyses of these sleep traits on coronary heart disease, which warrant further investigation.
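The inverse-variance weighted (IVW) method named in the Methods can be sketched from per-variant summary statistics. The version below is the standard fixed-effect IVW estimator, written under the assumptions that the variants are uncorrelated and that uncertainty in the variant-exposure effects is ignored. The function name and the toy numbers are hypothetical and are not taken from the study.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted Mendelian randomization estimate.

    Each argument is a list with one entry per genetic variant:
      beta_exposure: variant-exposure effect (e.g. on a sleep trait)
      beta_outcome:  variant-outcome effect (e.g. on a metabolite, in SD units)
      se_outcome:    standard error of the variant-outcome effect
    Returns (causal_estimate, standard_error).
    """
    weights = [1.0 / se**2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error

# Toy summary statistics for two variants (made up for illustration).
est, se = ivw_estimate([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
print(f"IVW estimate = {est:.3f} (SE {se:.3f})")
```

This is equivalent to a weighted regression of outcome effects on exposure effects through the origin, which is why sensitivity analyses are needed to check the MR assumptions.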


2019 ◽  
Author(s):  
Najla Saad Elhezzani ◽  
Wicher Bergsma ◽  
Mike Weale

Most genome-wide association studies (GWASs) use randomly selected samples from the population (hereafter bases) as the control set. This approach is successful when the trait of interest is rare; otherwise, a loss in the statistical power to detect disease-associated variants is expected. To address this, we introduce a proposal to combine the three sample types (cases, controls and bases) for instances when the disease under study is prevalent. This is done by modelling the bases as a mixture of multinomial logistic functions of cases and controls, according to the disease prevalence. The maximum likelihood method is used to estimate the underlying parameters using the EM algorithm. Three classical tests of association (the score, Wald, and likelihood ratio tests) are derived, and their power to detect genetic associations under different designs is compared. Simulations show that combining the three samples can increase the power to detect disease-associated variants, though a very large base sample set can compensate for the lack of controls.
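The mixture idea can be written compactly. As a sketch, assuming that bases are a random sample of the population and writing $K$ for the disease prevalence and $G$ for the genotype (notation introduced here for illustration, not taken from the paper), the law of total probability gives the genotype distribution in bases as a prevalence-weighted mix of the case and control distributions:

```latex
P(G = g \mid \text{base}) \;=\; K \, P(G = g \mid \text{case}) \;+\; (1 - K) \, P(G = g \mid \text{control})
```

When $K$ is small the first term nearly vanishes and bases behave like controls, which matches the statement that using bases as the control set works well for rare traits.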


2020 ◽  
Author(s):  
Maxime M Bos ◽  
Neil J Goulding ◽  
Matthew A Lee ◽  
Amy Hofman ◽  
Mariska Bot ◽  
...  

Background: Sleep traits are associated with cardiometabolic disease risk, with evidence from Mendelian randomization (MR) suggesting that insomnia symptoms and shorter sleep duration increase coronary artery disease risk. We combined adjusted multivariable regression (AMV) and MR analyses of phenotypes of unfavourable sleep on 113 metabolomic traits to investigate possible biochemical mechanisms linking sleep to cardiovascular disease. Methods: We used AMV (N=17,370) combined with two-sample MR (N=38,618) to examine effects of self-reported insomnia symptoms, total habitual sleep duration, and chronotype on 113 metabolomic traits. The AMV analyses were conducted on data from 10 cohorts of mostly Europeans, adjusted for age, sex and body mass index. For the MR analyses, we used summary results from published European-ancestry genome-wide association studies of self-reported sleep traits and of nuclear magnetic resonance (NMR) serum metabolites. We used the inverse-variance weighted (IVW) method and complemented this with sensitivity analyses to assess MR assumptions. Results: We found consistent evidence from AMV and MR analyses for associations of usual vs. sometimes/rare/never insomnia symptoms with lower citrate (-0.08 standard deviation (SD) [95% confidence interval (CI): -0.12, -0.03] in AMV and -0.03SD [-0.07, -0.003] in MR), higher glycoprotein acetyls (0.08SD [95% CI: 0.03, 0.12] in AMV and 0.06SD [0.03, 0.10] in MR), lower total very large HDL particles (-0.04SD [-0.08, 0.00] in AMV and -0.05SD [-0.09, -0.02] in MR) and lower phospholipids in very large HDL particles (-0.04SD [-0.08, 0.002] in AMV and -0.05SD [-0.08, -0.02] in MR). Longer total sleep duration was associated with higher creatinine concentrations using both methods (0.02SD per 1 hour [0.01, 0.03] in AMV and 0.15SD [0.02, 0.29] in MR) and with isoleucine in MR analyses (0.22SD [0.08, 0.35]). No consistent evidence was observed for effects of chronotype on metabolomic measures. Conclusions: Whilst our results suggested that unfavourable sleep traits may not cause widespread metabolic disruption, some notable effects were observed. The evidence for possible effects of insomnia symptoms on glycoprotein acetyls and citrate and longer total sleep duration on creatinine and isoleucine might explain some of the effects found in MR analyses of these sleep traits on coronary heart disease, which warrant further investigation.


2020 ◽  
Vol 49 (4) ◽  
pp. 1246-1256
Author(s):  
Inge Verkouter ◽  
Renée de Mutsert ◽  
Roelof A J Smit ◽  
Stella Trompet ◽  
Frits R Rosendaal ◽  
...  

Background: Body mass index (BMI)-associated loci are used to explore the effects of obesity using Mendelian randomization (MR), but the contribution of individual tissues to risks remains unknown. We aimed to identify tissue-grouped pathways of BMI-associated loci and relate these to cardiometabolic disease using MR analyses. Methods: Using Genotype-Tissue Expression (GTEx) data, we performed overrepresentation tests to identify tissue-grouped gene sets based on mRNA-expression profiles from 634 previously published BMI-associated loci. We conducted two-sample MR with inverse-variance-weighted methods to examine associations between tissue-grouped BMI-associated genetic instruments and type 2 diabetes mellitus (T2DM) and coronary artery disease (CAD), using summary-level data from published genome-wide association studies (T2DM: 74 124 cases, 824 006 controls; CAD: 60 801 cases, 123 504 controls). Additionally, we performed MR analyses on T2DM and CAD using randomly sampled sets of 100 or 200 BMI-associated genetic variants. Results: We identified 17 partly overlapping tissue-grouped gene sets, of which 12 were brain areas, where BMI-associated genes were differentially expressed. In tissue-grouped MR analyses, all gene sets were similarly associated with increased risks of T2DM and CAD. MR analyses with randomly sampled genetic variants on T2DM and CAD resulted in a distribution of effect estimates similar to tissue-grouped gene sets. Conclusions: Overrepresentation tests revealed differential expression of BMI-associated genes in 17 different tissues. However, with our biology-based approach using tissue-grouped MR analyses, we did not identify different risks of T2DM or CAD for the BMI-associated gene sets, which was reflected by similar effect estimates obtained by randomly sampled gene sets.
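The abstract does not say which statistic its overrepresentation tests used. A common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher exact test), sketched below under that assumption. The function and parameter names are hypothetical, and the numbers in the usage line are toy values rather than study data.

```python
from math import comb

def overrepresentation_pvalue(n_background, n_tissue_genes, n_selected, n_overlap):
    """One-sided hypergeometric p-value for gene-set overrepresentation.

    n_background:   total number of genes considered
    n_tissue_genes: background genes flagged as expressed in the tissue of interest
    n_selected:     genes in the query set (e.g. genes near BMI-associated loci)
    n_overlap:      query genes that are also tissue genes
    Returns P(overlap >= n_overlap) when the query set is drawn at random.
    """
    total = comb(n_background, n_selected)
    max_overlap = min(n_tissue_genes, n_selected)
    tail = sum(
        comb(n_tissue_genes, k) * comb(n_background - n_tissue_genes, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Toy example: 10 background genes, 5 tissue genes, 2 selected, 1 overlapping.
print(round(overrepresentation_pvalue(10, 5, 2, 1), 4))
```

In practice such a test would be repeated per tissue, so the resulting p-values would need a multiple-testing correction.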

