A novel privacy-preserving federated genome-wide association study framework and its application in identifying potential risk variants in ankylosing spondylitis

Author(s): Xin Wu, Hao Zheng, Zuochao Dou, Feng Chen, Jieren Deng, ...

Abstract Genome-wide association studies (GWAS) have been widely used to identify potential risk variants in various diseases. A statistically meaningful GWAS typically requires a large sample size to detect disease-associated single nucleotide polymorphisms (SNPs), but a single institution usually possesses only a limited number of samples. Cross-institutional partnerships are therefore required to increase sample size and statistical power. Such partnerships, however, pose significant challenges, a major one being data privacy: public privacy awareness, the impact of data privacy leaks, and privacy-related risks are all becoming increasingly important, yet no de-identification standard is available to safeguard genomic data sharing. In this paper, we introduce a novel privacy-preserving federated GWAS framework (iPRIVATES). Equipped with privacy-preserving federated analysis, iPRIVATES enables multiple institutions to jointly perform GWAS analysis without leaking patient-level genotyping data; only aggregated local statistics are exchanged within the study network. We evaluate the performance of iPRIVATES on both simulated data and a real-world application identifying potential risk variants in ankylosing spondylitis (AS). The experimental results showed that the strongest signals among AS-associated SNPs reside mostly around the human leukocyte antigen (HLA) regions. The proposed iPRIVATES framework achieved results equivalent to those of a traditional centralized implementation, demonstrating its great potential for driving collaborative genomic research on different diseases while preserving data privacy.
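The idea of exchanging only aggregated local statistics can be illustrated with a minimal sketch. It is not the iPRIVATES protocol, whose statistical model and privacy mechanisms the abstract does not describe. It assumes a basic allelic chi-square test on one SNP, with each site sharing only a 2x2 table of allele counts. All function names are hypothetical.

```python
import math

def local_allele_counts(genotypes, is_case):
    """Computed inside one institution; only the returned 2x2 table leaves the site.

    genotypes: alternate-allele count per person (0, 1, or 2).
    Returns [[case_alt, case_ref], [control_alt, control_ref]].
    """
    table = [[0, 0], [0, 0]]
    for g, case in zip(genotypes, is_case):
        row = 0 if case else 1
        table[row][0] += g       # alternate alleles carried
        table[row][1] += 2 - g   # reference alleles carried
    return table

def aggregate(tables):
    """Coordinator step: element-wise sum of the per-site tables."""
    total = [[0, 0], [0, 0]]
    for t in tables:
        for i in range(2):
            for j in range(2):
                total[i][j] += t[i][j]
    return total

def allelic_chi_square(table):
    """Pearson chi-square (1 df) on a 2x2 allele-count table; returns (chi2, p)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Toy data for two hypothetical sites (not from the paper).
site_a = local_allele_counts([2, 1, 1, 0], [True, True, False, False])
site_b = local_allele_counts([2, 2, 0, 1], [True, False, False, True])
chi2, p = allelic_chi_square(aggregate([site_a, site_b]))
```

Because allele counts are additive, the summed table equals the one computed on pooled data, so this particular test gives the same result federated or centralized. Sharing raw counts is not by itself a privacy guarantee; a real deployment would need further protection.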

Author(s): Zachary F Gerring, Angela Mina-Vargas, Eric R Gamazon, Eske M Derks

Abstract Motivation: Genome-wide association studies have successfully identified multiple independent genetic loci that harbour variants associated with human traits and diseases, but the exact causal genes are largely unknown. Common genetic risk variants are enriched in non-protein-coding regions of the genome and often affect gene expression (expression quantitative trait loci, eQTL) in a tissue-specific manner. To address this challenge, we developed a methodological framework, E-MAGMA, which converts genome-wide association summary statistics into gene-level statistics by assigning risk variants to their putative genes based on tissue-specific eQTL information. Results: We compared E-MAGMA to three eQTL-informed gene-based approaches using simulated phenotype data. Phenotypes were simulated based on eQTL reference data using GCTA for all genes with at least one eQTL on chromosome 1. We performed 10 simulations per gene. The eQTL-h2 (i.e., the proportion of variation explained by the eQTLs) was set at 1%, 2%, and 5%. We found that E-MAGMA outperforms other gene-based approaches across a range of simulated parameters (e.g. the number of identified causal genes). When applied to genome-wide association summary statistics for five neuropsychiatric disorders, E-MAGMA identified more putative candidate causal genes than other eQTL-based approaches. These results show that, by integrating tissue-specific eQTL information, E-MAGMA will help to identify novel candidate causal genes from genome-wide association summary statistics and thereby improve the understanding of the biological basis of complex disorders. Availability: A tutorial and input files are made available in a GitHub repository: https://github.com/eskederks/eMAGMA-tutorial. Supplementary information: Supplementary data are available at Bioinformatics online.
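The general idea of turning SNP-level summary statistics into gene-level statistics through an eQTL map can be sketched as follows. This is not E-MAGMA's statistical model. As a stand-in it combines p-values with Fisher's method, which assumes independent SNPs and ignores linkage disequilibrium. The function names and the toy SNP and gene identifiers are hypothetical.

```python
import math
from collections import defaultdict

def fisher_combine(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df if SNPs are independent."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # the Fisher statistic divided by 2
    # Closed-form chi-square survival function, valid for even degrees of freedom (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

def gene_level_p(snp_p, eqtl_map):
    """Assign each SNP to the genes it is an eQTL for, then combine p-values per gene.

    snp_p: {snp_id: GWAS p-value}
    eqtl_map: {snp_id: [gene ids]} for one tissue
    """
    by_gene = defaultdict(list)
    for snp, genes in eqtl_map.items():
        if snp in snp_p:  # skip eQTLs with no GWAS summary statistic
            for gene in genes:
                by_gene[gene].append(snp_p[snp])
    return {gene: fisher_combine(ps) for gene, ps in by_gene.items()}

# Toy inputs (hypothetical identifiers and values).
snp_p = {"rsA": 0.01, "rsB": 0.2, "rsC": 0.5}
eqtl_map = {"rsA": ["GENE1"], "rsB": ["GENE1", "GENE2"], "rsD": ["GENE3"]}
gene_p = gene_level_p(snp_p, eqtl_map)
```

Here `rsC` contributes to no gene because it has no eQTL annotation, and `GENE3` is dropped because its only eQTL has no GWAS statistic. The point is that assignment follows expression evidence rather than genomic position.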


Brain, 2020
Author(s): Longfei Jia, Fangyu Li, Cuibai Wei, Min Zhu, Qiumin Qu, ...

Abstract Previous genome-wide association studies have identified dozens of susceptibility loci for sporadic Alzheimer’s disease, but few of these loci have been validated in longitudinal cohorts. Establishing predictive models of Alzheimer’s disease based on these novel variants is clinically important for verifying whether they have pathological functions and for providing a useful tool for screening of disease risk. In the current study, we performed a two-stage genome-wide association study of 3913 patients with Alzheimer’s disease and 7593 controls and identified four novel variants (rs3777215, rs6859823, rs234434, and rs2255835; P_combined = 3.07 × 10^−19, 2.49 × 10^−23, 1.35 × 10^−67, and 4.81 × 10^−9, respectively) as well as nine variants in the apolipoprotein E region with genome-wide significance (P < 5.0 × 10^−8). Literature mining suggested that these novel single nucleotide polymorphisms are related to amyloid precursor protein transport and metabolism, antioxidation, and neurogenesis. Based on their possible roles in the development of Alzheimer’s disease, we used different combinations of these variants and the apolipoprotein E status and successively built 11 predictive models. Each predictive model includes relatively few single nucleotide polymorphisms, which is useful for clinical practice: the maximum number was 13 and the minimum was only four. These predictive models were all significant, and their peak area under the curve reached 0.73 in both the first and second stages. Finally, these models were validated using a separate longitudinal cohort of 5474 individuals. The results showed that individuals carrying risk variants included in the models had a shorter latency and higher incidence of Alzheimer’s disease, suggesting that our models can predict Alzheimer’s disease onset in a population with genetic susceptibility.
The effectiveness of the models for predicting Alzheimer’s disease onset confirmed the contributions of these identified variants to disease pathogenesis. In conclusion, this is the first study to validate genome-wide association study-based predictive models for evaluating the risk of Alzheimer’s disease onset in a large Chinese population. The clinical application of these models will be beneficial for individuals harbouring these risk variants, and particularly for young individuals seeking genetic consultation.
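The area under the curve reported for such models can be illustrated with a short sketch. The abstract does not say how the models weight variants, so this assumes an unweighted count of risk alleles as the score, which may differ from the authors' models. The SNP names, genotypes, and labels are hypothetical toy values, not study data.

```python
def risk_score(genotype, risk_snps):
    """Unweighted score: total risk alleles carried across the model's SNPs."""
    return sum(genotype.get(snp, 0) for snp in risk_snps)

def auc(scores, labels):
    """Probability that a random case outscores a random control; ties count as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Toy cohort: risk-allele counts (0, 1, or 2) per hypothetical SNP.
model_snps = ["snp1", "snp2"]
cohort = [
    ({"snp1": 2, "snp2": 1}, 1),  # case
    ({"snp1": 1, "snp2": 1}, 1),  # case
    ({"snp1": 1, "snp2": 1}, 0),  # control
    ({"snp1": 0, "snp2": 0}, 0),  # control
]
scores = [risk_score(g, model_snps) for g, _ in cohort]
labels = [y for _, y in cohort]
model_auc = auc(scores, labels)
```

An AUC of 0.5 means the score separates cases from controls no better than chance, and 1.0 means perfect separation.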

