Review: Population Structure in Genetic Studies: Confounding Factors and Mixed Models

2016 ◽  
Author(s):  
Lana S. Martin ◽  
Eleazar Eskin

Abstract: A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to effectively test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.
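The way population structure produces a false positive association can be illustrated with a small simulation. The sketch below is not taken from the review: the two subpopulations, their allele frequencies, and their phenotype means are hypothetical values chosen for illustration, and centering within known subpopulations is used as a simple stand-in for the mixed-model correction the review discusses. The simulated variant has no causal effect, yet the pooled analysis shows a strong correlation that disappears after adjustment.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_per_pop=2000, seed=0):
    """Simulate a variant with NO causal effect in two subpopulations."""
    rng = random.Random(seed)
    # Hypothetical settings: (allele frequency, phenotype mean) per subpopulation.
    pops = [(0.1, 0.0), (0.9, 2.0)]
    geno, pheno, label = [], [], []
    for k, (freq, mu) in enumerate(pops):
        for _ in range(n_per_pop):
            # Diploid genotype: number of alternate alleles (0, 1, or 2).
            geno.append(sum(rng.random() < freq for _ in range(2)))
            # Phenotype depends only on subpopulation, never on genotype.
            pheno.append(rng.gauss(mu, 1.0))
            label.append(k)
    return geno, pheno, label


def center_within(values, label):
    """Subtract each subpopulation's mean (adjusts for the known confounder)."""
    groups = {}
    for v, k in zip(values, label):
        groups.setdefault(k, []).append(v)
    means = {k: mean(vs) for k, vs in groups.items()}
    return [v - means[k] for v, k in zip(values, label)]


geno, pheno, label = simulate()
naive_r = pearson(geno, pheno)
adjusted_r = pearson(center_within(geno, label), center_within(pheno, label))
print(f"naive r = {naive_r:.3f}, adjusted r = {adjusted_r:.3f}")
```

In real studies the subpopulation labels are usually unknown or the structure is continuous, which is one reason mixed models with an estimated relatedness matrix are preferred over a simple group adjustment like this one.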

2013 ◽  
Vol 2013 ◽  
pp. 1-12 ◽  
Author(s):  
Zoi Zagoriti ◽  
Manousos E. Kambouris ◽  
George P. Patrinos ◽  
Socrates J. Tzartos ◽  
Konstantinos Poulas

Myasthenia gravis (MG) is an autoimmune disease mediated by the presence of autoantibodies that bind to components of the neuromuscular junction, causing the symptoms of muscular weakness and fatigability. Like most autoimmune disorders, MG is a multifactorial, noninherited disease, though with an established genetic component. The heterogeneity observed in MG further complicates genetic analysis, as it occurs at various levels, including diverse autoantigens, thymus histopathology, and age at onset. In this context of distinct subgroups, a plethora of association studies, discussed in this review, have assessed the involvement of various HLA and non-HLA related loci in MG susceptibility over the past five years. As expected, certain HLA alleles were strongly associated with MG. Many of the non-HLA genes, such as PTPN22 and CTLA-4, have been previously studied in MG and other autoimmune diseases, and their association with MG has been reevaluated in more cohesive groups of patients. Moreover, novel risk or protective loci have been revealed, as in the case of TNIP1 and FOXP3. Although the majority of these results have been derived from candidate gene studies, the focal point of all recent genetic studies is the first genome-wide association study (GWAS) conducted on early-onset MG patients.


2019 ◽  
Vol 22 (8) ◽  
pp. 1063-1069 ◽  
Author(s):  
N. S. Yudin ◽  
N. L. Podkolodnyy ◽  
T. A. Agarkova ◽  
E. V. Ignatieva

Selection by means of genetic markers is a promising approach to the eradication of infectious diseases in farm animals, especially in the absence of effective methods of treatment and prevention. Bovine leukemia virus (BLV) is spread throughout the world and represents one of the biggest problems for livestock production and food security in Russia. However, recent genome-wide association studies have shown that sensitivity/resistance to BLV is polygenic. The aim of this study was to create a catalog of cattle genes and genes of other mammalian species involved in the pathogenesis of BLV-induced infection and to perform gene prioritization using bioinformatics methods. Based on manually collected information from a range of open sources, a total of 446 genes were included in the catalog of cattle genes and genes of other mammals involved in the pathogenesis of BLV-induced infection. The following criteria were used to prioritize the 446 genes from the catalog: (1) the gene is associated with leukemia according to a genome-wide association study; (2) the gene is associated with leukemia according to a case-control study; (3) the role of the gene in leukemia development has been studied using knockout mice; (4) protein-protein interactions exist between the gene-encoded protein and either viral particles or individual viral proteins; (5) the gene is annotated with Gene Ontology terms that are overrepresented for a given list of genes; (6) the gene participates in biological pathways from the KEGG or REACTOME databases that are overrepresented for a given list of genes; (7) the protein encoded by the gene has a high number of protein-protein interactions with proteins encoded by other genes from the catalog. Based on each criterion, a rank was assigned to each gene. Then the ranks were summed and an overall rank was determined.
Prioritization of 446 candidate genes allowed us to identify 5 genes of interest (TNF, LTB, BOLA-DQA1, BOLA-DRB3, ATF2), which can affect the sensitivity/resistance of cattle to leukemia.
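The rank-based prioritization described above can be sketched in a few lines. This is a minimal illustration, assuming that ranks from each criterion are combined by adding them and that ties share an average rank. The abstract does not specify the tie handling, and the gene names and scores below are hypothetical placeholders, not data from the study.

```python
def rank_descending(scores):
    """Rank genes by score (1 = best); tied genes share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the end of the run of genes tied with position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for gene in ordered[i:j + 1]:
            ranks[gene] = average_rank
        i = j + 1
    return ranks


def prioritize(criteria):
    """Sum per-criterion ranks; a smaller total means higher priority."""
    genes = list(next(iter(criteria.values())))
    totals = {gene: 0.0 for gene in genes}
    for scores in criteria.values():
        for gene, rank in rank_descending(scores).items():
            totals[gene] += rank
    return sorted(genes, key=lambda g: totals[g]), totals


# Hypothetical toy input: two criteria scored for three placeholder genes.
criteria = {
    "gwas_hit": {"GENE_A": 1, "GENE_B": 0, "GENE_C": 1},
    "ppi_degree": {"GENE_A": 12, "GENE_B": 3, "GENE_C": 7},
}
order, totals = prioritize(criteria)
print(order, totals)
```

Summing ranks instead of raw scores keeps criteria on different scales (binary evidence flags, interaction counts) from dominating one another.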


2018 ◽  
Author(s):  
Matthew P. Conomos ◽  
Alex P. Reiner ◽  
Mary Sara McPeek ◽  
Timothy A. Thornton

Abstract: Linear mixed models (LMMs) have become the standard approach for genetic association testing in the presence of sample structure. However, the performance of LMMs has primarily been evaluated in relatively homogeneous populations of European ancestry, despite many of the recent genetic association studies including samples from worldwide populations with diverse ancestries. In this paper, we demonstrate that existing LMM methods can have systematic miscalibration of association test statistics genome-wide in samples with heterogeneous ancestry, resulting in both increased type-I error rates and a loss of power. Furthermore, we show that this miscalibration arises due to varying allele frequency differences across the genome among populations. To overcome this problem, we developed LMM-OPS, an LMM approach which orthogonally partitions diverse genetic structure into two components: distant population structure and recent genetic relatedness. In simulation studies with real and simulated genotype data, we demonstrate that LMM-OPS is appropriately calibrated in the presence of ancestry heterogeneity and outperforms existing LMM approaches, including EMMAX, GCTA, and GEMMA. We conduct a GWAS of white blood cell (WBC) count in an admixed sample of 3,551 Hispanic/Latino American women from the Women’s Health Initiative SNP Health Association Resource where LMM-OPS detects genome-wide significant associations with corresponding p-values that are one or more orders of magnitude smaller than those from competing LMM methods. We also identify a genome-wide significant association with regulatory variant rs2814778 in the DARC gene on chromosome 1, which generalizes to Hispanic/Latino Americans a previous association with reduced WBC count identified in African Americans.
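For orientation, the generic linear mixed model that methods of this kind build on is commonly written as follows. This is the textbook form, not the specific LMM-OPS formulation, and the symbols are assumptions introduced here: y is the phenotype vector, X holds the fixed-effect covariates including the tested variant, K is a genetic relationship matrix estimated from genotypes, and the two variance components are estimated from the data.

```latex
y = X\beta + g + \varepsilon, \qquad
g \sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad
\varepsilon \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I\right)
```

Under this model the phenotypic covariance is \sigma_g^2 K + \sigma_e^2 I, so individuals who are more related or share more ancestry are allowed correlated phenotypes instead of being treated as independent observations.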


2015 ◽  
Author(s):  
Liya Wang ◽  
Peter Van Buren ◽  
Doreen Ware

Over the past few years, cloud-based platforms have been proposed to address storage, management, and computation of large-scale data, especially in the field of genomics. However, for collaborative efforts involving multiple institutes, data transfer and management, interoperability, and standardization among different platforms have imposed new challenges. This paper proposes a distributed bioinformatics platform that can leverage local clusters together with remote computational clusters for genomic analysis using a unified bioinformatics workflow. The platform is built with a data server configured with iRODS, a computation cluster authenticated with the iPlant Agave system, and a web server to interact with the platform. A Genome-Wide Association Study workflow is integrated to validate the feasibility of the proposed approach.


2018 ◽  
Author(s):  
Bernadette C Young ◽  
Sarah G Earle ◽  
Sona Soeng ◽  
Poda Sar ◽  
Varun Kumar ◽  
...  

Abstract: Pyomyositis is a severe bacterial infection of skeletal muscle, commonly affecting children in tropical regions and predominantly caused by Staphylococcus aureus. To understand the contribution of bacterial genomic factors to pyomyositis, we conducted a genome-wide association study of S. aureus cultured from 101 children with pyomyositis and 417 children with asymptomatic nasal carriage attending the Angkor Hospital for Children in Cambodia. We found a strong relationship between bacterial genetic variation and pyomyositis, with estimated heritability 63.8% (95% CI 49.2-78.4%). The presence of the Panton-Valentine leucocidin (PVL) locus increased the odds of pyomyositis 130-fold (p = 10^-17.9). The signal of association mapped both to the PVL-coding sequence and the sequence immediately upstream. Together these regions explained > 99.9% of heritability. Our results establish staphylococcal pyomyositis, like tetanus and diphtheria, as critically dependent on expression of a single toxin and demonstrate the potential for association studies to identify specific bacterial genes promoting severe human disease.


2021 ◽  
Author(s):  
Eun Pyo Hong ◽  
Dong Hyuk Youn ◽  
Bong Jun Kim ◽  
Jun Hyong Ahn ◽  
Jeong Jin Park ◽  
...  

Abstract: In addition to conventional genome-wide association studies (GWAS), fine-mapping is increasingly used to identify the genetic function of variants associated with disease susceptibility. Here, we used a fine-mapping approach to evaluate the causal variants based on a previous GWAS involving patients with intracranial aneurysm (IA). Fine-mapping analysis was conducted based on the chromosomal data provided by a GWAS consisting of 250 patients diagnosed with IA and 296 controls, using posterior inclusion probability (PIP) and log10-transformed Bayes factor (log10BF). The narrow-sense heritability (h2) explained by each causal variant was estimated. Subsequent gene expression and functional network analyses were used to calculate the transcripts per million (TPM) values. Twenty causal candidate single nucleotide polymorphisms (SNPs) surpassed a genome-wide significance threshold for credible evidence (log10BF > 6.1). Four SNPs including rs75822236 (R535H, GBA; log10BF = 15.06), rs112859779 (G141S, TCF24; log10BF = 12.12), rs79134766 (A208T, OLFML2A; log10BF = 14.92), and rs371331393 (Q1932X, ARHGAP32; log10BF = 20.88) showed a completed PIP value in each chromosomal region, suggesting a high probability of variant causality associated with IA. Expression of GBA was highly enriched in whole blood (TPM = 33.13), while TCF24 was rarely expressed in any tissue or cell type. No direct interaction was observed between the four causal genes; however, PSAP appeared to be particularly important via indirect correlation between other genes. Our results suggest that four mutations in GBA, TCF24, OLFML2A, and ARHGAP32 are linked to IA susceptibility and pathogenesis. Our approach may yield more informative mutations in subsequent GWAS.
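The relationship between Bayes factors and posterior inclusion probabilities can be sketched under a common simplifying assumption. The abstract does not state which fine-mapping model was used, so the sketch assumes exactly one causal variant per region and an equal prior for every variant, in which case each variant's PIP is its Bayes factor divided by the sum of Bayes factors in the region. The function name and the input values are hypothetical.

```python
def pips_from_log10_bf(log10_bfs):
    """Posterior inclusion probabilities from per-variant log10 Bayes factors.

    Assumes exactly one causal variant in the region and equal prior
    probability for every variant, so PIP_i = BF_i / sum_j BF_j.
    """
    top = max(log10_bfs)
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = [10.0 ** (value - top) for value in log10_bfs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log10BF values for four variants in one region.
example = [8.0, 6.0, 2.5, 0.3]
pips = pips_from_log10_bf(example)
print([round(p, 4) for p in pips])
```

Because Bayes factors enter on a log10 scale, a gap of two units between the top variants already concentrates nearly all posterior mass on the leading one.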


2020 ◽  
Author(s):  
Gregory Vogel ◽  
Michael A. Gore ◽  
Christine D. Smart

Abstract: Phytophthora capsici is a soilborne oomycete plant pathogen that causes severe vegetable crop losses in New York (NY) State and worldwide. This pathogen is difficult to manage, in part due to its production of long-lasting sexual spores and its tendency to quickly evolve fungicide resistance. We genotyped 252 P. capsici isolates, predominantly from NY, for single-nucleotide polymorphisms (SNPs) in order to conduct a genome-wide association study for mating type and mefenoxam insensitivity. The population structure and extent of chromosomal copy number variation in this collection of isolates were also characterized. Population structure analyses showed isolates largely clustered by the field site where they were collected, with values of FST between pairs of fields ranging from 0.10 to 0.31. Thirty-three isolates were putative aneuploids, demonstrating evidence for having up to four linkage groups present in more than two copies, and an additional two isolates appeared to be genome-wide triploids. Mating type was mapped to a region on scaffold 4, consistent with previous findings, and mefenoxam insensitivity was associated with several SNP markers at a novel locus on scaffold 62. We identified several candidate genes for mefenoxam sensitivity, including a homolog of yeast ribosome synthesis factor Rrp5, but failed to locate near the scaffold 62 locus any subunits of RNA Polymerase I, the enzyme that has been hypothesized to be the target site of phenylamide fungicides in oomycetes. This work expands our knowledge of the population biology of P. capsici and provides a foundation for functional validation of candidate genes associated with epidemiologically important phenotypes.
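To give a feel for what pairwise FST values measure, the sketch below computes Wright's FST for a single biallelic locus from the allele frequencies in two populations, as the proportional reduction in expected heterozygosity within populations relative to the pooled total. It assumes two equally weighted populations and known allele frequencies. The study itself most likely used a genome-wide estimator with sample-size corrections, which this illustration does not attempt to reproduce, and the function name is hypothetical.

```python
def fst_two_pops(p1, p2):
    """Wright's F_ST at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were one pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        # Locus is fixed for the same allele everywhere: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


for freqs in [(0.5, 0.5), (0.4, 0.6), (0.2, 0.8), (0.0, 1.0)]:
    print(freqs, fst_two_pops(*freqs))
```

Identical allele frequencies give 0 and fixation for alternative alleles gives 1, so intermediate values between fields indicate partial but real differentiation.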


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Bernadette C Young ◽  
Sarah G Earle ◽  
Sona Soeng ◽  
Poda Sar ◽  
Varun Kumar ◽  
...  

Pyomyositis is a severe bacterial infection of skeletal muscle, commonly affecting children in tropical regions, predominantly caused by Staphylococcus aureus. To understand the contribution of bacterial genomic factors to pyomyositis, we conducted a genome-wide association study of S. aureus cultured from 101 children with pyomyositis and 417 children with asymptomatic nasal carriage attending the Angkor Hospital for Children, Cambodia. We found a strong relationship between bacterial genetic variation and pyomyositis, with estimated heritability 63.8% (95% CI 49.2–78.4%). The presence of the Panton–Valentine leucocidin (PVL) locus increased the odds of pyomyositis 130-fold (p = 10^-17.9). The signal of association mapped both to the PVL-coding sequence and to the sequence immediately upstream. Together these regions explained over 99.9% of heritability (95% CI 93.5–100%). Our results establish staphylococcal pyomyositis, like tetanus and diphtheria, as critically dependent on a single toxin and demonstrate the potential for association studies to identify specific bacterial genes promoting severe human disease.

