Inference of multiple-wave population admixture by modeling decay of linkage disequilibrium with polynomial functions

Mapping Intimacies ◽

10.1101/082644 ◽

2016 ◽

Author(s):

Ying Zhou ◽

Kai Yuan ◽

Yaoliang Yu ◽

Xumin Ni ◽

Pengtao Xie ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Simulated Data ◽

Population Admixture ◽

Multiple Wave ◽

Single Nucleotide ◽

Genome Wide ◽

Complex Population ◽

Important Challenge ◽

Source Populations ◽

Admixture Linkage Disequilibrium

Abstract: To infer the histories of population admixture, one important challenge for methods based on admixture linkage disequilibrium (ALD) is removing the effect of source LD (SLD), which is inherited directly from the source populations. In previous methods, only the decay curve of weighted LD between pairs of sites whose genetic distance was larger than a certain starting distance was fitted by single or multiple exponential functions to infer recent single- or multiple-wave admixture. However, the effect of SLD has not been well defined, and no tool has been developed to estimate the effect of SLD on weighted LD decay. In this study, we defined the SLD in the formularized weighted LD statistic under the two-way admixture model and proposed the polynomial spectrum (p-spectrum) to study weighted SLD and weighted LD. We also found that reference populations could be used to reduce the SLD in the weighted LD statistic. We further developed a method, iMAAPs, to infer Multiple-wave Admixture by fitting ALD using Polynomial spectrum. We evaluated the performance of iMAAPs under various admixture models in simulated data and applied iMAAPs to the analysis of genome-wide single nucleotide polymorphism data from the Human Genome Diversity Project (HGDP) and the HapMap Project. We showed that iMAAPs is a considerable improvement over other current methods and further facilitates the inference of the histories of complex population admixtures.
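
The abstract relies on a weighted LD statistic that uses reference populations. The sketch below computes one under an assumed ALDER-style definition, which may differ from the formularized statistic in the paper. For every SNP pair, it multiplies the genotype covariance in the admixed sample by the product of allele-frequency differences between two reference populations. It then averages these products within genetic-distance bins. The function names and the bin width are hypothetical.

```python
from collections import defaultdict


def covariance(x, y):
    """Sample covariance of two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def weighted_ld_curve(genotypes, positions_morgan, ref_freq_a, ref_freq_b, bin_width=0.001):
    """Mean of cov(g_i, g_j) * w_i * w_j per genetic-distance bin (assumed definition).

    genotypes[s] holds allele counts at SNP s across admixed individuals;
    w_s = ref_freq_a[s] - ref_freq_b[s] is the reference allele-frequency difference.
    """
    weights = [pa - pb for pa, pb in zip(ref_freq_a, ref_freq_b)]
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(genotypes)):
        for j in range(i + 1, len(genotypes)):
            d = abs(positions_morgan[j] - positions_morgan[i])
            b = int(d / bin_width)  # index of the distance bin
            sums[b] += covariance(genotypes[i], genotypes[j]) * weights[i] * weights[j]
            counts[b] += 1
    # (bin midpoint in Morgans, mean weighted LD), sorted by distance
    return [((b + 0.5) * bin_width, sums[b] / counts[b]) for b in sorted(sums)]
```

Admixture-dating methods fit the decay of this curve against distance.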

Inference of multiple-wave population admixture by modeling decay of linkage disequilibrium with multiple exponential functions

10.1101/026757 ◽

2015 ◽

Cited By ~ 1

Author(s):

Ying Zhou ◽

Kai Yuan ◽

Yaoliang Yu ◽

Xumin Ni ◽

Pengtao Xie ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Exponential Functions ◽

Dynamic Changes ◽

Multiple Wave ◽

Flow Parameters ◽

Genome Wide ◽

Complex Population ◽

Genome Wide Data ◽

Admixed Population ◽

Source Populations

Admixture-induced linkage disequilibrium (LD) has recently been applied to inferring the histories of complex admixtures. However, currently available methods do not properly account for the influence of ancestral source populations on the LD pattern in admixed populations, which affects the estimation of several gene flow parameters from empirical data. We first illustrated the dynamic changes of LD in admixed populations and mathematically formulated LD under a generalized admixture model with finite population size. We next developed a new method, MALDmef, which fits LD with multiple exponential functions to infer and date multiple-wave admixtures. MALDmef takes into account the effects of source populations, which substantially affect the modeling of LD in admixed populations. This makes it capable of efficiently detecting and dating multiple-wave admixture events. The performance of MALDmef was evaluated by simulation, and under various admixture models it was shown to be more accurate than MALDER, a state-of-the-art method recently developed for similar purposes. We further applied MALDmef to genome-wide data from the Human Genome Diversity Project (HGDP) and the HapMap Project. Interestingly, we identified more than one admixture event in several populations, events that had not previously been reported. For example, two major admixture events were identified in the Xinjiang Uyghur, occurring around 27–30 and 182–195 generations ago, respectively. In an African population (MKK), three recent major admixtures occurring 13–16, 50–67, and 107–139 generations ago were detected. Our method is a considerable improvement over other current methods and further facilitates the inference of the histories of complex population admixtures.
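
Methods in this family model weighted LD as a sum of exponentials, one per admixture wave: LD(d) ≈ Σ_k A_k·exp(−n_k·d) + c. Here d is the genetic distance in Morgans and n_k is the age of wave k in generations. The sketch below fits a single wave only, and it is not MALDmef's algorithm. For each candidate n on a grid, the model is linear in A and c, so those are solved in closed form, and the n with the smallest residual wins. Only pairs beyond a starting distance are used, which is how earlier methods limit the influence of source LD. The function name and the grid are hypothetical.

```python
import math


def fit_single_exponential(distances, ld_values, n_grid, min_distance=0.0):
    """Fit LD(d) ~ A * exp(-n * d) + c by grid search over n (generations).

    For each candidate n, A and c are obtained by ordinary least squares;
    the n giving the smallest residual sum of squares is returned with A and c.
    """
    pts = [(d, y) for d, y in zip(distances, ld_values) if d >= min_distance]
    m = len(pts)
    ys = [y for _, y in pts]
    best = None
    for n in n_grid:
        xs = [math.exp(-n * d) for d, _ in pts]
        sx, sy = sum(xs), sum(ys)
        sxx = sum(x * x for x in xs)
        sxy = sum(x * y for x, y in zip(xs, ys))
        denom = m * sxx - sx * sx
        if denom == 0:
            continue  # exp(-n d) is constant over the data, so A and c are not identifiable
        a = (m * sxy - sx * sy) / denom
        c = (sy - a * sx) / m
        sse = sum((y - (a * x + c)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, n, a, c)
    if best is None:
        raise ValueError("no identifiable fit on the given grid")
    _, n, a, c = best
    return n, a, c
```

Extending this to K waves requires a joint search over several n_k, or nonlinear least squares. Deciding how many waves the data support is the harder part.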

Nonparametric Disequilibrium Mapping of Functional Sites Using Haplotypes of Multiple Tightly Linked Single-Nucleotide Polymorphism Markers

Genetics ◽

10.1093/genetics/164.3.1175 ◽

2003 ◽

Vol 164 (3) ◽

pp. 1175-1187

Author(s):

Rong Cheng ◽

Jennie Z Ma ◽

Fred A Wright ◽

Shili Lin ◽

Xin Gao ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Simulated Data ◽

Haplotype Frequency ◽

Nucleotide Polymorphisms ◽

Data Set ◽

Single Nucleotide ◽

Functional Sites ◽

Genome Wide ◽

Snp Map ◽

Risk Of Disease

Abstract: As genotyping of single-nucleotide polymorphisms (SNPs) becomes faster and more efficient, the SNP map makes it possible to evaluate the extent to which a common haplotype contributes to the risk of disease. In this study we propose a new procedure for mapping functional sites or regions of a candidate gene of interest using multiple linked SNPs. Based on a case-parent trio family design, we use expectation-maximization (EM) algorithm-derived haplotype frequency estimates of multiple tightly linked SNPs from both unambiguous and ambiguous families to construct a contingency statistic S for linkage disequilibrium (LD) analysis. In this procedure, a moving-window scan for functional SNP sites or regions can cover any number of loci, limited only by computer storage. Within a window, haplotypes of all possible widths are used to find the maximum statistic S* for each site (or locus). Furthermore, this method can be applied to regional or genome-wide scans for LD using SNPs. The sensitivity of the proposed procedure was examined on the simulated data set from Genetic Analysis Workshop (GAW) 12. Compared with the conventional and generalized TDT methods, our procedure is more flexible and powerful.
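
The procedure depends on EM-derived haplotype frequency estimates from phase-ambiguous data. As a simplified illustration, the sketch below is the classic EM for two biallelic SNPs in unrelated individuals with unphased genotypes. It is not the paper's trio-based, multi-SNP estimator. Only double heterozygotes are ambiguous, since each could carry either 00/11 or 01/10. The E-step splits them between the two phases in proportion to current frequency products, and the M-step renormalizes the counts. The function name and the genotype coding (count of allele 1 at each SNP) are assumptions.

```python
def em_two_snp_haplotypes(genotypes, n_iter=100, tol=1e-10):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 from unphased genotypes.

    genotypes: list of (g1, g2) allele counts (0, 1 or 2) at two SNPs.
    """
    known = {"00": 0.0, "01": 0.0, "10": 0.0, "11": 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase ambiguous: 00/11 or 01/10
            continue
        a = [0] * (2 - g1) + [1] * g1
        b = [0] * (2 - g2) + [1] * g2
        for x, y in zip(a, b):  # unique pairing when at least one SNP is homozygous
            known[f"{x}{y}"] += 1
    total = 2 * len(genotypes)
    freqs = {h: 0.25 for h in known}  # uniform starting point
    for _ in range(n_iter):
        # E-step: expected share of double heterozygotes carrying 00/11
        p_cis = freqs["00"] * freqs["11"]
        p_trans = freqs["01"] * freqs["10"]
        share_cis = p_cis / (p_cis + p_trans) if p_cis + p_trans > 0 else 0.5
        counts = dict(known)
        counts["00"] += n_double_het * share_cis
        counts["11"] += n_double_het * share_cis
        counts["01"] += n_double_het * (1 - share_cis)
        counts["10"] += n_double_het * (1 - share_cis)
        # M-step: expected haplotype counts become new frequencies
        new = {h: c / total for h, c in counts.items()}
        converged = max(abs(new[h] - freqs[h]) for h in new) < tol
        freqs = new
        if converged:
            break
    return freqs
```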

Modeling Linkage Disequilibrium and Identifying Recombination Hotspots Using Single-Nucleotide Polymorphism Data

Genetics ◽

10.1093/genetics/165.4.2213 ◽

2003 ◽

Vol 165 (4) ◽

pp. 2213-2233 ◽

Cited By ~ 41

Author(s):

Na Li ◽

Matthew Stephens

Keyword(s):

Linkage Disequilibrium ◽

Recombination Rate ◽

Population Sample ◽

Simulated Data ◽

Region Of Interest ◽

Population Data ◽

Recombination Rates ◽

Single Nucleotide ◽

Recombination Hotspots ◽

Genomic Regions

Abstract: We introduce a new statistical model for patterns of linkage disequilibrium (LD) among multiple SNPs in a population sample. The model overcomes limitations of existing approaches to understanding, summarizing, and interpreting LD by (i) relating patterns of LD directly to the underlying recombination process; (ii) considering all loci simultaneously, rather than pairwise; (iii) avoiding the assumption that LD necessarily has a “block-like” structure; and (iv) being computationally tractable for huge genomic regions (up to complete chromosomes). We examine in detail one natural application of the model: estimation of underlying recombination rates from population data. Using simulation, we show that when recombination is assumed constant across the region of interest, recombination rate estimates based on our model are competitive with the very best currently available methods. More importantly, we demonstrate on real and simulated data the potential of the model to help identify and quantify fine-scale variation in recombination rate from population data. We also outline how the model could be useful in other contexts, such as the development of more efficient haplotype-based methods for LD mapping.
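
A model that ties LD directly to recombination is widely known as the Li–Stephens copying model. In it, a new haplotype is built as a mosaic of K previously sampled haplotypes. The sketch below runs the forward algorithm for one such conditional probability under simplifying assumptions. The recombination rate ρ is constant per unit distance, and a free error probability eps stands in for the paper's mutation-based emission terms. The transition term exp(−ρ·d/K) follows the standard form of the model. The function name and parameters are hypothetical.

```python
import math


def li_stephens_log_likelihood(target, panel, positions, rho, eps):
    """Log-probability of a 0/1 `target` haplotype copied from K `panel` haplotypes.

    Hidden state: which panel haplotype is being copied. Between sites at
    distance d the copied haplotype is kept with probability exp(-rho * d / K),
    otherwise a haplotype is redrawn uniformly. Each site mismatches the copied
    allele with probability eps.
    """
    k = len(panel)

    def emit(j, site):
        return 1 - eps if panel[j][site] == target[site] else eps

    alpha = [emit(j, 0) / k for j in range(k)]  # uniform initial copying choice
    log_lik = 0.0
    for site in range(1, len(target)):
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
        d = positions[site] - positions[site - 1]
        stay = math.exp(-rho * d / k)
        jump = (1 - stay) / k  # valid because alpha now sums to 1
        alpha = [(stay * a + jump) * emit(j, site) for j, a in enumerate(alpha)]
    return log_lik + math.log(sum(alpha))
```

Roughly speaking, multiplying such conditionals over haplotypes in sequence gives an approximate likelihood for ρ, which is how recombination rates can be estimated from population data.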

Challenges of Adjusting Single-Nucleotide Polymorphism Effect Sizes for Linkage Disequilibrium

Human Heredity ◽

10.1159/000513303 ◽

2021 ◽

pp. 1-11

Author(s):

Valentina Escott-Price ◽

Karl Michael Schmidt

Keyword(s):

Linkage Disequilibrium ◽

Association Studies ◽

Statistical Significance ◽

Ordinary Least Squares ◽

Effect Sizes ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Single Nucleotide ◽

Genome Wide ◽

Tikhonov Regularisation

Background: Genome-wide association studies (GWAS) have been successful in identifying SNPs associated with disease, but individual effect sizes are small, and large sample sizes are required to achieve statistical significance. Methods of post-GWAS analysis, including gene-based, gene-set, and polygenic risk score analyses, combine SNP effect sizes in an attempt to boost power. To avoid giving undue weight to SNPs in linkage disequilibrium (LD), these analyses need to take LD into account. Objectives: We review methods that adjust the effect sizes (β-coefficients) of summary statistics instead of relying on simple LD pruning. Methods: We subject LD adjustment approaches to a mathematical analysis, using Tikhonov regularisation as a framework for comparison. Results: The processes involved are similar to the more straightforward Tikhonov-regularised ordinary least squares estimate of multivariate regression coefficients. On this basis, we note that current methods based on a Bayesian model for the effect sizes effectively make an implicit choice of the regularisation parameter. This is convenient, but it comes at the price of reduced transparency and, especially in smaller LD blocks, a risk of incomplete LD correction. Conclusions: There is no simple answer to the question of which method is best. However, where interpretability of the LD adjustment is essential, as in research aiming to identify the genomic aetiology of disorders, our study suggests that a more direct choice of mild regularisation when correcting effect sizes may be preferable.
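
The Tikhonov-regularised estimate can be written β̂_λ = (R + λI)⁻¹·β̂_marg. Here R is the SNP LD (correlation) matrix and β̂_marg holds the standardized marginal effect sizes. With λ = 0 this is the unregularised joint estimate, and larger λ shrinks the adjustment toward the marginal effects scaled down. The sketch assumes standardized genotypes and solves the small dense system by Gaussian elimination. For realistic LD blocks a library solver such as `numpy.linalg.solve` would be the usual choice. The function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def tikhonov_adjusted_effects(ld_matrix, marginal_betas, lam):
    """Return (R + lam * I)^-1 b for LD matrix R and marginal effect sizes b."""
    n = len(marginal_betas)
    reg = [[ld_matrix[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve_linear(reg, marginal_betas)
```

Making λ an explicit argument is the kind of "direct choice of mild regularisation" the conclusions favour, as opposed to a value implied by a prior.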

Accuracy of marker-assisted selection with single markers and marker haplotypes in cattle

Genetics Research ◽

10.1017/s0016672307008865 ◽

2007 ◽

Vol 89 (4) ◽

pp. 215-220 ◽

Cited By ~ 49

Author(s):

B. J. HAYES ◽

A. J. CHAMBERLAIN ◽

H. McPARTLAN ◽

I. MACLEOD ◽

L. SETHURAMAN ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Quantitative Trait Loci ◽

Single Nucleotide Polymorphisms ◽

Quantitative Trait ◽

Marker Assisted Selection ◽

Nucleotide Polymorphisms ◽

Data Set ◽

Single Nucleotide ◽

Angus Cattle ◽

Genome Wide

Summary: A key question for implementing marker-assisted selection (MAS) with markers in linkage disequilibrium with quantitative trait loci (QTLs) is how many markers surrounding each QTL should be used to ensure that the marker or marker haplotypes are in sufficient linkage disequilibrium (LD) with the QTL. In this paper we compare the accuracy of MAS using either single markers or marker haplotypes in an Angus cattle data set of 9323 genome-wide single nucleotide polymorphisms (SNPs) genotyped in 379 Angus cattle. The extent of LD in the data set was such that the average marker–marker r² was 0·2 at 200 kb. The accuracy of MAS increased with the number of markers in the haplotype surrounding the QTL. However, only when the haplotype contained 4 or more markers did the accuracy exceed that achieved with the single SNP in highest LD with the QTL. A large number of phenotypic records (>1000) were required to estimate the haplotype effects accurately.
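
A figure like "average marker–marker r² of 0·2 at 200 kb" comes from averaging pairwise r² within distance bins. The sketch below uses the squared Pearson correlation of genotype allele counts, a common approximation when phase is unknown. The paper may instead have computed r² from phased haplotypes. The function names and the 50 kb bin width are hypothetical.

```python
import math
from collections import defaultdict


def r_squared(x, y):
    """Squared Pearson correlation between two allele-count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic marker: r^2 undefined
    return sxy * sxy / (sxx * syy)


def mean_r2_by_distance(genotypes, positions_bp, bin_bp=50_000):
    """Average r^2 over all marker pairs, keyed by the lower edge of each distance bin."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(genotypes)):
        for j in range(i + 1, len(genotypes)):
            r2 = r_squared(genotypes[i], genotypes[j])
            if math.isnan(r2):
                continue  # skip pairs involving monomorphic markers
            b = abs(positions_bp[j] - positions_bp[i]) // bin_bp
            sums[b] += r2
            counts[b] += 1
    return {b * bin_bp: sums[b] / counts[b] for b in sorted(sums)}
```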

Linkage disequilibrium and within-breed genetic diversity in Iranian Zandi sheep

Archives Animal Breeding ◽

10.5194/aab-62-143-2019 ◽

2019 ◽

Vol 62 (1) ◽

pp. 143-151 ◽

Cited By ~ 6

Author(s):

Seyed Mohammad Ghoreishifar ◽

Hossein Moradi-Shahrbabak ◽

Nahid Parna ◽

Pourya Davoudi ◽

Majid Khansefid

Keyword(s):

Genetic Diversity ◽

Linkage Disequilibrium ◽

Association Studies ◽

Genome Wide Association Studies ◽

Effective Population ◽

High Genetic Diversity ◽

Single Nucleotide ◽

Genome Wide ◽

Snp Panel ◽

Zandi Sheep

Abstract. This research aimed to measure the extent of linkage disequilibrium (LD), effective population size (Ne), and runs of homozygosity (ROHs) in one of the major Iranian sheep breeds (Zandi), using 96 samples genotyped with the Illumina Ovine SNP50 BeadChip. The amount of LD (r²) between single-nucleotide polymorphism (SNP) pairs at short distances (10–20 kb) was 0.21±0.25, but it decreased rapidly to 0.10±0.16 as the distance between SNP pairs increased (40–60 kb). The Ne of the Zandi population was estimated at 6475 in the past (approximately 3500 generations ago) and 122 recently (five generations ago). The ROH-based inbreeding coefficient was 0.023. We found 558 ROH regions, of which 37% were relatively long (> 10 Mb). Compared with other species (e.g., cattle and pigs), LD in Zandi decayed more rapidly with increasing distance between SNP pairs. Given the LD pattern and high genetic diversity of Zandi sheep, genomic selection and genome-wide association studies in this breed need a SNP panel denser than the Illumina Ovine SNP50 BeadChip.
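
LD-based estimates of historical Ne are often derived from Sved's approximation, E[r²] ≈ 1/(1 + 4·Ne·c) + 1/n. This is solved for Ne, and the result is read as referring to about T = 1/(2c) generations ago, with c in Morgans. ROH-based inbreeding is commonly the fraction of the autosomal genome covered by ROH. The abstract does not say which formulas were used, so both functions below are assumptions, and conventions for the sample-size term vary. The function names are hypothetical.

```python
def ne_from_r2(mean_r2, c_morgans, sample_size=None):
    """Sved-style Ne from mean r^2 at recombination distance c (in Morgans).

    Solves E[r^2] = 1 / (1 + 4 * Ne * c), optionally after subtracting 1/n
    for sample size n. Returns (Ne, generations_ago) with generations_ago = 1 / (2c).
    """
    r2 = mean_r2 - (1.0 / sample_size if sample_size else 0.0)
    if r2 <= 0:
        raise ValueError("r^2 after sample-size correction must be positive")
    ne = (1.0 / r2 - 1.0) / (4.0 * c_morgans)
    return ne, 1.0 / (2.0 * c_morgans)


def f_roh(roh_lengths_bp, autosome_length_bp):
    """Genomic inbreeding as the fraction of the autosomes covered by ROH."""
    return sum(roh_lengths_bp) / autosome_length_bp
```

Short distances (small c) therefore speak to the distant past and long distances to the recent past. This is why one data set can yield both an ancient and a recent Ne.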

AdmixSim 2: a forward-time simulator for modeling complex population admixture

BMC Bioinformatics ◽

10.1186/s12859-021-04415-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Rui Zhang ◽

Chang Liu ◽

Kai Yuan ◽

Xumin Ni ◽

Yuwen Pan ◽

...

Keyword(s):

Population Genomics ◽

Association Studies ◽

Simulated Data ◽

Population Admixture ◽

Simulation Tools ◽

Fisher Model ◽

Haplotype Data ◽

Complex Population ◽

Complex Scenario ◽

Local Ancestry Inference

Background: Computer simulations have been widely applied in population genetics and evolutionary studies, and a great deal of effort has gone into developing simulation tools over the past two decades. However, few simulation tools are suitable for studying population admixture. Results: Here we developed a forward-time simulator, AdmixSim 2, an individual-based tool that can flexibly and efficiently simulate population genomics data under complex evolutionary scenarios. Unlike its previous version, AdmixSim 2 is based on the extended Wright-Fisher model. It implements many common evolutionary parameters covering gene flow, natural selection, recombination, and mutation, which allows users to freely design and simulate any complex scenario involving population admixture. AdmixSim 2 can simulate data for dioecious or monoecious populations, and for autosomes or sex chromosomes. To the best of our knowledge, no similar tools are available for simulating complex population admixture. Using empirical or previously simulated genomic data as input, AdmixSim 2 outputs phased haplotype data for further admixture-related analyses such as local ancestry inference, association studies, and other applications. We evaluated the performance of AdmixSim 2 on simulated data and validated its functions by comparing simulated data with empirical data from African American, Mexican, and Uyghur populations. Conclusions: AdmixSim 2 is a flexible simulation tool expected to facilitate the study of complex population admixture in various situations.
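
To illustrate the forward-time Wright-Fisher idea, the sketch below tracks local ancestry (0 or 1) at evenly spaced markers on one chromosome. Each generation, every offspring chromosome is a recombinant of two randomly chosen parents, and admixture pulses replace a fraction of chromosomes with unadmixed source-1 copies. It is a deliberately reduced model: haploid, monoecious, neutral, with no mutation and no sexes. It does not reflect AdmixSim 2's interface, input formats, or features. The function name and the pulse format are hypothetical.

```python
import math
import random


def simulate_admixture(n_chrom, n_markers, length_morgans, generations, pulses, seed=0):
    """Forward-time Wright-Fisher sketch tracking local ancestry along one chromosome.

    pulses: {generation: fraction}; at that generation the given fraction of
    chromosomes is replaced by pure source-1 chromosomes (generation 0 founds
    the population, all other founders being source 0). Requires n_markers >= 2.
    """
    rng = random.Random(seed)
    spacing = length_morgans / (n_markers - 1)
    r = 0.5 * (1 - math.exp(-2 * spacing))  # Haldane map: recombination fraction per interval

    def apply_pulse(pop, fraction):
        k = round(fraction * len(pop))
        for idx in rng.sample(range(len(pop)), k):
            pop[idx] = [1] * n_markers
        return pop

    pop = apply_pulse([[0] * n_markers for _ in range(n_chrom)], pulses.get(0, 0.0))
    for gen in range(1, generations + 1):
        new_pop = []
        for _ in range(n_chrom):
            current, other = rng.choice(pop), rng.choice(pop)  # two random parents
            child = [current[0]]
            for m in range(1, n_markers):
                if rng.random() < r:  # crossover between markers m-1 and m
                    current, other = other, current
                child.append(current[m])
            new_pop.append(child)
        pop = apply_pulse(new_pop, pulses.get(gen, 0.0))
    return pop
```

For example, `simulate_admixture(500, 200, 1.0, 40, {0: 0.3, 30: 0.1})` would model a founding pulse followed by a second, smaller wave. The resulting ancestry tracks could feed an LD-decay or tract-length analysis.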

Comparing Heritability Estimators under Alternative Structures of Linkage Disequilibrium

10.1101/2021.09.08.459523 ◽

2021 ◽

Author(s):

Alan Min ◽

Elizabeth Thompson ◽

Saonli Basu

Keyword(s):

Linkage Disequilibrium ◽

Fixed Effects ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Alternative Structures ◽

Genome Wide ◽

Heritability Estimation ◽

Moments Estimators ◽

The Impact ◽

Number Of Individuals

Abstract: The SNP heritability of a trait is the proportion of its variance explained by the additive effects of genome-wide single nucleotide polymorphisms (SNPs). Existing approaches to estimating SNP heritability fall broadly into two categories: one models the SNP effects as fixed effects, and the other treats them as random effects. These methods make certain assumptions about the dependency among individuals (familial relationship) and among markers (linkage disequilibrium, LD) in order to provide consistent estimates of SNP heritability as the number of individuals increases. Although various approaches have been proposed to account for such dependencies, it remains unclear which of the estimates reported in the literature are more robust to model misspecification. Here we investigate the impact of different LD structures and of familial relatedness on heritability estimation. We show that the performance of heritability estimators depends heavily on the structure of the underlying LD pattern and on the degree of relatedness among sampled individuals. However, contrary to claims in the current literature, we did not find significant differences in performance between the fixed-SNP-effects and random-SNP-effects approaches. Moreover, we established the equivalence between two method-of-moments estimators, one from each of these two lines of approaches.
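
One widely used method-of-moments estimator of SNP heritability is Haseman–Elston regression. For standardized phenotypes, it uses E[y_i·y_j] ≈ h²·A_ij for i ≠ j, where A is the genetic relationship matrix (GRM) built from standardized genotypes. The sketch regresses phenotype cross-products on off-diagonal GRM entries through the origin. Which two estimators the paper proves equivalent is not stated here, so this is an illustrative choice. The function names are hypothetical.

```python
import math


def grm_from_genotypes(genotypes):
    """GRM A = Z Z^T / M from an n x M matrix of 0/1/2 allele counts.

    Each SNP is standardized as (g - 2p) / sqrt(2p(1 - p)); monomorphic SNPs are skipped.
    """
    n, m = len(genotypes), len(genotypes[0])
    cols = []
    for s in range(m):
        g = [genotypes[i][s] for i in range(n)]
        p = sum(g) / (2 * n)
        if p in (0.0, 1.0):
            continue
        sd = math.sqrt(2 * p * (1 - p))
        cols.append([(x - 2 * p) / sd for x in g])
    if not cols:
        raise ValueError("no polymorphic SNPs")
    return [[sum(c[i] * c[j] for c in cols) / len(cols) for j in range(n)] for i in range(n)]


def he_regression_h2(phenotypes, grm):
    """Regress y_i * y_j on A_ij (i < j) through the origin; phenotypes must be standardized."""
    num = den = 0.0
    n = len(phenotypes)
    for i in range(n):
        for j in range(i + 1, n):
            a = grm[i][j]
            num += a * phenotypes[i] * phenotypes[j]
            den += a * a
    if den == 0:
        raise ValueError("all off-diagonal GRM entries are zero")
    return num / den
```

Because it uses only pairwise moments, this estimator makes no distributional assumption about SNP effects. That property is relevant to the fixed- versus random-effects comparison above.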

EpiGEN: an epistasis simulation pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa245 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4957-4959

Author(s):

David B Blumenthal ◽

Lorenzo Viola ◽

Markus List ◽

Jan Baumbach ◽

Paolo Tieri ◽

...

Keyword(s):

Arbitrary Order ◽

Association Studies ◽

Simulated Data ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Supplementary Data ◽

Single Nucleotide ◽

Genome Wide

Summary: Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited: they do not account for linkage disequilibrium (LD), support only limited interaction models of single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes, or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns, and generates both categorical and quantitative phenotypes. Availability and implementation: EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen. Supplementary information: Supplementary data are available at Bioinformatics online.
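
To make "SNP interactions of arbitrary order" concrete, the sketch below generates phenotypes from one multiplicative interaction among any number of SNPs. Gaussian noise is added to give a quantitative phenotype, and an optional threshold converts it into a case/control phenotype. This is not EpiGEN's interaction model or API, and it does not simulate LD. The function name and parameters are hypothetical.

```python
import random


def simulate_epistatic_phenotypes(genotypes, interacting_snps, effect, noise_sd=1.0,
                                  threshold=None, seed=0):
    """Phenotypes driven by a single interaction among an arbitrary number of SNPs.

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    The genetic value is effect * product of allele counts at interacting_snps
    (a multiplicative model); Gaussian noise is added when noise_sd > 0.
    With a threshold, phenotypes are dichotomised into cases (1) and controls (0).
    """
    rng = random.Random(seed)
    phenos = []
    for g in genotypes:
        value = effect
        for s in interacting_snps:
            value *= g[s]
        value += rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        phenos.append(value if threshold is None else int(value > threshold))
    return phenos
```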

Multi-locus genotyping reveals established endemicity of a geographically distinct Plasmodium vivax population in Mauritania, West Africa

10.1101/2020.09.10.291005 ◽

2020 ◽

Author(s):

Hampate Ba ◽

Sarah Auburn ◽

Christopher G. Jacob ◽

Sonia Goncalves ◽

Craig W. Duffy ◽

...

Keyword(s):

Drug Resistance ◽

Population Structure ◽

Linkage Disequilibrium ◽

West Africa ◽

Plasmodium Vivax ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

A Genome ◽

Endemic Transmission

Background: Plasmodium vivax has recently been discovered to be a significant cause of malaria in Mauritania, although it is very rare elsewhere in West Africa. It has not been known whether this is a recently introduced or a locally remnant parasite population, or whether its genetic structure reflects epidemic or endemic transmission. Methodology / Principal Findings: To investigate the population genetic structure of P. vivax in Mauritania and compare it with populations previously analysed elsewhere, multi-locus genotyping was undertaken on 100 clinical isolates. The genotyping used a genome-wide panel of 38 single nucleotide polymorphisms (SNPs), plus seven SNPs in drug resistance genes. The Mauritanian P. vivax population is shown to be genetically diverse and divergent from populations elsewhere, as indicated consistently by genetic distance matrix analysis, principal components analyses, and fixation indices. Only one isolate had a genotype clearly indicating recent importation, from a Southeast Asian source. There was no linkage disequilibrium in the local parasite population, and only a small number of infections appeared to be closely genetically related. This indicates ongoing genetic recombination consistent with endemic transmission. P. vivax diversity in a remote mining town was similar to that in the capital, Nouakchott, with no indication of local substructure or of epidemic population structure. Drug resistance alleles were virtually absent in Mauritania, in contrast with P. vivax in other areas of the world. Conclusions / Significance: The molecular epidemiology indicates long-standing endemic transmission that will be very challenging to eliminate. The virtual absence of drug resistance alleles suggests that most infections have been untreated, and that this endemic infection has been more neglected than P. falciparum locally or P. vivax elsewhere.

Author Summary: Plasmodium vivax is a widespread cause of malaria in Mauritania, in contrast to its rarity elsewhere in West Africa. To investigate whether the parasite may have been recently introduced or is epidemic, multi-locus genotyping was performed on 100 Mauritanian P. vivax malaria cases. Analysis of a genome-wide panel of single nucleotide polymorphisms showed the P. vivax population to be genetically diverse and divergent from populations elsewhere, indicating long-standing endemic transmission. Almost all infections appear to have been locally acquired, except for one that was presumably imported, with a genotype similar to infections seen in Southeast Asia. The Mauritanian P. vivax population shows no linkage disequilibrium, and very few infections have closely related genotypes, indicating ongoing recombination. The parasite population showed no indication of local substructure or epidemic population structure. Drug resistance alleles were virtually absent, suggesting that most infections have historically gone untreated. The molecular epidemiology indicates long-standing endemic transmission of this neglected parasite, which requires special attention for control.
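
In multilocus parasite genotyping, a finding of "no linkage disequilibrium" is often based on the standardized index of association, I_A^S = (V_D/V_E − 1)/(L − 1). Here V_D is the observed variance of the number of loci at which two isolates differ, V_E is its expectation under linkage equilibrium, and L is the number of loci. The abstract does not name the statistic used, so this is an assumed example. Variance conventions differ between implementations, and this sketch divides by the number of pairs. The function name is hypothetical.

```python
from itertools import combinations


def standardized_index_of_association(isolates):
    """Standardized index of association I_A^S for haploid multilocus genotypes.

    isolates: list of equal-length allele lists (one entry per locus, L >= 2).
    Values near 0 are consistent with free recombination; clearly positive
    values indicate linkage disequilibrium.
    """
    n, n_loci = len(isolates), len(isolates[0])
    # Number of differing loci for every pair of isolates
    dists = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(isolates, 2)]
    mean_d = sum(dists) / len(dists)
    v_d = sum((d - mean_d) ** 2 for d in dists) / len(dists)
    v_e = 0.0
    for locus in range(n_loci):
        alleles = [iso[locus] for iso in isolates]
        freqs = [alleles.count(a) / n for a in set(alleles)]
        h = n / (n - 1) * (1 - sum(p * p for p in freqs))  # unbiased genetic diversity
        v_e += h * (1 - h)
    if v_e == 0:
        raise ValueError("all loci are monomorphic")
    return (v_d / v_e - 1) / (n_loci - 1)
```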