A likelihood approach for uncovering selective sweep signatures from haplotype data

2019 ◽  
Author(s):  
Alexandre M. Harris ◽  
Michael DeGiorgio

Abstract Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be applied to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverages the power of a model-based approach to make their inference. Here, we propose a likelihood ratio test statistic T to probe whole genome polymorphism datasets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and additionally make an inference on the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.

2020 ◽  
Vol 37 (10) ◽  
pp. 3023-3046
Author(s):  
Alexandre M Harris ◽  
Michael DeGiorgio

Abstract Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be applied to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverages the power of a model-based approach to make their inference. Here, we propose a likelihood ratio test statistic T to probe whole-genome polymorphism data sets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and additionally make an inference on the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.


Genetics ◽  
2002 ◽  
Vol 160 (2) ◽  
pp. 753-763 ◽  
Author(s):  
Christian Schlötterer

Abstract With the availability of completely sequenced genomes, multilocus scans of natural variability have become a feasible approach for the identification of genomic regions subjected to natural and artificial selection. Here, I introduce a new multilocus test statistic, ln RV, which is based on the ratio of observed variances in repeat number at a set of microsatellite loci in two groups of populations. The distribution of ln RV values captures demographic history of the populations as well as variation in microsatellite mutation among loci. Given that microsatellite loci associated with a recent selective sweep differ from the remainder of the genome, they are expected to fall outside of the distribution of neutral ln RV values. The ln RV test statistic is applied to a data set of 94 loci typed in eight non-African and two African human populations.
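The ln RV idea described above can be illustrated with a minimal sketch. It assumes that each locus supplies two lists of repeat counts (one per population group) and that outlier loci are flagged by standardizing ln RV across loci. The function names, the use of the sample variance, and the z-score step are illustrative assumptions, not details taken from the abstract.

```python
import math
from statistics import mean, stdev, variance


def ln_rv(repeats_a, repeats_b):
    """Natural log of the ratio of repeat-number variances at one locus.

    repeats_a, repeats_b: repeat counts observed in the two population groups.
    Sample variance is used here as an assumption.
    """
    var_a = variance(repeats_a)
    var_b = variance(repeats_b)
    if var_a <= 0 or var_b <= 0:
        raise ValueError("both groups need non-zero variance in repeat number")
    return math.log(var_a / var_b)


def standardized_ln_rv(loci):
    """Z-score each locus's ln RV against the distribution over all loci.

    loci: dict mapping a locus name to a (repeats_a, repeats_b) pair.
    Standardizing is a hypothetical way to spot loci that fall outside
    the bulk of the ln RV distribution.
    """
    values = {name: ln_rv(a, b) for name, (a, b) in loci.items()}
    mu = mean(list(values.values()))
    sd = stdev(list(values.values()))
    return {name: (v - mu) / sd for name, v in values.items()}
```

A locus whose standardized value lies far in either tail would be a candidate for closer inspection; where to place the cutoff is a choice left to the analyst.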


2021 ◽  
Author(s):  
Pavitra Muralidhar ◽  
Carl Veller

Abstract Genetic models of adaptation to a new environment have typically assumed that the alleles involved maintain a constant fitness dominance across the old and new environments. However, theories of dominance suggest that this should often not be the case. Instead, the alleles involved should frequently shift from recessive deleterious in the old environment to dominant beneficial in the new environment. Here, we study the consequences of these expected dominance shifts for the genetics of adaptation to a new environment. We find that dominance shifts increase the likelihood that adaptation occurs from the standing variation, and that multiple alleles from the standing variation are involved (a soft selective sweep). Furthermore, we find that expected dominance shifts increase the haplotypic diversity of selective sweeps, rendering soft sweeps more detectable in small genomic samples. In cases where an environmental change threatens the viability of the population, we show that expected dominance shifts of newly beneficial alleles increase the likelihood of evolutionary rescue and the number of alleles involved. Finally, we apply our results to a well-studied case of adaptation to a new environment: the evolution of pesticide resistance at the Ace locus in Drosophila melanogaster. We show that, under reasonable demographic assumptions, the expected dominance shift of resistant alleles causes soft sweeps to be the most frequent outcome in this case, with the primary source of these soft sweeps being the standing variation at the onset of pesticide use, rather than recurrent mutation thereafter.


2013 ◽  
Vol 45 (15) ◽  
pp. 667-683 ◽  
Author(s):  
Jessica H. Geahlen ◽  
Carlo Lapid ◽  
Kaisa Thorell ◽  
Igor Nikolskiy ◽  
Won Jae Huh ◽  
...  

In a screen for genes expressed specifically in gastric mucous neck cells, we identified GKN3, the recently discovered third member of the gastrokine family. We present confirmatory mouse data and novel porcine data showing that mouse GKN3 expression is confined to mucous cells of the corpus neck and antrum base and is prominently expressed in metaplastic lesions. GKN3 was proposed originally to be expressed in some human populations and a pseudogene in others. To investigate that hypothesis, we studied human GKN3 evolution in the context of its paralogous genomic neighbors, GKN1 and GKN2. Haplotype analysis revealed that GKN3 mimics GKN2 in patterns of exonic SNP allocation, whereas GKN1 appeared to be more stringently selected. GKN3 showed signatures of both directional selection and population-based selective sweeps in humans. One such selective sweep includes SNP rs10187256, originally identified as an ancestral tryptophan to premature STOP codon mutation. The derived (nonancestral) allele went to fixation in Asia. We show that another SNP, rs75578132, identified 5 bp downstream of rs10187256, exhibits a second selective sweep in almost all Europeans, some Latinos, and some Africans, possibly resulting from a reintroduction of European genes during African colonization. Finally, we identify a mutation that would destroy the splice donor site in the putative exon3-intron3 boundary, which occurs in all human genomes examined to date. Our results highlight a stomach-specific human genetic locus, which has undergone various selective sweeps across European, Asian, and African populations and thus reflects geographic and ethnic patterns in genome evolution.


2017 ◽  
Author(s):  
Ali Akbari ◽  
Joseph J. Vitti ◽  
Arya Iranmehr ◽  
Mehrdad Bakhtiari ◽  
Pardis C. Sabeti ◽  
...  

Abstract Methods to identify signatures of selective sweeps in population genomics data have been actively developed, but mostly do not identify the specific mutation favored by the selective sweep. We present a method, iSAFE, that uses a statistic derived solely from population genetics signals to pinpoint the favored mutation even when the signature of selection extends to 5 Mbp. iSAFE was tested extensively on simulated data and in human populations from the 1000 Genomes Project, at 22 loci with previously characterized selective sweeps. For 14 of the 22 loci, iSAFE ranked the previously characterized candidate mutation among the 13 highest scoring (out of ∼21,000 variants). Three loci did not show a strong signal. For the remaining loci, iSAFE identified previously unreported mutations as being favored. In these regions, all of which involve pigmentation-related genes, iSAFE identified identical selected mutations in multiple non-African populations, suggesting an out-of-Africa onset of selection. The iSAFE software can be downloaded from https://github.com/alek0991/iSAFE.


2015 ◽  
Author(s):  
Daniel R. Schrider ◽  
Andrew D. Kern

Abstract Detecting the targets of adaptive natural selection from whole genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been a renewed interest in determining the importance of selection from standing variation in adaptation of natural populations, yet very few methods for inferring this model of adaptation at the genome scale have been introduced. Here we introduce a new method, S/HIC, which uses supervised machine learning to precisely infer the location of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy for detecting sweeps under demographic histories that are relevant to human populations, and distinguishing sweeps from linked as well as neutrally evolving regions. Moreover, we show that S/HIC is uniquely robust among its competitors to model misspecification. Thus, even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC still retains impressive discriminatory power. Finally, we apply S/HIC to the case of resequencing data from human chromosome 18 in a European population sample and demonstrate that we can reliably recover selective sweeps that have been identified earlier using less specific and sensitive methods.


2015 ◽  
Author(s):  
Roy Ronen ◽  
Glenn Tesler ◽  
Ali Akbari ◽  
Shay Zakov ◽  
Noah A Rosenberg ◽  
...  

Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory — for example, in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.
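The abstract above does not spell out how a HAF score is computed. The sketch below follows our understanding of the basic (1-HAF) definition from the published method: a haplotype's score is the sum, over the derived alleles it carries, of the number of haplotypes in the sample carrying each of those alleles. The 0/1 encoding (1 = derived allele) and the function name `haf_scores` are assumptions made for illustration.

```python
def haf_scores(haplotypes):
    """Compute a 1-HAF-style score for every haplotype in a sample.

    haplotypes: list of equal-length lists of 0/1, where 1 marks the
    derived allele at a site (assumed encoding).
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")

    # Number of haplotypes carrying the derived allele at each site.
    derived_counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]

    # A haplotype's score adds up the counts of the derived alleles it carries.
    return [
        sum(count for allele, count in zip(h, derived_counts) if allele == 1)
        for h in haplotypes
    ]


sample = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
]
print(haf_scores(sample))  # [5, 3, 1, 5]
```

In this toy sample, the two haplotypes sharing the common derived alleles score highest, which matches the intuition that carriers of a sweeping haplotype share many high-frequency derived alleles.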

