Flexible Mixture Model Approaches That Accommodate Footprint Size Variability for Robust Detection of Balancing Selection

2020 ◽  
Vol 37 (11) ◽  
pp. 3267-3291 ◽  
Author(s):  
Xiaoheng Cheng ◽  
Michael DeGiorgio

Abstract Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169–SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.

2019 ◽  
Author(s):  
Xiaoheng Cheng ◽  
Michael DeGiorgio

Abstract Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. This issue creates a tradeoff between noise and power in empirical applications. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic dataset and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169-SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic dataset. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multi-allelic balancing selection, and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.


Author(s):  
Vivak Soni ◽  
Michiel Vos ◽  
Adam Eyre-Walker

Abstract The role that balancing selection plays in the maintenance of genetic diversity remains unresolved. Here we introduce a new test, based on the McDonald-Kreitman test, in which the number of polymorphisms that are shared between populations is contrasted to those that are private at selected and neutral sites. We show that this simple test is robust to a variety of demographic changes, and that it can also give a direct estimate of the number of shared polymorphisms that are directly maintained by balancing selection. We apply our method to population genomic data from humans and conclude that more than a thousand non-synonymous polymorphisms are subject to balancing selection.
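
The contrast described in this abstract can be illustrated with a 2x2 table of polymorphism counts (shared vs. private, selected vs. neutral sites). The sketch below is only an illustration by analogy with the classic McDonald-Kreitman layout; it is not the authors' published estimator. The class and function names, and the counts in the usage lines, are hypothetical. It assumes that, without balancing selection, the shared-to-private ratio at selected sites would match the ratio at neutral sites.

```python
from dataclasses import dataclass


@dataclass
class SharedPrivateCounts:
    """Hypothetical 2x2 table of polymorphism counts (illustration only)."""
    shared_selected: int   # shared between populations, selected sites
    private_selected: int  # private to one population, selected sites
    shared_neutral: int    # shared between populations, neutral sites
    private_neutral: int   # private to one population, neutral sites


def odds_ratio(c: SharedPrivateCounts) -> float:
    """Shared/private ratio at selected sites relative to neutral sites.

    Values above 1 indicate an excess of shared polymorphism at selected sites.
    """
    return (c.shared_selected * c.private_neutral) / (
        c.private_selected * c.shared_neutral
    )


def excess_shared(c: SharedPrivateCounts) -> float:
    """Observed shared selected polymorphisms minus the count expected
    if selected sites had the neutral shared/private ratio (assumption)."""
    expected = c.private_selected * c.shared_neutral / c.private_neutral
    return c.shared_selected - expected


# Made-up counts, chosen only to show the arithmetic.
counts = SharedPrivateCounts(
    shared_selected=120, private_selected=1000,
    shared_neutral=400, private_neutral=4000,
)
ratio = odds_ratio(counts)      # (120 * 4000) / (1000 * 400)
excess = excess_shared(counts)  # 120 - 1000 * 400 / 4000
```

With these made-up counts the ratio exceeds 1 and the excess is positive, which under the stated assumption would point to more shared polymorphism at selected sites than neutrality predicts.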


2021 ◽  
Author(s):  
Cooper Alastair Grace ◽  
Sarah Forrester ◽  
Vladimir Costa Silva ◽  
Aleksander Aare ◽  
Hannah Kilford ◽  
...  

Abstract The Leishmania donovani species complex comprises the causative agents of visceral leishmaniasis, which causes 20,000 to 40,000 fatalities a year. Here, we conduct a screen for balancing selection in this species complex. We sequence 93 isolates of L. infantum from Brazil and use 387 publicly available L. donovani and L. infantum genomes to describe the global diversity of this species complex. We identify five genetically distinct populations that are sufficiently represented by genomic data to search for signatures of selection. We show that multiple metrics identify genes with robust signatures of balancing selection. We produce a curated set of 19 genes with robust signatures, including zeta toxin, nodulin-like and flagellum attachment proteins. Candidate genes were generally not shared between populations, consistent with divergent rather than long-term balancing selection in these species. This study highlights the extent of genetic divergence between L. donovani complex parasites and provides candidate genes for further study.


2020 ◽  
Author(s):  
Xi Wang ◽  
Pär K Ingvarsson

Abstract Detecting natural selection is one of the major goals of evolutionary genomics. Here, we sequence whole genomes of 34 Picea abies individuals and quantify the amount of selection across the genome. Using an estimate of the distribution of fitness effects, we show that negative selection is very limited in coding regions, while positive selection is rare in coding regions but very strong in non-coding regions, suggesting the great importance of regulatory changes in the evolution of Norway spruce. Additionally, we found a positive correlation between adaptive rate and recombination rate and a negative correlation between adaptive rate and gene density, suggesting a widespread influence of Hill-Robertson interference on the efficiency of protein adaptation in P. abies. Finally, population statistics in genomic regions under either positive or balancing selection differed from those in neutral regions, indicating an impact of selection on the genomic architecture of Norway spruce. Further gene ontology enrichment analysis for genes located in regions identified as undergoing either positive or long-term balancing selection also highlighted specific molecular functions and biological processes that appear to be targets of selection in Norway spruce.


2013 ◽  
Author(s):  
Yaniv Brandvain ◽  
Amanda M Kenney ◽  
Lex Flagel ◽  
Graham Coop ◽  
Andrea L Sweigart

Mimulus guttatus and M. nasutus are an evolutionary and ecological model sister species pair differentiated by ecology, mating system, and partial reproductive isolation. Despite extensive research on this system, the history of divergence and differentiation in this sister pair is unclear. We present and analyze a novel population genomic data set which shows that M. nasutus "budded" off of a central Californian M. guttatus population within the last 200 to 500 thousand years. In this time, the M. nasutus genome has accrued numerous genomic signatures of the transition to predominant selfing. Despite clear biological differentiation, we document ongoing, bidirectional introgression. We observe a negative relationship between the recombination rate and divergence between M. nasutus and sympatric M. guttatus samples, suggesting that selection acts against M. nasutus ancestry in M. guttatus.


2021 ◽  
Author(s):  
Arne Sahm ◽  
Philipp Koch ◽  
Steve Horvath ◽  
Steve Hoffmann

While the investigation of the epigenome becomes increasingly important, little is known about the long-term evolution of epigenetic marks, and systematic investigation strategies are still lacking. Here, we systematically demonstrate the transfer of classic phylogenetic methods, such as maximum likelihood based on substitution models, parsimony, and distance-based methods, to interval-scaled epigenetic data (available at GitHub). Using a great ape blood data set, we demonstrate that DNA methylation is evolutionarily conserved at the level of individual CpGs in promoters, enhancers and genic regions. Our analysis also reveals that this epigenomic conservation is significantly correlated with transcription factor binding density. Binding sites for transcription factors involved in neuron differentiation and components of AP-1 evolve at a significantly higher rate at the methylation level than at the nucleotide level. Moreover, our models suggest an accelerated epigenomic evolution at binding sites of BRCA1, CBX2, and factors of the polycomb repressor 2 complex in humans. For most genomic regions, the methylation-based reconstruction of phylogenetic trees is on par with sequence-based reconstruction. Most strikingly, phylogenetic reconstruction using methylation rates in enhancer regions was ineffective independently of the chosen model. We identify a set of phylogenetically uninformative CpG sites enriched in enhancers controlling immune-related genes.


2013 ◽  
Vol 31 (15_suppl) ◽  
pp. 506-506 ◽  
Author(s):  
Michael Gnant ◽  
Mitchell Dowsett ◽  
Martin Filipits ◽  
Elena Lopez-Knowles ◽  
Richard Greil ◽  
...  

506 Background: Most postmenopausal women with node-positive HR+ EBC receive adjuvant chemotherapy. We hypothesized that a molecular-based characterization of residual risk after endocrine therapy using the ROR score and IS may identify node-positive patient subgroups with limited long-term recurrence risk after endocrine therapy better than clinical-pathological risk assessment by clinical treatment score (CTS) alone. Methods: Long-term follow-up and tissue samples were obtained from 2,485 postmenopausal HR+ patients from the ABCSG-8 (N=1,478) and transATAC (N=1,007) trials. The PAM50 test was conducted on RNA extracted from paraffin blocks using the NanoString nCounter Analysis system. The ability of ROR, IS and ROR-defined risk groups (ROR-RG) to add prognostic information to CTS was assessed by the likelihood ratio test in a prospectively defined analysis plan. Results: Patients in the combined data set were grouped by the number of positive nodes into 1 (N1), 2 (N2), or 2 or 3 (N2-3). Baseline hazards for these subgroups were similar in the two trials. ROR score, IS and ROR-RG added statistically significant prognostic information (10-year distant recurrence risk) beyond CTS in all groups. In patients with one positive node, the absolute 10-year risk of distant recurrence was 6.6% [95% CI: 3.3%-12.8%] in the PAM50 low-risk group (40% of patients) and 8.4% [5.3%-13.3%] in the Luminal A subgroup (69% of patients). Conclusions: The results of this combined analysis demonstrate that a significant proportion of N1 EBC patients have very limited long-term recurrence risk and suggest the same for some N2 patients. The PAM50 ROR score, IS and ROR-RG reliably provide additional prognostic information beyond CTS and may be useful in deciding which women with node-positive HR+ EBC can be spared adjuvant chemotherapy. [Table: see text]


2018 ◽  
Author(s):  
Xiaoheng Cheng ◽  
Michael DeGiorgio

Abstract Trans-species polymorphism has been widely used as a key sign of long-term balancing selection across multiple species. However, such sites are often rare in the genome, and could result from mutational processes or technical artifacts. Few methods are yet available to specifically detect footprints of trans-species balancing selection without using trans-species polymorphic sites. In this study, we develop summary- and model-based approaches that are each specifically tailored to uncover regions of long-term balancing selection shared by a set of species by using genomic patterns of intra-specific polymorphism and inter-specific fixed differences. We demonstrate that our trans-species statistics have substantially higher power than single-species approaches to detect footprints of trans-species balancing selection, and are robust to those that do not affect all tested species. We further apply our model-based methods to human and chimpanzee whole genome sequencing data. In addition to the previously established MHC and malaria resistance-associated FREM3/GYPE regions, we also find outstanding genomic regions involved in barrier integrity and innate immunity, such as the GRIK1/CLDN17 intergenic region, and the SLC35F1 and ABCA13 genes. Our findings not only echo the significance of pathogen defense, but also reveal novel candidates in maintaining balanced polymorphisms across human and chimpanzee lineages. Finally, we show that these trans-species statistics can be applied to and work well for an arbitrary number of species, and integrate them into open-source software packages for ease of use by the scientific community.

