Multispecies coalescent analysis unravels the non-monophyly and controversial relationships of Hexapoda

dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

10.7287/peerj.preprints.314 ◽

2014 ◽

Author(s):

Jonathan Puritz ◽

Christopher M. Hollenbeck ◽

John R. Gold

Keyword(s):

Population Genomics ◽

De Novo ◽

Variant Calling ◽

Population Level ◽

Model Organisms ◽

Effective Population ◽

Reduction Techniques ◽

Indel Polymorphisms ◽

Indel Calling ◽

Population Sizes

Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for organisms with large effective population sizes and high levels of genetic polymorphism but for which no genomic resources exist. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is most likely due to the fact that dDocent quality trims instead of filtering and incorporates both forward and reverse reads in assembly, mapping, and SNP calling, thus enabling use of reads with Indel polymorphisms. The pipeline and a comprehensive user guide can be found at (http://dDocent.wordpress.com).

Deep-Time Demographic Inference Suggests Ecological Release as Driver of Neoavian Adaptive Radiation

Diversity ◽

10.3390/d12040164 ◽

2020 ◽

Vol 12 (4) ◽

pp. 164 ◽

Cited By ~ 2

Author(s):

Peter Houde ◽

Edward L. Braun ◽

Lawrence Zhou

Keyword(s):

Evolutionary Biology ◽

Gene Tree ◽

Difficult Problem ◽

Gene Trees ◽

Effective Population ◽

Deep Time ◽

Ecological Release ◽

Population Sizes ◽

Demographic Inference ◽

Gene Tree Discordance

Assessing the applicability of theory to major adaptive radiations in deep time represents an extremely difficult problem in evolutionary biology. Neoaves, which includes 95% of living birds, is believed to have undergone a period of rapid diversification roughly coincident with the Cretaceous–Paleogene (K-Pg) boundary. We investigate whether basal neoavian lineages experienced an ecological release in response to ecological opportunity, as evidenced by density compensation. We estimated effective population sizes (Ne) of basal neoavian lineages by combining coalescent branch lengths (CBLs) and the numbers of generations between successive divergences. We used a modified version of Accurate Species TRee Algorithm (ASTRAL) to estimate CBLs directly from insertion–deletion (indel) data, as well as from gene trees using DNA sequence and/or indel data. We found that some divergences near the K-Pg boundary involved unexpectedly high gene tree discordance relative to the estimated number of generations between speciation events. The simplest explanation for this result is an increase in Ne, despite the caveats discussed herein. It appears that at least some early neoavian lineages, similar to the ancestor of the clade comprising doves, mesites, and sandgrouse, experienced ecological release near the time of the K-Pg mass extinction.
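The way a coalescent branch length and a generation count combine into an Ne estimate can be sketched as follows. This is a minimal illustration, assuming the standard diploid scaling in which one coalescent unit equals 2Ne generations. The function name and the example numbers are hypothetical and are not taken from the study.

```python
def ne_from_branch(generations: float, cbl: float) -> float:
    """Effective population size implied by one branch.

    Assumes diploid coalescent scaling: CBL = generations / (2 * Ne),
    so Ne = generations / (2 * CBL).
    """
    if generations <= 0 or cbl <= 0:
        raise ValueError("generations and cbl must be positive")
    return generations / (2.0 * cbl)


# Hypothetical branch: 200,000 generations long, 0.5 coalescent units.
print(ne_from_branch(200_000, 0.5))  # 200000.0
```

For a fixed number of generations, a shorter coalescent branch length (that is, more gene tree discordance) implies a larger Ne under this scaling.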

Continent-wide effects of urbanization on bird and mammal genetic diversity

10.1101/733170 ◽

2019 ◽

Author(s):

C. Schmidt ◽

M. Domaratzki ◽

R.P. Kinnunen ◽

J. Bowman ◽

C.J. Garroway

Keyword(s):

Genetic Diversity ◽

Environmental Changes ◽

Bird Species ◽

Population Level ◽

Natural Environments ◽

Population Declines ◽

Effective Population ◽

Human Footprint ◽

Rate Of Increase ◽

Population Sizes

Urbanization and associated environmental changes are causing global declines in vertebrate populations. In general, population declines of the magnitudes now detected should lead to reduced effective population sizes for animals living in proximity to humans and disturbed lands. This is cause for concern because effective population sizes set the rate of genetic diversity loss due to genetic drift, the rate of increase in inbreeding, and the efficiency with which selection can act on beneficial alleles. We predicted that the effects of urbanization should decrease effective population size and genetic diversity, and increase population-level genetic differentiation. To test for such patterns, we repurposed and reanalyzed publicly archived genetic data sets for North American birds and mammals. After filtering, we had usable raw genotype data from 85 studies and 41,023 individuals, sampled from 1,008 locations spanning 41 mammal and 25 bird species. We used census-based urban-rural designations, human population density, and the Human Footprint Index as measures of urbanization and habitat disturbance. As predicted, mammals sampled in more disturbed environments had lower effective population sizes and genetic diversity, and were more genetically differentiated from those in more natural environments. There were no consistent relationships detectable for birds. This suggests that, in general, mammal populations living near humans may have less capacity to respond adaptively to further environmental changes, and be more likely to suffer from effects of inbreeding.
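The dependence of diversity loss on effective population size can be illustrated with the textbook drift expectation for an idealized diploid population, H_t = H_0 (1 - 1/(2Ne))^t. The sketch below is an assumption-laden illustration only: it ignores mutation, migration, and selection, and the function name and input values are hypothetical rather than drawn from the study.

```python
def expected_heterozygosity(h0: float, ne: float, generations: int) -> float:
    """Expected heterozygosity after drift alone in an idealized diploid
    population: H_t = H_0 * (1 - 1 / (2 * Ne)) ** t."""
    if ne <= 0:
        raise ValueError("ne must be positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


# Compare a small and a larger hypothetical population over 100 generations,
# both starting from heterozygosity 0.5.
for ne in (50, 500):
    print(ne, round(expected_heterozygosity(0.5, ne, 100), 3))
```

Under the same idealized model, the per-generation increase in inbreeding is 1/(2Ne), so smaller Ne also means faster accumulation of inbreeding.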

The Multispecies Coalescent in Space and Time

10.1101/2020.08.02.233395 ◽

2020 ◽

Author(s):

Patrick F. McKenzie ◽

Deren A. R. Eaton

Keyword(s):

Gene Tree ◽

Species Tree ◽

Gene Trees ◽

Effective Population ◽

Sliding Windows ◽

Multispecies Coalescent ◽

A Genome ◽

Tree Inference ◽

Rate Of Decay ◽

Coalescent Time

A key distinction between species tree inference under the multi-species coalescent model (MSC), and the inference of gene trees in sliding windows along a genome, is in the effect of genetic linkage. Whereas the MSC explicitly assumes genealogies to be unlinked, i.e., statistically independent, genealogies located close together on genomes are spatially auto-correlated. Here we use tree sequence simulations with recombination to explore the effects of species tree parameters on spatial patterns of linkage among genealogies. We decompose coalescent time units to demonstrate differential effects of generation time and effective population size on spatial coalescent patterns, and we define a new metric, “phylogenetic linkage,” for measuring the rate of decay of phylogenetic similarity by comparison to distances among unlinked genealogies. Finally, we provide a simple example where accounting for phylogenetic linkage in sliding window analyses improves local gene tree inference.
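The decomposition of coalescent time units can be sketched as below, assuming the usual diploid definition: elapsed years divided by generation time gives generations, and generations divided by 2Ne gives coalescent units. The function name and parameter values are hypothetical. The example shows two parameter sets that yield the same branch length in coalescent units while differing in the number of generations elapsed, which is one plausible reason the two parameters need not act identically once recombination is modeled.

```python
def coalescent_units(years: float, generation_time: float, ne: float) -> float:
    """Branch length in coalescent units (diploid scaling).

    generations = years / generation_time
    coalescent units = generations / (2 * Ne)
    """
    if years < 0 or generation_time <= 0 or ne <= 0:
        raise ValueError("invalid parameter value")
    return years / (generation_time * 2.0 * ne)


# Same span of 1,000,000 years under two hypothetical parameter sets.
fast = coalescent_units(1_000_000, generation_time=1.0, ne=500_000)
slow = coalescent_units(1_000_000, generation_time=2.0, ne=250_000)
print(fast, slow)  # 1.0 1.0
```

Both branches are one coalescent unit long, yet the first spans 1,000,000 generations and the second only 500,000.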

Estimation of Cross-Species Introgression Rates using Genomic Data Despite Model Unidentifiability

10.1101/2021.08.14.456331 ◽

2021 ◽

Author(s):

Ziheng Yang ◽

Thomas Flouris

Keyword(s):

Genomic Sequence ◽

Sequence Data ◽

Gene Tree ◽

Synthetic Data ◽

Genomic Data ◽

Sister Species ◽

Label Switching ◽

Cross Model ◽

Multispecies Coalescent ◽

Full Likelihood

The multispecies coalescent with introgression (MSci) model accommodates both the coalescent process and cross-species introgression/hybridization events, two major processes that create genealogical fluctuations across the genome and gene-tree-species-tree discordance. Full likelihood implementations of the MSci model take such fluctuations as a major source of information about the history of species divergence and gene flow, and provide a powerful tool for estimating the direction, timing and strength of cross-species introgression using multilocus sequence data. However, introgression models, in particular those that accommodate bidirectional introgression (BDI), are known to cause unidentifiability issues of the label-switching type, whereby different models or parameters make the same predictions about the genomic data and thus cannot be distinguished by the data. Nevertheless, there has been no systematic study of unidentifiability when full likelihood methods are applied. Here we characterize the unidentifiability of arbitrary BDI models and derive simple rules for its identification. In general, an MSci model with k BDI events has 2^k unidentifiable towers in the posterior, with each BDI event between sister species creating within-model unidentifiability and each BDI between non-sister species creating cross-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo (MCMC) samples to remove label switching and implement them in the BPP program. We analyze genomic sequence data from Heliconius butterflies as well as synthetic data to illustrate the utility of the BDI models and the new algorithms.

ipcoal: An interactive Python package for simulating and analyzing genealogies and sequences on a species tree or network

10.1101/2020.01.15.908236 ◽

2020 ◽

Cited By ~ 1

Author(s):

Patrick F. McKenzie ◽

Deren A. R. Eaton

Keyword(s):

Sequence Data ◽

Gene Tree ◽

Species Tree ◽

Effective Population ◽

Demographic Models ◽

Link Type ◽

Tree Inference ◽

Population Sizes ◽

Visualization Tools ◽

Python Package

Summary: ipcoal is a free and open source Python package for simulating and analyzing genealogies and sequences. It automates the task of describing complex demographic models (e.g., with divergence times, effective population sizes, migration events) to the msprime coalescent simulator by parsing a user-supplied species tree or network. Genealogies, sequences, and metadata are returned in tabular format allowing for easy downstream analyses. ipcoal includes phylogenetic inference tools to automate gene tree inference from simulated sequence data, and visualization tools for analyzing results and verifying model accuracy. The ipcoal package is a powerful tool for posterior predictive data analysis, for methods validation, and for teaching coalescent methods in an interactive and visual environment. Availability and implementation: Source code is available from the GitHub repository (https://github.com/pmckenz1/ipcoal/) and is distributed for packaged installation with conda. Complete documentation and interactive notebooks prepared for teaching purposes are available at https://ipcoal.readthedocs.io/.

Distribution of gene tree histories under the coalescent model with gene flow

10.1101/023937 ◽

2015 ◽

Author(s):

Yuan Tian ◽

Laura Kubatko

Keyword(s):

Gene Flow ◽

Maximum Likelihood ◽

Gene Tree ◽

Species Tree ◽

Tree Topology ◽

Effective Population ◽

Sequence Alignments ◽

Data Set ◽

Coalescent Model ◽

Population Sizes

We propose a coalescent model for three species that allows gene flow between both pairs of sister populations. The model is designed to analyze multilocus genomic sequence alignments, with one sequence sampled from each of the three species. The model is formulated using a Markov chain representation, which allows use of matrix exponentiation to compute analytical expressions for the probability density of gene tree genealogies. The gene tree history distribution as well as the gene tree topology distribution under this coalescent model with gene flow are then calculated via numerical integration. We analyze the model to compare the distributions of gene tree topologies and gene tree histories for species trees with differing effective population sizes and gene flow rates. Our results suggest conditions under which the species tree and associated parameters are not identifiable from the gene tree topology distribution when gene flow is present, but indicate that the gene tree history distribution may identify the species tree and associated parameters. Thus, the gene tree history distribution can be used to infer parameters such as the ancestral effective population sizes and the rates of gene flow in a maximum likelihood (ML) framework. We conduct computer simulations to evaluate the performance of our method in estimating these parameters, and we apply our method to an Afrotropical mosquito data set (Fontaine et al., 2015) to demonstrate the usefulness of our method for the analysis of empirical data. Key words: coalescent, gene flow, migration, hybridization, gene tree, topology, history, maximum likelihood, speciation.
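As a point of reference for the topology distributions discussed above, the sketch below computes the well-known gene tree topology probabilities for three species without gene flow, where the concordant topology has probability 1 - (2/3)e^(-t) and t is the internal branch length in coalescent units. This is only the no-migration baseline, not the authors' Markov chain model with gene flow, and the function name is hypothetical.

```python
import math


def triplet_topology_probs(t: float) -> dict:
    """Gene tree topology probabilities for species tree ((A,B),C) with one
    sequence per species, internal branch length t in coalescent units,
    and no gene flow."""
    if t < 0:
        raise ValueError("t must be non-negative")
    # Probability that the A and B lineages fail to coalesce on the
    # internal branch.
    p_no_coal = math.exp(-t)
    return {
        "((A,B),C)": 1.0 - (2.0 / 3.0) * p_no_coal,  # concordant
        "((A,C),B)": p_no_coal / 3.0,                # discordant
        "((B,C),A)": p_no_coal / 3.0,                # discordant
    }


for t in (0.1, 1.0, 3.0):
    print(t, triplet_topology_probs(t))
```

In this baseline the two discordant topologies are always equally probable, so departures from that symmetry are one signal that a model with gene flow may be needed.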

Genic evidence that gnetophytes are sister to all other seed plants

10.1101/629915 ◽

2019 ◽

Author(s):

Yinzhi Zhang ◽

Zhiming Liu

Keyword(s):

Gene Evolution ◽

Gene Tree ◽

Phylogenetic Inference ◽

Nuclear Genes ◽

Recent Common Ancestor ◽

Seed Plants ◽

Most Recent Common Ancestor ◽

Novel Approach ◽

Evolution Pattern ◽

Phylogenomic Analyses

Gnetophytes, comprising three relict genera, Gnetum, Welwitschia and Ephedra, are a morphologically diverse and enigmatic assemblage among seed plants. Despite recent progress in phylogenomic analyses and insights from the recently decoded Gnetum genome, the relationship between gnetophytes and other seed plant lineages is still one of the outstanding, unresolved questions in plant sciences. Here, we show that phylogenetic studies of nuclear genes support the hypothesis that places gnetophytes as sister to all other extant seed plants, so this hypothesis should not be ruled out on the basis of phylogenetic inference from nuclear genes. However, this extraordinarily difficult phylogenetic problem might never be solved by phylogenetic inference based on gene trees under various artificial selection. Hence, we adopted a novel approach, comparing gene divergence among different lineages, to resolve the conflicts by showing that gnetophytes did not gain a set of genes like the most recent common ancestor (MRCA) of other seed plants. This distinct gene evolution pattern cannot be explained by random gene loss as in other seed plants but should be interpreted as the early divergence of gnetophytes from the rest of the seed plants. With such a placement, the gymnosperms are paraphyletic and there should be three distinct groups of living seed plants: gnetophytes, non-gnetophyte gymnosperms and angiosperms.

Widespread paleopolyploidy, gene tree conflict, and recalcitrant relationships among the carnivorous Caryophyllales

10.1101/115741 ◽

2017 ◽

Cited By ~ 1

Author(s):

Joseph F. Walker ◽

Ya Yang ◽

Michael J. Moore ◽

Jessica Mikenas ◽

Alfonso Timoneda ◽

...

Keyword(s):

Gene Tree ◽

West African ◽

Evolutionary Relationships ◽

Taxon Sampling ◽

Data Sets ◽

List Type ◽

Data Set ◽

Present Evidence ◽

Gene Tree Discordance ◽

Phylogenomic Analyses

The carnivorous members of the large, hyperdiverse Caryophyllales (e.g. Venus flytrap, sundews and Nepenthes pitcher plants) represent perhaps the oldest and most diverse lineage of carnivorous plants. However, despite numerous studies seeking to elucidate their evolutionary relationships, the early-diverging relationships remain unresolved. To explore the utility of phylogenomic data sets for resolving relationships among the carnivorous Caryophyllales, we sequenced ten transcriptomes, including all the carnivorous genera except those in the rare West African liana family (Dioncophyllaceae). We used a variety of methods to infer the species tree, examine gene tree conflict and infer paleopolyploidy events. Phylogenomic analyses support the monophyly of the carnivorous Caryophyllales, with an origin of 68-83 mya. In contrast to previous analyses, we recover the remaining non-core Caryophyllales as non-monophyletic, although there are multiple reasons this result may be spurious and the node supporting this relationship contains a significant amount of gene tree discordance. We present evidence that the clade contains at least seven independent paleopolyploidy events, that previously debated nodes from the literature have high levels of gene tree conflict, and that taxon sampling influences topology even in a phylogenomic data set. Our data demonstrate the importance of carefully considering gene tree conflict and taxon sampling in phylogenomic analyses. Moreover, they provide a remarkable example of the propensity for paleopolyploidy in angiosperms, with at least seven such events in a clade of less than 2500 species.
