Multispecies coalescent analysis unravels the non-monophyly and controversial relationships of Hexapoda

2017
Author(s): Lucas A. Freitas, Beatriz Mello, Carlos G. Schrago

With the increase in the availability of genomic data, sequences from different loci are usually concatenated in a supermatrix for phylogenetic inference. However, as an alternative to the supermatrix approach, several implementations of the multispecies coalescent (MSC) have been increasingly used in phylogenomic analyses due to their advantages in accommodating gene tree topological heterogeneity by taking into account population-level processes. Moreover, the development of faster algorithms under the MSC is enabling the analysis of thousands of loci/taxa. Here, we explored the MSC approach for a phylogenomic dataset of Insecta. Even with the challenges posed by insects, due to large effective population sizes coupled with short deep internal branches, our MSC analysis could recover several orders and evolutionary relationships in agreement with current insect systematics. However, some phylogenetic relationships were not recovered by MSC methods. Most noticeably, a remipede crustacean was positioned within the Insecta. Additionally, the interordinal relationships within Polyneoptera and Neuropteroidea contradicted recent works, by suggesting the non-monophyly of Neuroptera. We note, however, that these phylogenetic arrangements were also poorly supported by previous analyses and that they were sensitive to gene sampling.

2014
Author(s): Jonathan Puritz, Christopher M. Hollenbeck, John R. Gold

Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for organisms with large effective population sizes and high levels of genetic polymorphism but for which no genomic resources exist. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is most likely due to the fact that dDocent quality trims instead of filtering and incorporates both forward and reverse reads in assembly, mapping, and SNP calling, thus enabling use of reads with Indel polymorphisms. The pipeline and a comprehensive user guide can be found at (http://dDocent.wordpress.com).


Diversity, 2020, Vol 12 (4), pp. 164
Author(s): Peter Houde, Edward L. Braun, Lawrence Zhou

Assessing the applicability of theory to major adaptive radiations in deep time represents an extremely difficult problem in evolutionary biology. Neoaves, which includes 95% of living birds, is believed to have undergone a period of rapid diversification roughly coincident with the Cretaceous–Paleogene (K-Pg) boundary. We investigate whether basal neoavian lineages experienced an ecological release in response to ecological opportunity, as evidenced by density compensation. We estimated effective population sizes (Ne) of basal neoavian lineages by combining coalescent branch lengths (CBLs) and the numbers of generations between successive divergences. We used a modified version of Accurate Species TRee Algorithm (ASTRAL) to estimate CBLs directly from insertion–deletion (indel) data, as well as from gene trees using DNA sequence and/or indel data. We found that some divergences near the K-Pg boundary involved unexpectedly high gene tree discordance relative to the estimated number of generations between speciation events. The simplest explanation for this result is an increase in Ne, despite the caveats discussed herein. It appears that at least some early neoavian lineages, similar to the ancestor of the clade comprising doves, mesites, and sandgrouse, experienced ecological release near the time of the K-Pg mass extinction.
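The step of combining coalescent branch lengths with numbers of generations can be illustrated with the standard diploid relation CBL = T / (2 Ne), where T is the branch duration in generations. The sketch below simply inverts that relation. It is a minimal illustration under that assumption, not the authors' pipeline; the function name and the example values are hypothetical.

```python
def effective_population_size(generations: float, coalescent_units: float) -> float:
    """Diploid Ne implied by a branch that spans `generations` generations
    and has length `coalescent_units` (assumes CBL = T / (2 * Ne))."""
    if generations <= 0 or coalescent_units <= 0:
        raise ValueError("both inputs must be positive")
    return generations / (2.0 * coalescent_units)


if __name__ == "__main__":
    # Hypothetical branch: 200,000 generations long, 0.5 coalescent units.
    print(effective_population_size(200_000, 0.5))
```

For a fixed number of generations, a shorter coalescent branch (more gene tree discordance) implies a larger Ne, which is the direction of inference the abstract describes.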


2019
Author(s): C. Schmidt, M. Domaratzki, R.P. Kinnunen, J. Bowman, C.J. Garroway

Urbanization and associated environmental changes are causing global declines in vertebrate populations. In general, population declines of the magnitudes now detected should lead to reduced effective population sizes for animals living in proximity to humans and disturbed lands. This is cause for concern because effective population sizes set the rate of genetic diversity loss due to genetic drift, the rate of increase in inbreeding, and the efficiency with which selection can act on beneficial alleles. We predicted that the effects of urbanization should decrease effective population size and genetic diversity, and increase population-level genetic differentiation. To test for such patterns, we repurposed and reanalyzed publicly archived genetic data sets for North American birds and mammals. After filtering, we had usable raw genotype data from 85 studies and 41,023 individuals, sampled from 1,008 locations spanning 41 mammal and 25 bird species. We used census-based urban-rural designations, human population density, and the Human Footprint Index as measures of urbanization and habitat disturbance. As predicted, mammals sampled in more disturbed environments had lower effective population sizes and genetic diversity, and were more genetically differentiated from those in more natural environments. There were no consistent relationships detectable for birds. This suggests that, in general, mammal populations living near humans may have less capacity to respond adaptively to further environmental changes, and be more likely to suffer from effects of inbreeding.
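The statement that effective population size sets the rate of diversity loss under drift can be made concrete with the textbook expectation H_t = H_0 (1 - 1/(2 Ne))^t for an idealized diploid population. The sketch below assumes that idealized model (constant Ne, no mutation, migration, or selection). It is not taken from the study, and the function name and values are illustrative.

```python
def expected_heterozygosity(h0: float, ne: float, generations: int) -> float:
    """Expected heterozygosity after `generations` generations of drift,
    assuming H_t = H_0 * (1 - 1 / (2 * Ne)) ** t."""
    if ne <= 0:
        raise ValueError("ne must be positive")
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


if __name__ == "__main__":
    # Compare a small and a large hypothetical population over 100 generations.
    for ne in (50, 5000):
        print(ne, expected_heterozygosity(0.5, ne, 100))
```

Under these assumptions the smaller population loses heterozygosity much faster, which is why reduced Ne near humans would be a concern.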


2020
Author(s): Patrick F. McKenzie, Deren A. R. Eaton

A key distinction between species tree inference under the multi-species coalescent model (MSC), and the inference of gene trees in sliding windows along a genome, is in the effect of genetic linkage. Whereas the MSC explicitly assumes genealogies to be unlinked, i.e., statistically independent, genealogies located close together on genomes are spatially auto-correlated. Here we use tree sequence simulations with recombination to explore the effects of species tree parameters on spatial patterns of linkage among genealogies. We decompose coalescent time units to demonstrate differential effects of generation time and effective population size on spatial coalescent patterns, and we define a new metric, “phylogenetic linkage,” for measuring the rate of decay of phylogenetic similarity by comparison to distances among unlinked genealogies. Finally, we provide a simple example where accounting for phylogenetic linkage in sliding window analyses improves local gene tree inference.
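The decomposition of coalescent time units can be sketched with the usual diploid scaling, in which a branch of a given absolute duration has length (years / generation time) / (2 Ne) in coalescent units. The example below assumes that scaling and shows two hypothetical parameter sets that give the same coalescent length while differing tenfold in elapsed generations. The function name and numbers are illustrative and are not drawn from the paper.

```python
def coalescent_units(years: float, generation_time: float, ne: float) -> float:
    """Branch length in coalescent units, assuming the diploid scaling
    tau = (years / generation_time) / (2 * Ne)."""
    if generation_time <= 0 or ne <= 0:
        raise ValueError("generation_time and ne must be positive")
    generations = years / generation_time
    return generations / (2.0 * ne)


if __name__ == "__main__":
    # Same absolute time, different (generation time, Ne) combinations.
    fast = coalescent_units(1e6, generation_time=1.0, ne=1e5)
    slow = coalescent_units(1e6, generation_time=10.0, ne=1e4)
    print(fast, slow)  # equal coalescent lengths, 1e6 vs 1e5 generations
```

Because the two scenarios differ in the number of generations elapsed, any process that accumulates per generation can differ between them even though the coalescent branch lengths match.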


2021
Author(s): Ziheng Yang, Thomas Flouris

The multispecies coalescent with introgression (MSci) model accommodates both the coalescent process and cross-species introgression/hybridization events, two major processes that create genealogical fluctuations across the genome and gene-tree-species-tree discordance. Full likelihood implementations of the MSci model take such fluctuations as a major source of information about the history of species divergence and gene flow, and provide a powerful tool for estimating the direction, timing and strength of cross-species introgression using multilocus sequence data. However, introgression models, in particular those that accommodate bidirectional introgression (BDI), are known to cause unidentifiability issues of the label-switching type, whereby different models or parameters make the same predictions about the genomic data and thus cannot be distinguished by the data. Nevertheless, there has been no systematic study of unidentifiability when full likelihood methods are applied. Here we characterize the unidentifiability of arbitrary BDI models and derive simple rules for its identification. In general, an MSci model with k BDI events has 2^k unidentifiable towers in the posterior, with each BDI event between sister species creating within-model unidentifiability and each BDI between non-sister species creating cross-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo (MCMC) samples to remove label switching and implement them in the BPP program. We analyze genomic sequence data from Heliconius butterflies as well as synthetic data to illustrate the utility of the BDI models and the new algorithms.


Author(s): Patrick F. McKenzie, Deren A. R. Eaton

Summary: ipcoal is a free and open source Python package for simulating and analyzing genealogies and sequences. It automates the task of describing complex demographic models (e.g., with divergence times, effective population sizes, migration events) to the msprime coalescent simulator by parsing a user-supplied species tree or network. Genealogies, sequences, and metadata are returned in tabular format allowing for easy downstream analyses. ipcoal includes phylogenetic inference tools to automate gene tree inference from simulated sequence data, and visualization tools for analyzing results and verifying model accuracy. The ipcoal package is a powerful tool for posterior predictive data analysis, for methods validation, and for teaching coalescent methods in an interactive and visual environment.

Availability and implementation: Source code is available from the GitHub repository (https://github.com/pmckenz1/ipcoal/) and is distributed for packaged installation with conda. Complete documentation and interactive notebooks prepared for teaching purposes are available at https://ipcoal.readthedocs.io/.


2015
Author(s): Yuan Tian, Laura Kubatko

We propose a coalescent model for three species that allows gene flow between both pairs of sister populations. The model is designed to analyze multilocus genomic sequence alignments, with one sequence sampled from each of the three species. The model is formulated using a Markov chain representation, which allows use of matrix exponentiation to compute analytical expressions for the probability density of gene tree genealogies. The gene tree history distribution as well as the gene tree topology distribution under this coalescent model with gene flow are then calculated via numerical integration. We analyze the model to compare the distributions of gene tree topologies and gene tree histories for species trees with differing effective population sizes and gene flow rates. Our results suggest conditions under which the species tree and associated parameters are not identifiable from the gene tree topology distribution when gene flow is present, but indicate that the gene tree history distribution may identify the species tree and associated parameters. Thus, the gene tree history distribution can be used to infer parameters such as the ancestral effective population sizes and the rates of gene flow in a maximum likelihood (ML) framework. We conduct computer simulations to evaluate the performance of our method in estimating these parameters, and we apply our method to an Afrotropical mosquito data set (Fontaine et al., 2015) to demonstrate the usefulness of our method for the analysis of empirical data. Key words: coalescent, gene flow, migration, hybridization, gene tree, topology, history, maximum likelihood, speciation.


2019
Author(s): Yinzhi Zhang, Zhiming Liu

Gnetophytes, comprising three relict genera, Gnetum, Welwitschia and Ephedra, are a morphologically diverse and enigmatic assemblage among seed plants. Despite recent progress in phylogenomic analyses and insights from the recently decoded Gnetum genome, the relationship between gnetophytes and other seed plant lineages is still one of the outstanding, unresolved questions in plant sciences. Here, we showed that phylogenetic studies from nuclear genes support the hypothesis that places gnetophytes as sister to all other extant seed plants, and so this hypothesis should not be ruled out according to phylogenetic inference based on nuclear genes. However, this extraordinarily difficult phylogenetic problem might never be solved by gene-tree-based phylogenetic inference under various artificial selection. Hence, we adopted a novel approach, comparing gene divergence among different lineages, to resolve the conflicts by showing that gnetophytes actually did not gain a set of genes like the most recent common ancestor (MRCA) of other seed plants. This distinct gene evolution pattern could not be explained by random gene loss as in other seed plants but should be interpreted as the early divergence of gnetophytes from the rest of the seed plants. With such a placement, the gymnosperms are paraphyletic and there should be three distinct groups of living seed plants: gnetophytes, non-gnetophyte gymnosperms and angiosperms.


2017
Author(s): Joseph F. Walker, Ya Yang, Michael J. Moore, Jessica Mikenas, Alfonso Timoneda, ...

The carnivorous members of the large, hyperdiverse Caryophyllales (e.g. Venus flytrap, sundews and Nepenthes pitcher plants) represent perhaps the oldest and most diverse lineage of carnivorous plants. However, despite numerous studies seeking to elucidate their evolutionary relationships, the early-diverging relationships remain unresolved.

To explore the utility of phylogenomic data sets for resolving relationships among the carnivorous Caryophyllales, we sequenced ten transcriptomes, including all the carnivorous genera except those in the rare West African liana family (Dioncophyllaceae). We used a variety of methods to infer the species tree, examine gene tree conflict and infer paleopolyploidy events.

Phylogenomic analyses support the monophyly of the carnivorous Caryophyllales, with an origin of 68-83 mya. In contrast to previous analyses, we recover the remaining non-core Caryophyllales as non-monophyletic, although there are multiple reasons this result may be spurious and the node supporting this relationship contains a significant amount of gene tree discordance. We present evidence that the clade contains at least seven independent paleopolyploidy events, that previously debated nodes from the literature have high levels of gene tree conflict, and that taxon sampling influences topology even in a phylogenomic data set.

Our data demonstrate the importance of carefully considering gene tree conflict and taxon sampling in phylogenomic analyses. Moreover, they provide a remarkable example of the propensity for paleopolyploidy in angiosperms, with at least seven such events in a clade of less than 2500 species.



