Why do phylogenomic analyses of early animal evolution continue to disagree? Sites in different structural environments yield different answers

2018 ◽  
Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Phylogenomics has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life. This could reflect the poor fit of the models used to analyze heterogeneous datasets; that heterogeneity is likely to have many explanations. However, it seems reasonable to hypothesize that the different patterns of selection on proteins based on their structures might represent a source of heterogeneity. To test that hypothesis, we developed an efficient pipeline to divide phylogenomic datasets that comprise proteins into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had different signals for the deepest branches in the metazoan tree of life. Sites located in different structural environments did support distinct tree topologies. The most striking difference in phylogenetic signal reflected relative solvent accessibility; analyses of sites on the surface of proteins yielded a tree that placed ctenophores sister to all other animals whereas sites buried inside proteins yielded a tree with a sponge-ctenophore clade. These differences in phylogenetic signal were not ameliorated when we repeated our analyses using the site-heterogeneous CAT model, a mixture model that is often used for analyses of protein datasets. In fact, analyses using the CAT model actually resulted in rearrangements that are unlikely to represent evolutionary history. These results provide striking evidence that it will be necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
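The idea of dividing an alignment into exposed and buried subsets can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `partition_by_rsa`, the per-column RSA input, the 0.25 cutoff, and the toy sequences are all assumptions made for illustration, and the abstract does not say how structural assignments were obtained.

```python
from typing import Dict, List, Sequence, Tuple


def partition_by_rsa(
    alignment: Dict[str, str],
    rsa: Sequence[float],
    threshold: float = 0.25,  # assumed cutoff, not taken from the paper
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split alignment columns into (exposed, buried) sub-alignments.

    `rsa` holds one relative solvent accessibility value per column.
    Columns with RSA >= threshold are treated as exposed.
    """
    length = len(rsa)
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("each sequence needs exactly one residue per RSA value")

    exposed_cols: List[int] = [i for i, v in enumerate(rsa) if v >= threshold]
    buried_cols: List[int] = [i for i, v in enumerate(rsa) if v < threshold]

    def take(cols: List[int]) -> Dict[str, str]:
        # Keep only the requested columns, preserving their order.
        return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

    return take(exposed_cols), take(buried_cols)


# Toy data (hypothetical): three taxa, five aligned sites.
toy_alignment = {"ctenophore": "MKVLA", "sponge": "MRVIA", "cnidarian": "MKILS"}
toy_rsa = [0.60, 0.10, 0.05, 0.40, 0.25]
exposed, buried = partition_by_rsa(toy_alignment, toy_rsa)
```

Each returned sub-alignment could then be analyzed separately, which is the comparison the abstract describes for exposed and buried sites.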

Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. The most striking difference in phylogenetic signal reflected relative solvent accessibility; analyses of exposed sites (on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals whereas sites buried inside proteins yielded a tree with a sponge-ctenophore clade. These differences in phylogenetic signal were not ameliorated when we repeated our analyses using the CAT model, a mixture model that is often used for analyses of protein datasets. In fact, the heterogeneous CAT model resulted in several rearrangements that are unlikely to represent evolutionary history. However, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding amino acids, both trees supported placement of ctenophores sister to all other animals. These results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.


Biology ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 64 ◽  
Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding amino acids, the trees estimated using the exposed and buried sites both supported placement of ctenophores sister to all other animals. Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
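Amino acid recoding, as mentioned in the abstract above, replaces the 20 amino acids with a smaller set of states so that compositional shifts within a group no longer register as substitutions. The abstract does not name the scheme that was used; the sketch below assumes the widely used Dayhoff six-state grouping, and the names `DAYHOFF6` and `recode` are hypothetical.

```python
# Dayhoff 6-state grouping (an assumed scheme; the abstract does not specify one).
DAYHOFF6 = {
    "AGPST": "0",
    "DENQ": "1",
    "HKR": "2",
    "ILMV": "3",
    "FWY": "4",
    "C": "5",
}

# Flatten the groups into a per-residue lookup table.
RECODE = {aa: code for group, code in DAYHOFF6.items() for aa in group}


def recode(seq: str) -> str:
    """Recode a protein sequence; gaps stay '-', anything else unknown becomes '?'."""
    out = []
    for res in seq.upper():
        if res in RECODE:
            out.append(RECODE[res])
        elif res == "-":
            out.append("-")
        else:
            out.append("?")
    return "".join(out)


# Two toy sequences that differ only by within-group replacements (K/R and L/I).
a = recode("MKVLA")
b = recode("MRVIA")
```

In this toy case both sequences collapse to the same recoded string, which shows how recoding discards within-group changes at the cost of some phylogenetic information.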


2021 ◽  
Author(s):  
Jadranka Rota ◽  
Victoria Gwendoline Twort ◽  
Andrea Chiocchio ◽  
Carlos Pena ◽  
Christopher W. Wheat ◽  
...  

The field of molecular phylogenetics is being revolutionised with next-generation sequencing technologies making it possible to sequence large numbers of genomes for non-model organisms, ushering us into the era of phylogenomics. The current challenge is no longer how to get enough data, but rather how to analyse the data and how to assess the support for the inferred phylogeny. We focus on one of the largest animal groups on the planet - butterflies and moths (order Lepidoptera). We clearly demonstrate that there are unresolved issues in the inferred phylogenetic relationships of the major lineages, despite several recent phylogenomic studies of the group. We assess the potential causes and consequences of the conflicting phylogenetic hypotheses. With a dataset consisting of 331 protein-coding genes and an alignment length of over 290 000 base pairs, including 200 taxa representing 81% of lepidopteran superfamilies, we compare phylogenetic hypotheses inferred from amino acid and nucleotide alignments. The resulting two phylogenies are discordant, especially with respect to the placement of the superfamily Gelechioidea, which is likely due to compositional bias of both the nucleotide and amino acid sequences. With a series of analyses, we dissect our dataset and demonstrate that there is sufficient phylogenetic signal to resolve much of the lepidopteran tree of life. Overall, the results from the nucleotide alignment are more robust to the various perturbations of the data that we carried out. However, the lack of support for much of the backbone within Ditrysia makes the current butterfly and moth tree of life still unresolved. We conclude that taxon sampling remains an issue even in phylogenomic analyses, and recommend that poorly sampled highly diverse groups, such as Gelechioidea in Lepidoptera, should receive extra attention in the future.


2017 ◽  
Author(s):  
Ehsan Kayal ◽  
Bentlage Bastian ◽  
M Sabrina Pankey ◽  
Aki Ohdera ◽  
Monica Medina ◽  
...  

The phylogeny of Cnidaria has been a source of debate for decades, during which nearly all possible relationships among the major lineages have been proposed. The ecological success of Cnidaria is predicated on several fascinating organismal innovations including symbiosis, colonial body plans and elaborate life histories; however, understanding the origins and subsequent diversification of these traits remains difficult due to persistent uncertainty surrounding the evolutionary relationships within Cnidaria. While recent phylogenomic studies have advanced our knowledge of the cnidarian tree of life, no analysis to date has included genome-scale data for each major cnidarian lineage. Here we describe a well-supported hypothesis for cnidarian phylogeny based on phylogenomic analyses of new and existing genome-scale data that includes representatives of all cnidarian classes. Our results are robust to alternative modes of phylogenetic estimation and phylogenomic dataset construction. We show that two popular phylogenomic matrix construction pipelines yield profoundly different datasets, both in the identities and the functional classes of the loci they include, but resolve the same topology. We then leverage our phylogenetic resolution of Cnidaria to understand the character histories of several critical organismal traits. Ancestral state reconstruction analyses based on our phylogeny establish several notable organismal transitions in the evolutionary history of Cnidaria and depict the ancestral cnidarian as a solitary, non-symbiotic polyp that lacked a medusa stage. In addition, Bayes factor tests of multiple origins strongly suggest that symbiosis has evolved multiple times independently across the cnidarian radiation. Cnidaria have experienced more than 600 million years of independent evolution and in the process generated an array of organismal innovations. Our results add significant clarification on the cnidarian tree of life and the histories of these innovations. Further, we confirm the existence of Acraspeda (staurozoans plus scyphozoans and cubozoans), thus reviving an evolutionary hypothesis put forward more than a century ago.


2020 ◽  
Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Motivation: Protein sequence evolution is a complex process that varies among sites within proteins and across the tree of life. Comparisons of evolutionary rate matrices for specific taxa (‘clade-specific models’) have the potential to reveal this variation and provide information about the underlying reasons for those changes. To study changes in patterns of protein sequence evolution we estimated and compared clade-specific models in a way that acknowledged variation within proteins due to structure. Results: Clade-specific model fit was able to correctly classify proteins from four specific groups (vertebrates, plants, oomycetes, and yeasts) more than 70% of the time. This was true whether we used mixture models that incorporate relative solvent accessibility or simple models that treat sites as homogeneous. Thus, protein evolution is non-homogeneous over the tree of life. However, a small number of dimensions could explain the differences among models (for mixture models ~50% of the variance reflected relative solvent accessibility and ~25% reflected clade). Relaxed purifying selection in taxa with lower long-term effective population sizes appears to explain much of the among-clade variance. Relaxed selection on solvent-exposed sites was correlated with changes in amino acid side-chain volume; other differences among models were more complex. Beyond the information they reveal about protein evolution, our clade-specific models also represent tools for phylogenomic inference. Availability: Model files are available from https://github.com/ebraun68/[email protected] Supplementary information: Supplementary data are appended to this preprint.


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0240953
Author(s):  
Christian Schulz ◽  
Eivind Almaas

Approaches for systematizing information on relatedness between organisms are important in biology. Phylogenetic analyses based on sets of highly conserved genes are currently the basis for the Tree of Life. Genome-scale metabolic reconstructions contain high-quality information regarding the metabolic capability of an organism and are typically restricted to metabolically active enzyme-encoding genes. While there are many tools available to generate draft reconstructions, expert-level knowledge is still required to generate and manually curate high-quality genome-scale metabolic models and to fill gaps in their reaction networks. Here, we use the tool AutoKEGGRec to construct 975 genome-scale metabolic draft reconstructions encoded in the KEGG database without further curation. The organisms are selected across all three domains, and their metabolic networks serve as the basis for generating phylogenetic trees. We find that using all reactions encoded, these metabolism-based comparisons give rise to a phylogenetic tree with close similarity to the Tree of Life. While this tree is quite robust to reasonable levels of noise in the metabolic reaction content of an organism, we find a significant heterogeneity in how much noise an organism may tolerate before it is incorrectly placed in the tree. Furthermore, by using the protein sequences for particular metabolic functions and pathway sets, such as central carbon-, nitrogen-, and sulfur-metabolism, as the basis for the organism comparisons, we generate highly specific phylogenetic trees. We believe the generation of phylogenetic trees based on metabolic reaction content, in particular when focused on specific functions and pathways, could aid the identification of functionally important metabolic enzymes and be of value for genome-scale metabolic modellers and enzyme-engineers.
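One simple way to compare organisms by metabolic reaction content is a set-based distance over their reaction lists. The abstract does not state which measure was used, so the Jaccard distance below is an assumption, and the organism names and reaction labels are placeholders rather than real KEGG identifiers.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple


def jaccard_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def pairwise_distances(reactions: Dict[str, Iterable[str]]) -> Dict[Tuple[str, str], float]:
    """Compute the distance for every unordered pair of organisms."""
    return {
        (x, y): jaccard_distance(reactions[x], reactions[y])
        for x, y in combinations(sorted(reactions), 2)
    }


# Placeholder reaction content for three hypothetical organisms.
toy_reactions = {
    "org1": {"rxn_a", "rxn_b", "rxn_c"},
    "org2": {"rxn_b", "rxn_c", "rxn_d"},
    "org3": {"rxn_a", "rxn_b", "rxn_c"},
}
distances = pairwise_distances(toy_reactions)
```

A distance matrix of this kind could be passed to a standard distance-based tree-building method; how the published trees were actually constructed is not described in the abstract.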

