Why do phylogenomic analyses of early animal evolution continue to disagree? Sites in different structural environments yield different answers

Phylogenetic Analyses of Sites in Different Protein Structural Environments Result in Distinct Placements of the Metazoan Root

10.20944/preprints201910.0302.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Akanksha Pandey ◽

Edward L. Braun

Keyword(s):

Amino Acids ◽

Solvent Accessibility ◽

Phylogenetic Signal ◽

Phylogenetic Analyses ◽

Striking Difference ◽

Relative Solvent Accessibility ◽

Cat Model ◽

Protein Datasets ◽

Genome Scale ◽

The Impact

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. The most striking difference in phylogenetic signal reflected relative solvent accessibility: analyses of exposed sites (on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge-ctenophore clade. These differences in phylogenetic signal were not ameliorated when we repeated our analyses using the CAT model, a mixture model that is often used for analyses of protein datasets. In fact, the heterogeneous CAT model resulted in several rearrangements that are unlikely to represent evolutionary history. However, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding, both trees supported placement of ctenophores sister to all other animals. These results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
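The abstract describes splitting sites by relative solvent accessibility (RSA), but it does not give the classification rule. The sketch below makes three assumptions. First, per-residue RSA values (accessible surface area divided by the maximum for that residue type) are already available, for example from a structure predictor. Second, a hypothetical cutoff of 0.25 separates exposed from buried residues. Third, each alignment column goes to whichever class most taxa support. `RSA_CUTOFF`, `classify_residue`, and `partition_columns` are illustrative names, not parts of the authors' pipeline.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical cutoff: residues with RSA >= 0.25 are called "exposed".
RSA_CUTOFF = 0.25


def classify_residue(rsa: float, cutoff: float = RSA_CUTOFF) -> str:
    """Label one residue from its RSA value."""
    return "exposed" if rsa >= cutoff else "buried"


def partition_columns(rsa_by_taxon: dict[str, list[float | None]]) -> dict[str, list[int]]:
    """Assign each alignment column to the label supported by most taxa.

    None marks a gap or a residue without an RSA estimate. Columns with no
    data, or with equal exposed and buried votes, are left unassigned.
    """
    n_cols = len(next(iter(rsa_by_taxon.values())))
    subsets: dict[str, list[int]] = {"exposed": [], "buried": [], "unassigned": []}
    for col in range(n_cols):
        votes = Counter(
            classify_residue(values[col])
            for values in rsa_by_taxon.values()
            if values[col] is not None
        )
        if votes["exposed"] == votes["buried"]:
            subsets["unassigned"].append(col)
        elif votes["exposed"] > votes["buried"]:
            subsets["exposed"].append(col)
        else:
            subsets["buried"].append(col)
    return subsets


# Hypothetical RSA values for three taxa over five aligned columns.
example_rsa = {
    "taxonA": [0.90, 0.05, None, 0.30, 0.50],
    "taxonB": [0.70, 0.10, None, 0.10, 0.10],
    "taxonC": [0.40, 0.20, 0.50, 0.60, None],
}
print(partition_columns(example_rsa))
# {'exposed': [0, 2, 3], 'buried': [1], 'unassigned': [4]}
```

The exposed and buried column lists can then be used to extract sub-alignments for separate tree searches. A per-taxon majority vote is only one way to summarize a column; a consensus structure from a single reference protein would be an equally plausible choice.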

Phylogenetic Analyses of Sites in Different Protein Structural Environments Result in Distinct Placements of the Metazoan Root

Biology ◽

10.3390/biology9040064 ◽

2020 ◽

Vol 9 (4) ◽

pp. 64 ◽

Cited By ~ 6

Author(s):

Akanksha Pandey ◽

Edward L. Braun

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Solvent Accessibility ◽

Phylogenetic Signal ◽

Phylogenetic Analyses ◽

Sister Group ◽

Striking Difference ◽

Relative Solvent Accessibility ◽

Protein Datasets ◽

The Impact

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals, and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding, the trees estimated using exposed sites and buried sites both supported placement of ctenophores sister to all other animals.
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
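The abstract says only that amino acids were recoded to limit the impact of deviations from compositional stationarity; it does not name the scheme. One widely used option is six-state Dayhoff recoding, which collapses the 20 amino acids into six groups of physicochemically similar residues. The sketch below assumes that scheme, which may not match the authors' recoding. `recode` is a hypothetical helper name.

```python
# Six Dayhoff groups of physicochemically similar amino acids.
DAYHOFF6_GROUPS = ("AGPST", "DENQ", "HKR", "ILMV", "FWY", "C")

# Map each of the 20 standard amino acids to its group index, as a character.
DAYHOFF6 = {aa: str(i) for i, group in enumerate(DAYHOFF6_GROUPS) for aa in group}


def recode(seq: str) -> str:
    """Recode an aligned protein sequence into Dayhoff-6 states '0' to '5'."""
    out = []
    for aa in seq.upper():
        if aa in DAYHOFF6:
            out.append(DAYHOFF6[aa])
        elif aa == "-":
            out.append("-")  # keep alignment gaps
        else:
            out.append("?")  # ambiguity codes such as X or B become missing data
    return "".join(out)


# Substitutions within a group (K/R, I/L, V/M, E/D) disappear after recoding.
print(recode("KIVE"), recode("RLMD"))  # 2331 2331
```

Because frequent within-group exchanges are no longer counted as changes, lineage-specific shifts in amino acid usage within a group cannot mislead the tree search. The cost is discarding some genuine signal.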

The unresolved phylogenomic tree of butterflies and moths (Lepidoptera): assessing the potential causes and consequences

10.1101/2021.04.09.439156 ◽

2021 ◽

Author(s):

Jadranka Rota ◽

Victoria Gwendoline Twort ◽

Andrea Chiocchio ◽

Carlos Pena ◽

Christopher W. Wheat ◽

...

Keyword(s):

Amino Acid ◽

Molecular Phylogenetics ◽

Phylogenetic Signal ◽

Tree Of Life ◽

Amino Acid Sequences ◽

Model Organisms ◽

Compositional Bias ◽

Sequencing Technologies ◽

Phylogenetic Hypotheses ◽

Phylogenomic Analyses

The field of molecular phylogenetics is being revolutionised by next-generation sequencing technologies, which make it possible to sequence large numbers of genomes for non-model organisms and have ushered us into the era of phylogenomics. The current challenge is no longer how to get enough data, but rather how to analyse the data and how to assess the support for the inferred phylogeny. We focus on one of the largest animal groups on the planet: butterflies and moths (order Lepidoptera). We clearly demonstrate that there are unresolved issues in the inferred phylogenetic relationships of the major lineages, despite several recent phylogenomic studies of the group. We assess the potential causes and consequences of the conflicting phylogenetic hypotheses. With a dataset of 331 protein-coding genes and an alignment length of over 290 000 base pairs, including 200 taxa representing 81% of lepidopteran superfamilies, we compare phylogenetic hypotheses inferred from amino acid and nucleotide alignments. The two resulting phylogenies are discordant, especially with respect to the placement of the superfamily Gelechioidea, a discordance that is likely due to compositional bias in both the nucleotide and amino acid sequences. With a series of analyses, we dissect our dataset and demonstrate that there is sufficient phylogenetic signal to resolve much of the lepidopteran tree of life. Overall, the results from the nucleotide alignment are more robust to the various perturbations of the data that we carried out. However, the lack of support for much of the backbone within Ditrysia leaves the butterfly and moth tree of life unresolved. We conclude that taxon sampling remains an issue even in phylogenomic analyses, and recommend that poorly sampled, highly diverse groups, such as Gelechioidea in Lepidoptera, receive extra attention in the future.
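Compositional bias of the kind invoked for Gelechioidea is often screened with a chi-square test of whether all taxa share the same character composition. The sketch below computes that statistic from a taxa-by-state count table. It ignores the phylogenetic non-independence of taxa, so it is a rough screen rather than a formal test. The abstract does not say which test the authors used, and `composition_chi2` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter


def composition_chi2(alignment: dict[str, str], alphabet: str = "ACGT") -> tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for composition homogeneity.

    Characters outside `alphabet` (gaps, ambiguity codes) are ignored.
    """
    counts = {
        taxon: Counter(ch for ch in seq.upper() if ch in alphabet)
        for taxon, seq in alignment.items()
    }
    pooled: Counter = Counter()
    for taxon_counts in counts.values():
        pooled.update(taxon_counts)
    grand_total = sum(pooled.values())

    chi2 = 0.0
    if grand_total > 0:
        for taxon_counts in counts.values():
            n_taxon = sum(taxon_counts.values())
            for state in alphabet:
                # Expected count if this taxon matched the pooled composition.
                expected = n_taxon * pooled[state] / grand_total
                if expected > 0:
                    chi2 += (taxon_counts[state] - expected) ** 2 / expected
    df = (len(alignment) - 1) * (len(alphabet) - 1)
    return chi2, df


# Toy sequences: one GC-rich and one AT-rich taxon.
print(composition_chi2({"t1": "GGGGCCCC", "t2": "AAAATTTT"}))  # (16.0, 3)
```

Passing `alphabet="ACDEFGHIKLMNPQRSTVWY"` applies the same calculation to amino acid alignments. This allows nucleotide and amino acid versions of a dataset to be screened side by side.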

Comprehensive phylogenomic analyses resolve cnidarian relationships and the origins of key organismal traits

10.7287/peerj.preprints.3172 ◽

2017 ◽

Author(s):

Ehsan Kayal ◽

Bastian Bentlage ◽

M Sabrina Pankey ◽

Aki Ohdera ◽

Monica Medina ◽

...

Keyword(s):

Life Histories ◽

Bayes Factor ◽

Tree Of Life ◽

Ancestral State ◽

Functional Classes ◽

Phylogenetic Resolution ◽

Phylogenomic Analyses ◽

Colonial Body ◽

Genome Scale ◽

Scale Data

The phylogeny of Cnidaria has been a source of debate for decades, during which nearly all possible relationships among the major lineages have been proposed. The ecological success of Cnidaria is predicated on several fascinating organismal innovations, including symbiosis, colonial body plans and elaborate life histories; however, understanding the origins and subsequent diversification of these traits remains difficult due to persistent uncertainty surrounding the evolutionary relationships within Cnidaria. While recent phylogenomic studies have advanced our knowledge of the cnidarian tree of life, no analysis to date has included genome-scale data for each major cnidarian lineage. Here we describe a well-supported hypothesis for cnidarian phylogeny based on phylogenomic analyses of new and existing genome-scale data that include representatives of all cnidarian classes. Our results are robust to alternative modes of phylogenetic estimation and phylogenomic dataset construction. We show that two popular phylogenomic matrix construction pipelines yield profoundly different datasets, both in the identities and the functional classes of the loci they include, but resolve the same topology. We then leverage our phylogenetic resolution of Cnidaria to understand the character histories of several critical organismal traits. Ancestral state reconstruction analyses based on our phylogeny establish several notable organismal transitions in the evolutionary history of Cnidaria and depict the ancestral cnidarian as a solitary, non-symbiotic polyp that lacked a medusa stage. In addition, Bayes factor tests of multiple origins strongly suggest that symbiosis has evolved multiple times independently across the cnidarian radiation. Cnidaria have experienced more than 600 million years of independent evolution and in the process generated an array of organismal innovations.
Our results add significant clarification on the cnidarian tree of life and the histories of these innovations. Further, we confirm the existence of Acraspeda (staurozoans plus scyphozoans and cubozoans), thus reviving an evolutionary hypothesis put forward more than a century ago.
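The Bayes factor tests mentioned above compare models by their marginal likelihoods. The sketch below converts two log marginal likelihoods into 2 ln BF and reads the result against the Kass and Raftery (1995) categories. The log marginal likelihoods would have to be estimated separately, for example by stepping-stone sampling. The function names and numbers are hypothetical. Framing the test as "multiple origins allowed" versus "single origin enforced" is an assumption about how such a test could be set up, not a description of the authors' analysis.

```python
def two_ln_bayes_factor(log_ml_alt: float, log_ml_null: float) -> float:
    """2 ln BF of the alternative model over the null, from log marginal likelihoods."""
    return 2.0 * (log_ml_alt - log_ml_null)


def kass_raftery_label(two_ln_bf: float) -> str:
    """Verbal strength of evidence for the alternative on the Kass & Raftery (1995) scale."""
    if two_ln_bf < 0:
        return "supports the null model"
    if two_ln_bf <= 2:
        return "not worth more than a bare mention"
    if two_ln_bf <= 6:
        return "positive"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods: multiple origins allowed (alternative)
# versus a single origin of symbiosis enforced (null).
bf = two_ln_bayes_factor(log_ml_alt=-1000.0, log_ml_null=-1012.5)
print(bf, kass_raftery_label(bf))  # 25.0 very strong
```

Working on the log scale avoids overflow, because marginal likelihoods of phylogenetic models are extremely small numbers.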

Protein evolution is structure dependent and non-homogeneous across the tree of life

10.1101/2020.01.28.923458 ◽

2020 ◽

Author(s):

Akanksha Pandey ◽

Edward L. Braun

Keyword(s):

Mixture Models ◽

Protein Evolution ◽

Protein Sequence ◽

Solvent Accessibility ◽

Tree Of Life ◽

Supplementary Information ◽

Amino Acid Side Chain ◽

Sequence Evolution ◽

Relative Solvent Accessibility ◽

Protein Sequence Evolution

Abstract
Motivation: Protein sequence evolution is a complex process that varies among sites within proteins and across the tree of life. Comparisons of evolutionary rate matrices for specific taxa (‘clade-specific models’) have the potential to reveal this variation and provide information about the underlying reasons for those changes. To study changes in patterns of protein sequence evolution, we estimated and compared clade-specific models in a way that acknowledged variation within proteins due to structure.
Results: Clade-specific model fit was able to correctly classify proteins from four specific groups (vertebrates, plants, oomycetes, and yeasts) more than 70% of the time. This was true whether we used mixture models that incorporate relative solvent accessibility or simple models that treat sites as homogeneous. Thus, protein evolution is non-homogeneous over the tree of life. However, a small number of dimensions could explain the differences among models (for mixture models, ~50% of the variance reflected relative solvent accessibility and ~25% reflected clade). Relaxed purifying selection in taxa with lower long-term effective population sizes appears to explain much of the among-clade variance. Relaxed selection on solvent-exposed sites was correlated with changes in amino acid side-chain volume; other differences among models were more complex. Beyond the information they reveal about protein evolution, our clade-specific models also represent tools for phylogenomic inference.
Availability: Model files are available from https://github.com/ebraun68/…
Supplementary information: Supplementary data are appended to this preprint.
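Classifying proteins by clade-specific model fit can be read as follows: fit every clade's model to a protein alignment, then assign the protein to the clade whose model fits best. The sketch below makes two assumptions. First, the log-likelihoods have already been computed in a maximum-likelihood program. Second, the models have equal numbers of free parameters, so the highest log-likelihood also gives the best AIC. All values are hypothetical placeholders, not results from the preprint. `classify_by_model_fit` and `accuracy` are illustrative names.

```python
from __future__ import annotations


def classify_by_model_fit(lnl: dict[str, dict[str, float]]) -> dict[str, str]:
    """Assign each protein to the clade whose model gives the highest log-likelihood."""
    return {protein: max(scores, key=scores.get) for protein, scores in lnl.items()}


def accuracy(predicted: dict[str, str], truth: dict[str, str]) -> float:
    """Fraction of proteins whose predicted clade matches their source clade."""
    hits = sum(predicted[protein] == clade for protein, clade in truth.items())
    return hits / len(truth)


# Hypothetical log-likelihoods of three protein alignments under four clade models.
example_lnl = {
    "p1": {"vertebrates": -5010.2, "plants": -5043.8, "oomycetes": -5061.0, "yeasts": -5055.4},
    "p2": {"vertebrates": -3120.5, "plants": -3098.1, "oomycetes": -3130.9, "yeasts": -3125.0},
    "p3": {"vertebrates": -4402.7, "plants": -4410.3, "oomycetes": -4398.6, "yeasts": -4399.9},
}
example_truth = {"p1": "vertebrates", "p2": "plants", "p3": "yeasts"}

predicted = classify_by_model_fit(example_lnl)
print(predicted)  # {'p1': 'vertebrates', 'p2': 'plants', 'p3': 'oomycetes'}
print(accuracy(predicted, example_truth))  # 2 of 3 proteins correctly classified
```

If the competing models differed in their number of free parameters, the comparison would need an information criterion such as AIC rather than raw log-likelihoods.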

Comprehensive phylogenomic analyses resolve cnidarian relationships and the origins of key organismal traits

10.7287/peerj.preprints.3172v1 ◽

2017 ◽

Cited By ~ 3

Author(s):

Ehsan Kayal ◽

Bastian Bentlage ◽

M Sabrina Pankey ◽

Aki Ohdera ◽

Monica Medina ◽

...

Keyword(s):

Life Histories ◽

Bayes Factor ◽

Tree Of Life ◽

Ancestral State ◽

Functional Classes ◽

Phylogenetic Resolution ◽

Phylogenomic Analyses ◽

Colonial Body ◽

Genome Scale ◽

Scale Data

Faculty Opinions recommendation of Phylogenetic signal in the eukaryotic tree of life.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1119793.575512 ◽

2008 ◽

Author(s):

Oliver Pybus

Keyword(s):

Phylogenetic Signal ◽

Tree Of Life

Improved protein relative solvent accessibility prediction using deep multi-view feature learning framework

Analytical Biochemistry ◽

10.1016/j.ab.2021.114358 ◽

2021 ◽

Vol 631 ◽

pp. 114358

Author(s):

Xue-Qiang Fan ◽

Jun Hu ◽

Ning-Xin Jia ◽

Dong-Jun Yu ◽

Gui-Jun Zhang

Keyword(s):

Solvent Accessibility ◽

Feature Learning ◽

Relative Solvent Accessibility ◽

Learning Framework ◽

Solvent Accessibility Prediction

SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity

Bioinformatics ◽

10.1093/bioinformatics/btu352 ◽

2014 ◽

Vol 30 (18) ◽

pp. 2592-2597 ◽

Cited By ~ 188

Author(s):

C. N. Magnan ◽

P. Baldi

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Solvent Accessibility ◽

Protein Secondary Structure ◽

Structural Similarity ◽

Relative Solvent Accessibility

Genome-scale reconstructions to assess metabolic phylogeny and organism clustering

PLoS ONE ◽

10.1371/journal.pone.0240953 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0240953

Author(s):

Christian Schulz ◽

Eivind Almaas

Keyword(s):

Phylogenetic Trees ◽

Metabolic Networks ◽

Sulfur Metabolism ◽

Phylogenetic Analyses ◽

Tree Of Life ◽

Significant Heterogeneity ◽

Metabolic Reaction ◽

High Quality ◽

Conserved Genes ◽

Genome Scale

Approaches for systematizing information on relatedness between organisms are important in biology. Phylogenetic analyses based on sets of highly conserved genes are currently the basis for the Tree of Life. Genome-scale metabolic reconstructions contain high-quality information regarding the metabolic capability of an organism and are typically restricted to metabolically active enzyme-encoding genes. While there are many tools available to generate draft reconstructions, expert-level knowledge is still required to generate and manually curate high-quality genome-scale metabolic models and to fill gaps in their reaction networks. Here, we use the tool AutoKEGGRec to construct 975 genome-scale metabolic draft reconstructions encoded in the KEGG database without further curation. The organisms are selected across all three domains, and their metabolic networks serve as the basis for generating phylogenetic trees. We find that, when all encoded reactions are used, these metabolism-based comparisons give rise to a phylogenetic tree with close similarity to the Tree of Life. While this tree is quite robust to reasonable levels of noise in the metabolic reaction content of an organism, we find significant heterogeneity in how much noise an organism may tolerate before it is incorrectly placed in the tree. Furthermore, by using the protein sequences for particular metabolic functions and pathway sets, such as central carbon, nitrogen, and sulfur metabolism, as the basis for the organism comparisons, we generate highly specific phylogenetic trees. We believe the generation of phylogenetic trees based on metabolic reaction content, in particular when focused on specific functions and pathways, could aid the identification of functionally important metabolic enzymes and be of value for genome-scale metabolic modellers and enzyme engineers.
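The abstract does not state how reaction content is converted into a tree. One simple possibility, assumed here for illustration, is to treat each organism as a set of reaction identifiers and measure dissimilarity with the Jaccard distance. The organisms are then clustered with UPGMA. The organism names and reaction identifiers below are hypothetical placeholders, not KEGG data. `upgma_newick` is an illustrative helper, not part of AutoKEGGRec.

```python
from __future__ import annotations

from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def upgma_newick(profiles: dict[str, set[str]]) -> str:
    """Cluster organisms by UPGMA on Jaccard distances; return a Newick topology."""
    # Current clusters, keyed by their Newick string, with their member counts.
    size = {name: 1 for name in profiles}
    # Pairwise distances, keyed by a lexicographically sorted pair of cluster keys.
    dist = {
        (a, b): jaccard_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }
    while len(size) > 1:
        # Closest pair; ties are broken by name so the output is deterministic.
        a, b = min(dist, key=lambda pair: (dist[pair], pair))
        merged = f"({a},{b})"
        for c in size:
            if c in (a, b):
                continue
            d_ac = dist[tuple(sorted((a, c)))]
            d_bc = dist[tuple(sorted((b, c)))]
            # UPGMA: size-weighted average of the merged clusters' distances to c.
            dist[tuple(sorted((merged, c)))] = (
                size[a] * d_ac + size[b] * d_bc
            ) / (size[a] + size[b])
        # Drop every distance that involves either merged cluster.
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"


# Hypothetical reaction sets for four organisms.
example_profiles = {
    "orgA": {"r1", "r2", "r3", "r4"},
    "orgB": {"r1", "r2", "r3", "r5"},
    "orgC": {"r1", "r6", "r7", "r8"},
    "orgD": {"r6", "r7", "r8", "r9"},
}
print(upgma_newick(example_profiles))  # ((orgA,orgB),(orgC,orgD));
```

UPGMA assumes a clock-like (ultrametric) distance structure. Neighbor-joining would be a natural alternative when that assumption is doubtful. Randomly deleting reactions from one profile and rerunning the clustering would give a crude version of the noise-tolerance experiment described in the abstract.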
