ASPEN: A methodology for reconstructing protein evolution with improved accuracy using ensemble models

2017 ◽  
Author(s):  
Roman Sloutsky ◽  
Kristen M. Naegle

Abstract Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins: the order of duplications and speciations that led to extant homologous proteins observed across species. Although they are regularly used to gain insight into protein function, these models are estimates of an unknowable truth according to the underlying assumptions inherent in each algorithm, its objective function, and the input sequences supplied for reconstruction. In practice, the generated models are highly sensitive to the sequence inputs. In this work, we asked whether we could identify stronger phylogenetic signal by capitalizing on the variance introduced by perturbing the input to evolutionary reconstruction to explore a rich space of possible models that could explain protein evolution. We subsampled from available protein orthologs, “same” proteins across multiple extant species, and produced an ensemble of topologies representing the duplication history that produced related proteins (paralogs) for simulated protein families and in a real protein family – the LacI transcription factor family. We found that two very important phenomena arise from this approach. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, if we take a large ensemble of trees inferred from 50% subsamples and cast the ensemble into a form that represents the distribution of pairwise leaf distances observed across the ensemble, then trees that capture the most frequently observed relationships are also the most accurate. We propose a new methodology, ASPEN, a meta-algorithm that finds and ranks the trees that are most consistent with observations across the ensemble.
Top-ranked ASPEN trees are significantly more accurate than the single-alignment tree produced from all available sequences. Importantly, our findings suggest that the true tree is currently inaccessible for most real protein families. Instead, applications that rely on evolutionary models should integrate across many trees that are equally likely to represent the true evolutionary history of a protein family.
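
The ensemble idea described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the nested-tuple tree encoding, the edge-count leaf distance, the log-frequency score, and every function name below are assumptions chosen for illustration, and all trees are assumed to share the same leaf set. The sketch tabulates how often each pairwise leaf distance is observed across an ensemble, then ranks candidate trees by how consistent they are with those observations.

```python
from collections import Counter
from itertools import combinations
import math


def pairwise_distances(tree):
    """Map each leaf pair to the number of edges on the path joining them.

    A tree is a nested tuple of leaf names, e.g. (("A", "B"), "C").
    This encoding is a hypothetical stand-in for a real tree format.
    """
    dists = {}

    def walk(node):
        # Returns {leaf: number of edges from this node down to the leaf}.
        if isinstance(node, str):
            return {node: 0}
        child_maps = [
            {leaf: d + 1 for leaf, d in walk(child).items()} for child in node
        ]
        # Leaves in different child subtrees meet at this node.
        for left, right in combinations(child_maps, 2):
            for a, da in left.items():
                for b, db in right.items():
                    dists[frozenset((a, b))] = da + db
        merged = {}
        for m in child_maps:
            merged.update(m)
        return merged

    walk(tree)
    return dists


def distance_distribution(ensemble):
    """Count, for every leaf pair, how often each distance occurs."""
    counts = {}
    for tree in ensemble:
        for pair, d in pairwise_distances(tree).items():
            counts.setdefault(pair, Counter())[d] += 1
    return counts


def consistency_score(tree, counts, n_trees, pseudocount=0.5):
    """Sum of log observed frequencies of the tree's pairwise distances.

    The pseudocount (an assumption) avoids log(0) for unseen distances.
    """
    score = 0.0
    for pair, d in pairwise_distances(tree).items():
        seen = counts.get(pair, Counter())[d]
        score += math.log((seen + pseudocount) / (n_trees + pseudocount))
    return score


# Toy ensemble: three subsample trees agree, one disagrees.
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
ensemble = [t1, t1, t1, t2]
counts = distance_distribution(ensemble)

# Rank candidate trees from most to least consistent with the ensemble.
ranked = sorted(
    [t1, t2],
    key=lambda t: consistency_score(t, counts, len(ensemble)),
    reverse=True,
)
```

In this toy case the tree seen in three of the four subsamples ranks first. A real analysis would infer the ensemble from subsampled alignments with a phylogenetics package, which this sketch does not attempt.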

eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Roman Sloutsky ◽  
Kristen M Naegle

Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used to better identify accurate evolutionary models. We subsampled from available ortholog sequences and measured the distribution of observed relationships between paralogs produced across hundreds of models inferred from the subsamples. We observed two important phenomena. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, topologies that are most consistent with the observations made in the ensemble are more accurate and we present a meta algorithm that exploits this property to improve model accuracy.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Weizhao Yang ◽  
Nathalie Feiner ◽  
Catarina Pinho ◽  
Geoffrey M. While ◽  
Antigoni Kaliontzopoulou ◽  
...  

Abstract The Mediterranean basin is a hotspot of biodiversity, fuelled by climatic oscillation and geological change over the past 20 million years. Wall lizards of the genus Podarcis are among the most abundant, diverse, and conspicuous Mediterranean fauna. Here, we unravel the remarkably entangled evolutionary history of wall lizards by sequencing genomes of 34 major lineages covering 26 species. We demonstrate an early (>11 MYA) separation into two clades centred on the Iberian and Balkan Peninsulas, and two clades of Mediterranean island endemics. Diversification within these clades was pronounced between 6.5–4.0 MYA, a period spanning the Messinian Salinity Crisis, during which the Mediterranean Sea nearly dried up before rapidly refilling. However, genetic exchange between lineages has been a pervasive feature throughout the entire history of wall lizards. This has resulted in a highly reticulated pattern of evolution across the group, characterised by mosaic genomes with major contributions from two or more parental taxa. These hybrid lineages gave rise to several of the extant species that are endemic to Mediterranean islands. The mosaic genomes of island endemics may have promoted their extraordinary adaptability and striking diversity in body size, shape and colouration, which have puzzled biologists for centuries.


Science ◽  
2021 ◽  
Vol 373 (6556) ◽  
pp. 792-796 ◽  
Author(s):  
Paul K. Strother ◽  
Clinton Foster

Molecular time trees indicating that embryophytes originated around 500 million years ago (Ma) during the Cambrian are at odds with the record of fossil plants, which first appear in the mid-Silurian almost 80 million years later. This time gap has been attributed to a missing fossil plant record, but that attribution belies the case for fossil spores. Here, we describe a Tremadocian (Early Ordovician, about 480 Ma) assemblage with elements of both Cambrian and younger embryophyte spores that provides a new level of evolutionary continuity between embryophytes and their algal ancestors. This finding suggests that the molecular phylogenetic signal retains a latent evolutionary history of the acquisition of the embryophytic developmental genome, a history that perhaps began during Ediacaran-Cambrian time but was not completed until the mid-Silurian (about 430 Ma).


Author(s):  
Gordon Grigg ◽  
David Kirshner

Biology and Evolution of Crocodylians is a comprehensive review of current knowledge about the world's largest and most famous living reptiles. Gordon Grigg's authoritative and accessible text and David Kirshner's stunning interpretive artwork and colour photographs combine expertly in this contemporary celebration of crocodiles, alligators, caimans and gharials. This book showcases the skills and capabilities that allow crocodylians to live how and where they do. It covers the biology and ecology of the extant species, conservation issues, crocodylian–human interaction and the evolutionary history of the group, and includes a vast amount of new information; 25 per cent of 1100 cited publications have appeared since 2007. Richly illustrated with more than 500 colour photographs and black and white illustrations, this book will be a benchmark reference work for crocodylian biologists, herpetologists and vertebrate biologists for years to come. Winner of the 2015 Whitley Medal.


2020 ◽  
Vol 70 (1) ◽  
pp. 67-85 ◽  
Author(s):  
Michael J Landis ◽  
Deren A R Eaton ◽  
Wendy L Clement ◽  
Brian Park ◽  
Elizabeth L Spriggs ◽  
...  

Abstract Phylogeny, molecular sequences, fossils, biogeography, and biome occupancy are all lines of evidence that reflect the singular evolutionary history of a clade, but they are most often studied separately, by first inferring a fossil-dated molecular phylogeny, then mapping on ancestral ranges and biomes inferred from extant species. Here we jointly model the evolution of biogeographic ranges, biome affinities, and molecular sequences, while incorporating fossils to estimate a dated phylogeny for all of the 163 extant species of the woody plant clade Viburnum (Adoxaceae) that we currently recognize in our ongoing worldwide monographic treatment of the group. Our analyses indicate that while the major Viburnum lineages evolved in the Eocene, the majority of extant species originated since the Miocene. Viburnum radiated first in Asia, in warm, broad-leaved evergreen (lucidophyllous) forests. Within Asia, we infer several early shifts into more tropical forests, and multiple shifts into forests that experience prolonged freezing. From Asia, we infer two early movements into the New World. These two lineages probably first occupied warm temperate forests and adapted later to spreading cold climates. One of these lineages (Porphyrotinus) occupied cloud forests and moved south through the mountains of the Neotropics. Several other movements into North America took place more recently, facilitated by prior adaptations to freezing in the Old World. We also infer four disjunctions between Asia and Europe: the Tinus lineage is the oldest and probably occupied warm forests when it spread, whereas the other three were more recent and in cold-adapted lineages. These results variously contradict published accounts, especially the view that Viburnum radiated initially in cold forests and, accordingly, maintained vessel elements with scalariform perforations. 
We explored how the location and biome assignments of fossils affected our inference of ancestral areas and biome states. Our results are sensitive to, but not entirely dependent upon, the inclusion of fossil biome data. It will be critical to take advantage of all available lines of evidence to decipher events in the distant past. The joint estimation approach developed here provides cautious hope even when fossil evidence is limited. [Biogeography; biome; combined evidence; fossil pollen; phylogeny; Viburnum.]


Phytotaxa ◽  
2016 ◽  
Vol 272 (4) ◽  
pp. 235
Author(s):  
JOSEPH MOHAN ◽  
JEFFERY R. STONE ◽  
CHRISTOPHER J CAMPISANO

Paleolake Hadar was an expansive lake in the lower Awash Valley of Ethiopia’s Afar Depression that existed periodically through the Late Pliocene. The sedimentary deposits from this ancient lake (Hadar Formation) have broad importance because a significant number of hominin fossils have been recovered from the formation. Samples of the Hadar Formation lacustrine sequence were collected from sediment cores extracted as part of the Hominin Sites and Paleolakes Drilling Project (HSPDP). A paleoecological study of the HSPDP Northern Awash (Hadar Formation) material has unearthed three novel species of Bacillariophyta (diatoms) from diatomites that appear periodically in the cores. The Hadar Formation assemblage represents a newly revealed excerpt from the evolutionary history of freshwater diatoms in East Africa during the Piacenzian age (2.59–3.60 Ma). The HSPDP Northern Awash diatom species are compared to previously reported diatoms from Pliocene outcrops, modern and fossil core material from Lake Malawi, and extant species. Here we describe two new species of Aulacoseira and one of Lindavia. Two diatom varieties reported by previous researchers as Melosira are transferred into Aulacoseira herein.


2015 ◽  
Vol 29 (2) ◽  
pp. 191 ◽  
Author(s):  
James K. Liebherr ◽  
Nick Porch

A late Holocene but prehistoric carabid beetle fauna from the lowland Makauwahi Cave, Kauai, is characterised. Seven extinct species – Blackburnia burneyi, B. cryptipes, B. godzilla, B. menehune, B. mothra, B. ovata and B. rugosa, spp. nov. (tribe Platynini) – represent the first Hawaiian insect species to be newly described from subfossil specimens. Four extant Blackburnia spp. – B. aterrima (Sharp), B. bryophila Liebherr, B. pavida (Sharp), and B. posticata (Sharp) – and three extant species of tribe Bembidiini – Bembidion ignicola Blackburn, B. pacificum Sharp and Tachys oahuensis Blackburn – are also represented. All subfossil fragments are disarticulated, with physical dimensions and cladistic analysis used to associate the major somites – head, prothorax and elytra – for description of the new species. The seven new Makauwahi Cave species support recognition of a lowland area of endemism adjoining Haupu, a low-stature 700 m elevation ridgeline in southern Kauai. Four of the extinct Blackburnia are adelphotaxa to extant species currently found at higher elevations in Kauai. Addition of these lowland specialists to the phylogenetic hypothesis undercuts applicability of the taxon cycle for interpreting evolutionary history of these taxa. Two of the extinct species are Kauai representatives in clades that subsequently colonised younger Hawaiian Islands, enhancing support for the progressive biogeographic colonisation of the archipelago by this lineage. And three of the extinct Blackburnia species comprised larger beetles than those of any extant Kauai Blackburnia, consistent with the evolution of island gigantism in the lowland habitats of Kauai.


2018 ◽  
Author(s):  
Gang Li ◽  
Henrique V. Figueiro ◽  
Eduardo Eizirik ◽  
William J. Murphy

Current phylogenomic approaches implicitly assume that the predominant phylogenetic signal within a genome reflects the true evolutionary history of organisms, without assessing the confounding effects of gene flow that result in a mosaic of phylogenetic signals that interact with recombinational variation. Here we tested the validity of this assumption with a recombination-aware analysis of whole genome sequences from 27 species of the cat family. We found that the prevailing phylogenetic signal within the autosomes is not always representative of speciation history, due to ancient hybridization throughout felid evolution. Instead, phylogenetic signal was concentrated within large, conserved X-chromosome recombination deserts that exhibited recurrent patterns of strong genetic differentiation and selective sweeps across mammalian orders. By contrast, regions of high recombination were enriched for signatures of ancient gene flow, and these sequences inflated crown-lineage divergence times by ~40%. We conclude that standard phylogenomic approaches to infer the Tree of Life may be highly misleading without considering the genomic partitioning of phylogenetic signal relative to recombination rate, and its interplay with historical hybridization.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Dario Karmeinski ◽  
Karen Meusemann ◽  
Jessica A. Goodheart ◽  
Michael Schroedl ◽  
Alexander Martynov ◽  
...  

Abstract Background: The soft-bodied cladobranch sea slugs represent roughly half of the biodiversity of marine nudibranch molluscs on the planet. Despite their global distribution from shallow waters to the deep sea, from tropical into polar seas, and their important role in marine ecosystems and for humans (as targets for drug discovery), the evolutionary history of cladobranch sea slugs is not yet fully understood. Results: To enlarge the current knowledge on the phylogenetic relationships, we generated new transcriptome data for 19 species of cladobranch sea slugs and two additional outgroup taxa (Berthella plumula and Polycera quadrilineata). We complemented our taxon sampling with previously published transcriptome data, resulting in a final data set covering 56 species from all but one accepted cladobranch superfamilies. We assembled all transcriptomes using six different assemblers, selecting those assemblies that provided the largest amount of potentially phylogenetically informative sites. Quality-driven compilation of data sets resulted in four different supermatrices: two with full coverage of genes per species (446 and 335 single-copy protein-coding genes, respectively) and two with a less stringent coverage (667 genes with 98.9% partition coverage and 1767 genes with 86% partition coverage, respectively). We used these supermatrices to infer statistically robust maximum-likelihood trees. All analyses, irrespective of the data set, indicate maximal statistical support for all major splits and phylogenetic relationships at the family level. Besides the questionable position of Noumeaella rubrofasciata, rendering the Facelinidae as polyphyletic, the only notable discordance between the inferred trees is the position of Embletonia pulchra. Extensive testing using Four-cluster Likelihood Mapping, Approximately Unbiased tests, and Quartet Scores revealed that its position is not due to any informative phylogenetic signal, but caused by confounding signal.
Conclusions: Our data matrices and the inferred trees can serve as a solid foundation for future work on the taxonomy and evolutionary history of Cladobranchia. The placement of E. pulchra, however, proves challenging, even with large data sets and various optimization strategies. Moreover, quartet mapping results show that confounding signal present in the data is sufficient to explain the inferred position of E. pulchra, again leaving its phylogenetic position as an enigma.

