Compound Dynamics and Combinatorial Patterns of Amino Acid Repeats Encode a System of Evolutionary and Developmental Markers

Ilaria Pelassa; Marica Cibelli; Veronica Villeri; Elena Lilliu; Serena Vaglietti; Federica Olocco; Mirella Ghirardi; Pier Giorgio Montarolo; Davide Corà; Ferdinando Fiumara

doi:10.1093/gbe/evz216

Genome Biology and Evolution ◽

10.1093/gbe/evz216 ◽

2019 ◽

Vol 11 (11) ◽

pp. 3159-3178

Author(s):

Ilaria Pelassa ◽

Marica Cibelli ◽

Veronica Villeri ◽

Elena Lilliu ◽

Serena Vaglietti ◽

...

Keyword(s):

Amino Acid ◽

Evolutionary History ◽

Evolutionary Dynamics ◽

Phylogenetic Signal ◽

Regulatory System ◽

Evolutionary Transitions ◽

Quantitative Evidence ◽

Amino Acid Repeats ◽

Eukaryotic Proteomes ◽

History Of

Abstract Homopolymeric amino acid repeats (AARs) like polyalanine (polyA) and polyglutamine (polyQ) in some developmental proteins (DPs) regulate certain aspects of organismal morphology and behavior, suggesting an evolutionary role for AARs as developmental “tuning knobs.” It is still unclear, however, whether these are occasional protein-specific phenomena or hints at the existence of a whole AAR-based regulatory system in DPs. Using novel approaches to trace their functional and evolutionary history, we find quantitative evidence supporting a generalized, combinatorial role of AARs in developmental processes with evolutionary implications. We observe nonrandom AAR distributions and combinations in HOX and other DPs, as well as in their interactomes, defining elements of a proteome-wide combinatorial functional code whereby different AARs and their combinations appear preferentially in proteins involved in the development of specific organs/systems. Such functional associations can be either static or display detectable evolutionary dynamics. These findings suggest that progressive changes in AAR occurrence/combination, by altering embryonic development, may have contributed to taxonomic divergence, leaving detectable traces in the evolutionary history of proteomes. Consistent with this hypothesis, we find that the evolutionary trajectories of the 20 AARs in eukaryotic proteomes are highly interrelated and their individual or compound dynamics can sharply mark taxonomic boundaries, or display clock-like trends, carrying overall a strong phylogenetic signal. These findings provide quantitative evidence and an interpretive framework outlining a combinatorial system of AARs whose compound dynamics mark at the same time DP functions and evolutionary transitions.


A fossil record of land plant origins from charophyte algae

Science ◽

10.1126/science.abj2927 ◽

2021 ◽

Vol 373 (6556) ◽

pp. 792-796 ◽

Cited By ~ 1

Author(s):

Paul K. Strother ◽

Clinton Foster

Keyword(s):

Fossil Record ◽

Evolutionary History ◽

Phylogenetic Signal ◽

Land Plant ◽

Fossil Plants ◽

Fossil Plant ◽

Molecular Phylogenetic ◽

History Of ◽

Time Gap ◽

Evolutionary Continuity

Molecular time trees indicating that embryophytes originated around 500 million years ago (Ma) during the Cambrian are at odds with the record of fossil plants, which first appear in the mid-Silurian almost 80 million years later. This time gap has been attributed to a missing fossil plant record, but that attribution belies the case for fossil spores. Here, we describe a Tremadocian (Early Ordovician, about 480 Ma) assemblage with elements of both Cambrian and younger embryophyte spores that provides a new level of evolutionary continuity between embryophytes and their algal ancestors. This finding suggests that the molecular phylogenetic signal retains a latent evolutionary history of the acquisition of the embryophytic developmental genome, a history that perhaps began during Ediacaran-Cambrian time but was not completed until the mid-Silurian (about 430 Ma).


Recombination-aware phylogenomics unravels the complex divergence of hybridizing species.

10.1101/485904 ◽

2018 ◽

Cited By ~ 1

Author(s):

Gang Li ◽

Henrique V. Figueiro ◽

Eduardo Eizirik ◽

William J. Murphy

Keyword(s):

Gene Flow ◽

Evolutionary History ◽

Phylogenetic Signal ◽

Selective Sweeps ◽

A Genome ◽

Chromosome Recombination ◽

History Of ◽

Lineage Divergence ◽

Recurrent Patterns ◽

Ancient Gene

Current phylogenomic approaches implicitly assume that the predominant phylogenetic signal within a genome reflects the true evolutionary history of organisms, without assessing the confounding effects of gene flow that result in a mosaic of phylogenetic signals that interact with recombinational variation. Here we tested the validity of this assumption with a recombination-aware analysis of whole genome sequences from 27 species of the cat family. We found that the prevailing phylogenetic signal within the autosomes is not always representative of speciation history, due to ancient hybridization throughout felid evolution. Instead, phylogenetic signal was concentrated within large, conserved X-chromosome recombination deserts that exhibited recurrent patterns of strong genetic differentiation and selective sweeps across mammalian orders. By contrast, regions of high recombination were enriched for signatures of ancient gene flow, and these sequences inflated crown-lineage divergence times by ~40%. We conclude that standard phylogenomic approaches to infer the Tree of Life may be highly misleading without considering the genomic partitioning of phylogenetic signal relative to recombination rate, and its interplay with historical hybridization.


Genetic diversity and evolutionary history of Korean isolates of severe fever with thrombocytopenia syndrome virus from 2013–2016

Archives of Virology ◽

10.1007/s00705-020-04733-0 ◽

2020 ◽

Vol 165 (11) ◽

pp. 2599-2603

Author(s):

Mi-ran Yun ◽

Jungsang Ryou ◽

Wooyoung Choi ◽

Joo-Yeon Lee ◽

Sun-Whan Park ◽

...

Keyword(s):

Infectious Disease ◽

Genetic Diversity ◽

Amino Acid ◽

Evolutionary History ◽

Public Database ◽

Korean Strain ◽

Future Studies ◽

History Of ◽

Phylogeny And Evolution ◽

Severe Fever

Abstract Severe fever with thrombocytopenia syndrome (SFTS) is caused by SFTS virus (SFTSV). Although SFTS originated in China, it is an emerging infectious disease with prevalence confirmed in Japan, Korea, and Vietnam. The full-length genomes of 51 Korean SFTSV isolates from 2013 to 2016 were sequenced, and the sequences were deposited into a public database (GenBank) and analyzed to elucidate the phylogeny and evolution of the virus. Although most of the Korean SFTSV isolates were closely related to previously reported Japanese isolates, some were closely related to previously reported Chinese isolates. We identified one Korean strain that appears to have resulted from multiple inter-lineage reassortments. Several nucleotide and amino acid variations specific to the Korean isolates were identified. Future studies should focus on how these variations affect virus pathogenicity and evolution.


Advances in Quaternary Studies: The Contribution of the Mammalian Fossil Record

Quaternary ◽

10.3390/quat1030026 ◽

2018 ◽

Vol 1 (3) ◽

pp. 26

Author(s):

Maria Palombo

Keyword(s):

Fossil Record ◽

Evolutionary History ◽

Evolutionary Dynamics ◽

Dynamic Interactions ◽

Causal Factors ◽

History Of ◽

Mammalian Fossil

Explaining the multifaceted, dynamic interactions of the manifold factors that have modelled throughout the ages the evolutionary history of the biosphere is undoubtedly a fascinating and challenging task that has been intriguing palaeontologists, biologists and ecologists for decades, in a never-ending pursuit of the causal factors that controlled the evolutionary dynamics of the Earth’s ecosystems throughout deep and Quaternary time. [...]


Transcriptomics provides a robust framework for the relationships of the major clades of cladobranch sea slugs (Mollusca, Gastropoda, Heterobranchia), but fails to resolve the position of the enigmatic genus Embletonia

BMC Ecology and Evolution ◽

10.1186/s12862-021-01944-0 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Dario Karmeinski ◽

Karen Meusemann ◽

Jessica A. Goodheart ◽

Michael Schroedl ◽

Alexander Martynov ◽

...

Keyword(s):

Phylogenetic Relationships ◽

Evolutionary History ◽

Phylogenetic Signal ◽

Set Covering ◽

Phylogenetic Position ◽

Data Sets ◽

Transcriptome Data ◽

Data Set ◽

Sea Slugs ◽

History Of

Abstract Background: The soft-bodied cladobranch sea slugs represent roughly half of the biodiversity of marine nudibranch molluscs on the planet. Despite their global distribution from shallow waters to the deep sea, from tropical into polar seas, and their important role in marine ecosystems and for humans (as targets for drug discovery), the evolutionary history of cladobranch sea slugs is not yet fully understood. Results: To enlarge the current knowledge on the phylogenetic relationships, we generated new transcriptome data for 19 species of cladobranch sea slugs and two additional outgroup taxa (Berthella plumula and Polycera quadrilineata). We complemented our taxon sampling with previously published transcriptome data, resulting in a final data set covering 56 species from all but one accepted cladobranch superfamilies. We assembled all transcriptomes using six different assemblers, selecting those assemblies that provided the largest amount of potentially phylogenetically informative sites. Quality-driven compilation of data sets resulted in four different supermatrices: two with full coverage of genes per species (446 and 335 single-copy protein-coding genes, respectively) and two with a less stringent coverage (667 genes with 98.9% partition coverage and 1767 genes with 86% partition coverage, respectively). We used these supermatrices to infer statistically robust maximum-likelihood trees. All analyses, irrespective of the data set, indicate maximal statistical support for all major splits and phylogenetic relationships at the family level. Besides the questionable position of Noumeaella rubrofasciata, rendering the Facelinidae as polyphyletic, the only notable discordance between the inferred trees is the position of Embletonia pulchra. Extensive testing using Four-cluster Likelihood Mapping, Approximately Unbiased tests, and Quartet Scores revealed that its position is not due to any informative phylogenetic signal, but caused by confounding signal. Conclusions: Our data matrices and the inferred trees can serve as a solid foundation for future work on the taxonomy and evolutionary history of Cladobranchia. The placement of E. pulchra, however, proves challenging, even with large data sets and various optimization strategies. Moreover, quartet mapping results show that confounding signal present in the data is sufficient to explain the inferred position of E. pulchra, again leaving its phylogenetic position as an enigma.


Carboxylesterases (EC 3.1.1). Amino Acid Composition of Liver Carboxylesterases

Canadian Journal of Biochemistry ◽

10.1139/o75-076 ◽

1975 ◽

Vol 53 (5) ◽

pp. 561-564 ◽

Cited By ~ 15

Author(s):

Keith Scott ◽

Burt Zerner

Keyword(s):

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Homologous Series ◽

Evolutionary History ◽

Amino Acid Compositions ◽

General Similarity ◽

History Of

The amino acid compositions of the carboxylesterases from chicken, horse, ox, sheep, and pig livers are reported and compared. As would be expected for this homologous series, the compositions show a general similarity. However, there are some significant differences, but the degree to which particular pairs of enzymes differ is consistent with the evolutionary history of the species from which they were isolated.


RecPD: A Recombination-Aware Measure of Phylogenetic Diversity

10.1101/2021.10.01.462747 ◽

2021 ◽

Author(s):

Cedoljub Bundalovic-Torma ◽

Darrell Desveaux ◽

David S Guttman

Keyword(s):

Pseudomonas Syringae ◽

Evolutionary History ◽

Phylogenetic Diversity ◽

Evolutionary Dynamics ◽

Gene Families ◽

Ancestral State ◽

History Of ◽

Preliminary Study ◽

Potential Impact ◽

Study Type

A critical step in studying biological features (e.g., genetic variants, gene families, metabolic capabilities, or taxa) underlying traits or outcomes of interest is assessing their diversity and distribution. Accurate assessments of these patterns are essential for linking features to traits or outcomes and understanding their functional impact. Consequently, it is of crucial importance that the metrics employed for quantifying feature diversity can perform robustly under any evolutionary scenario. However, the standard metrics used for quantifying and comparing the distribution of features, such as prevalence, phylogenetic diversity, and related approaches, either do not take into consideration evolutionary history, or assume strictly vertical patterns of inheritance. Consequently, these approaches cannot accurately assess diversity for features that have undergone recombination or horizontal transfer. To address this issue, we have devised RecPD, a novel recombination-aware phylogenetic-diversity metric for measuring the distribution and diversity of features under all evolutionary scenarios. RecPD utilizes ancestral-state reconstruction to map the presence/absence of features onto ancestral nodes in a species tree, and then identifies potential recombination events in the evolutionary history of the feature. We also derive a number of related metrics from RecPD that can be used to assess and quantify evolutionary dynamics and correlation of feature evolutionary histories. We used simulation studies to show that RecPD reliably identifies evolutionary histories under diverse recombination and loss scenarios. We then apply RecPD in a real-world scenario in a preliminary study of type III effector protein families secreted by the plant pathogenic bacterium Pseudomonas syringae and demonstrate that prevalence is an inadequate metric that obscures the potential impact of recombination. We believe RecPD will have broad utility for revealing and quantifying complex evolutionary processes for features at any biological level.


The evolutionary history of a gammaretrovirus currently colonizing the mule deer genome is marked by extensive recombination

10.1101/2021.02.24.432774 ◽

2021 ◽

Author(s):

Lei Yang ◽

Raunaq Malhotra ◽

Rayan Chikhi ◽

Daniel Elleder ◽

Theodora Kaiser ◽

...

Keyword(s):

Evolutionary History ◽

Evolutionary Dynamics ◽

De Novo ◽

Mule Deer ◽

Endogenous Retrovirus ◽

Host Population ◽

Endogenous Retroviruses ◽

Genomic Diversity ◽

Evolutionary Trajectory ◽

History Of

Abstract Background: All vertebrate genomes have been colonized by retroviruses along their evolutionary trajectory. Although it is clear that endogenous retroviruses (ERVs) can contribute important physiological functions to contemporary hosts, such benefits are attributed to long-term co-evolution of ERV and host. Newly colonized ERVs are thought unlikely to contribute to host genome evolution because germline infections are rare and because the host effectively silences them. The genomes of several outbred species including mule deer (Odocoileus hemionus) are currently being colonized by ERVs, which provides an opportunity to study ERV dynamics at a time when few are fixed. Here we investigate the history of cervid endogenous retrovirus (CrERV) acquisition and expansion in the mule deer genome to determine the potential impact of endogenizing retroviruses on host genomic diversity. Methods: A mule deer genome was de novo assembled from short and long insert mate pair reads. Scaffolds were further assembled using reference assisted chromosome assembly (RACA) to provide spatial orientation of CrERV insertion sites and to facilitate assembly of CrERV sequences. We applied phylogenetic and coalescent approaches to non-recombinant genomes to determine CrERV evolutionary history, augmenting ancestral divergence estimates with the prevalence of each CrERV locus in a population of mule deer. Recombination history was investigated on partial genome alignments. Results: The CrERV composition and diversity in the mule deer genome have recently measurably increased by horizontal acquisition of a new retrovirus lineage and because of recombination with existing CrERV. Resulting interlineage recombinants also endogenized and subsequently retrotransposed. CrERV loci are significantly closer to genes than expected if integration were random and gene proximity might explain the recent expansion by retrotransposition of one recombinant CrERV lineage. Conclusions: There has been a burst of CrERV integrations during a recent retrovirus epizootic that increased genomic CrERV burden and has resulted in extensive insertional polymorphism in contemporary mule deer genomes. Recombination is a defining feature of CrERV evolutionary dynamics driven by this colonization, increasing CrERV burden and CrERV genetic diversity. These data support that retroviral colonization during an epizootic provides a burst of genomic diversity to the host population.


Phylogenetic non-independence in rates of trait evolution

Biology Letters ◽

10.1098/rsbl.2018.0502 ◽

2018 ◽

Vol 14 (10) ◽

pp. 20180502 ◽

Cited By ~ 2

Author(s):

Manabu Sakamoto ◽

Chris Venditti

Keyword(s):

Statistical Power ◽

Evolutionary History ◽

Phylogenetic Signal ◽

Small Sample ◽

Evolutionary Rates ◽

Biological Traits ◽

Beak Shape ◽

Rates Of Evolution ◽

Shared Ancestry ◽

History Of

Statistical non-independence of species’ biological traits is recognized in most traits under selection. Yet, whether or not the evolutionary rates of such biological traits are statistically non-independent remains to be tested. Here, we test the hypothesis that phenotypic evolutionary rates are non-independent, i.e. contain phylogenetic signal, using empirical rates of evolution in three separate traits: body mass in mammals, beak shape in birds and bite force in amniotes. Specifically, we test if evolutionary rates are phylogenetically interdependent. We find evidence for phylogenetic signal in evolutionary rates in all three case studies. While phylogenetic signal diminishes deeper in time, this is reflective of statistical power owing to small sample and effect sizes. When effect size is large, e.g. owing to the presence of fossil tips, we detect high phylogenetic signals even in deeper time slices. Thus, we recommend that rates be treated as being non-independent throughout the evolutionary history of the group of organisms under study, and any summaries or analyses of rates through time—including associations of rates with traits—need to account for the undesired effects of shared ancestry.


ASPEN: A methodology for reconstructing protein evolution with improved accuracy using ensemble models

10.1101/170787 ◽

2017 ◽

Author(s):

Roman Sloutsky ◽

Kristen M. Naegle

Keyword(s):

Protein Evolution ◽

Protein Function ◽

Evolutionary History ◽

Phylogenetic Signal ◽

Protein Family ◽

Reconstruction Algorithms ◽

Protein Families ◽

Homologous Proteins ◽

Extant Species ◽

History Of

Abstract Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins: the order of duplications and speciations that led to extant homologous proteins observed across species. Although they are regularly used to gain insight into protein function, these models are estimates of an unknowable truth according to the underlying assumptions inherent in each algorithm, its objective function, and the input sequences supplied for reconstruction. In practice, the generated models are highly sensitive to the sequence inputs. In this work, we asked whether we could identify stronger phylogenetic signal by capitalizing on the variance introduced by perturbing the input to evolutionary reconstruction to explore a rich space of possible models that could explain protein evolution. We subsampled from available protein orthologs, “same” proteins across multiple extant species, and produced an ensemble of topologies representing the duplication history which produced related proteins (paralogs) for simulated protein families and in a real protein family – the LacI transcription factor family. We found that two very important phenomena arise from this approach. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, if we take a large ensemble of trees inferred from 50% subsamples and cast the ensemble into a form that represents the distribution of pairwise leaf distances observed across the ensemble, then trees that capture the most frequently observed relationships are also the most accurate. We propose a new methodology, ASPEN, a meta-algorithm that finds and ranks the trees that are most consistent with observations across the ensemble. Top-ranked ASPEN trees are significantly more accurate than the single-alignment tree produced from all available sequences. Importantly, our findings suggest that the true tree is currently inaccessible for most real protein families. Instead, applications that rely on evolutionary models should integrate across many trees that are equally likely to represent the true evolutionary history of a protein family.
