Developmental constraints on genome evolution in four bilaterian model species

Mapping Intimacies ◽

10.1101/161679 ◽

2017 ◽

Author(s):

Jialin Liu ◽

Marc Robinson-Rechavi

Keyword(s):

Genome Evolution ◽

Purifying Selection ◽

Regulatory Elements ◽

Sequence Evolution ◽

Late Development ◽

Developmental Constraints ◽

Protein Coding ◽

New Genes ◽

Hourglass Model ◽

Conservation Model

Abstract Developmental constraints on genome evolution have been suggested to follow either an early conservation model or an “hourglass” model. Both models agree that late development strongly diverges between species, but disagree on which developmental period is the most conserved. Here, based on a modified “Transcriptome Age Index” approach, i.e. weighting trait measures by expression level, we analyzed the constraints acting on three evolutionary traits of protein coding genes (strength of purifying selection on protein sequences, phyletic age, and duplicability) in four species: nematode worm Caenorhabditis elegans, fly Drosophila melanogaster, zebrafish Danio rerio, and mouse Mus musculus. In general, we found that both models can be supported by different genomic properties. Sequence evolution follows an hourglass model, but the evolution of phyletic age and of duplicability follows an early conservation model. Further analyses indicate that stronger purifying selection on sequences in middle development is driven by temporal pleiotropy of these genes. In addition, we report evidence that expression in late development is enriched with retrogenes, which usually lack efficient regulatory elements. This implies that expression in late development could facilitate transcription of new genes, and provide opportunities for acquisition of function. Finally, in C. elegans, we suggest that dosage imbalance could be one of the main factors that cause depleted expression of high duplicability genes in early development.
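
As a minimal illustration of the idea described in this abstract (weighting a per-gene trait by expression level at each developmental stage), the Python sketch below computes an expression-weighted mean per stage. The function name, the toy trait values, and the stage labels are hypothetical, and the study's modified index may apply transformations that are not shown here.

```python
from typing import Dict, List


def expression_weighted_index(trait: List[float], expression: List[float]) -> float:
    """Mean of a per-gene trait, weighted by each gene's expression at one stage."""
    if len(trait) != len(expression):
        raise ValueError("trait and expression must have one value per gene")
    total = sum(expression)
    if total <= 0:
        raise ValueError("total expression must be positive")
    return sum(t * e for t, e in zip(trait, expression)) / total


# Hypothetical toy data: three genes; the trait is a phyletic age rank
# (a higher rank stands for a younger gene in this example).
trait_values = [1.0, 2.0, 3.0]
stages: Dict[str, List[float]] = {
    "early": [2.0, 1.0, 1.0],  # expression of the three genes at an early stage
    "late": [1.0, 1.0, 2.0],   # expression of the same genes at a late stage
}

index_by_stage = {
    name: expression_weighted_index(trait_values, expr)
    for name, expr in stages.items()
}
```

In this toy data the late stage gets the higher index because the gene with the highest trait value is the most expressed there; comparing the index across stages is what allows an hourglass or early conservation pattern to be read off.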


Eusociality Shapes Convergent Patterns of Molecular Evolution across Mitochondrial Genomes of Snapping Shrimps

Molecular Biology and Evolution ◽

10.1093/molbev/msaa297 ◽

2020 ◽

Author(s):

Solomon T C Chak ◽

Juan Antonio Baeza ◽

Phillip Barden

Keyword(s):

Molecular Evolution ◽

Genome Evolution ◽

Purifying Selection ◽

Synonymous Substitution ◽

Mitochondrial Genomes ◽

Comparative Genomic ◽

Effective Population ◽

Protein Coding ◽

Marine Realm ◽

Snapping Shrimps

Abstract Eusociality is a highly conspicuous and ecologically impactful behavioral syndrome that has evolved independently across multiple animal lineages. So far, comparative genomic analyses of advanced sociality have been mostly limited to insects. Here, we study the only clade of animals known to exhibit eusociality in the marine realm—lineages of socially diverse snapping shrimps in the genus Synalpheus. To investigate the molecular impact of sociality, we assembled the mitochondrial genomes of eight Synalpheus species that represent three independent origins of eusociality and analyzed patterns of molecular evolution in protein-coding genes. Synonymous substitution rates are lower and potential signals of relaxed purifying selection are higher in eusocial relative to noneusocial taxa. Our results suggest that mitochondrial genome evolution was shaped by eusociality-linked traits—extended generation times and reduced effective population sizes that are hallmarks of advanced animal societies. This is the first direct evidence of eusociality impacting genome evolution in marine taxa. Our results also strongly support the idea that eusociality can shape genome evolution through profound changes in life history and demography.


Integrative analysis reveals RNA G-Quadruplexes in UTRs are selectively constrained and enriched for functional associations

10.1101/666842 ◽

2019 ◽

Cited By ~ 2

Author(s):

David S.M. Lee ◽

Louis R. Ghanem ◽

Yoseph Barash

Keyword(s):

Regulatory Networks ◽

Rna Binding ◽

Purifying Selection ◽

Regulatory Elements ◽

Messenger Rnas ◽

Multiple Sources ◽

Protein Coding ◽

G Quadruplex ◽

Preferential Binding ◽

Missense Variation

Abstract Identifying regulatory elements in the noncoding genome is a fundamental challenge in biology. G-quadruplex (G4) sequences are abundant in untranslated regions (UTRs) of human messenger RNAs, but their functional importance remains unclear. By integrating multiple sources of genetic and genomic data, we show that putative G-quadruplex forming sequences (pG4) in 5’ and 3’ UTRs are selectively constrained, and enriched for cis-eQTLs and RNA-binding protein (RBP) interactions. Using over 15,000 whole-genome sequences, we uncover a degree of negative (purifying) selection in UTR pG4s comparable to that of missense variation in protein-coding sequences. In parallel, we identify new proteins with evidence for preferential binding at pG4s from ENCODE annotations, and delineate putative regulatory networks composed of shared binding targets. Finally, by mapping variants in the NIH GWAS Catalogue and ClinVar, we find enrichment for disease-associated variation in 3’UTR pG4s. At a GWAS pG4-variant associated with hypertension in HSPB7, we uncover robust allelic imbalance in GTEx RNA-seq across multiple tissues, suggesting that changes in gene expression associated with pG4 disruption underlie the observed phenotypic association. Taken together, our results establish UTR G-quadruplexes as important cis-regulatory features, and point to a putative link between disruption within UTR pG4 and susceptibility to human disease.
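
To make "putative G-quadruplex forming sequence" concrete, the sketch below scans a sequence with the widely used canonical motif of four runs of at least three guanines separated by loops of one to seven nucleotides. This pattern is an assumption chosen for illustration; the abstract does not state which pG4 definition the study used, and the function name and test sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Canonical G4 motif: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}
# N is any nucleotide; U is allowed so that RNA sequences can be scanned too.
PG4_PATTERN = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}",
    re.IGNORECASE,
)


def find_pg4(sequence: str) -> List[Tuple[int, int]]:
    """Return 0-based half-open (start, end) spans of non-overlapping motif matches."""
    return [(m.start(), m.end()) for m in PG4_PATTERN.finditer(sequence)]


# Hypothetical UTR fragment containing four GGG runs with single-base loops.
spans = find_pg4("AAGGGAGGGUGGGCGGGAA")
```

A regular expression reports non-overlapping matches only, so dense G-rich regions may contain more candidate structures than this simple scan returns.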


Clustering and visualizing the distribution of overlapping reading frames in virus genomes

10.1101/2021.06.10.447953 ◽

2021 ◽

Author(s):

Laura Munoz-Baena ◽

Art Poon

Keyword(s):

Purifying Selection ◽

Protein Coding ◽

Virus Family ◽

Reading Frame ◽

New Genes ◽

Overlapping Reading Frames ◽

Genome Level ◽

Dsdna Viruses ◽

Virus Genomes ◽

Reading Frames

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated reading frames in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. However, the longest overlaps involve no shift in reading frame (+0), increasing the selective burden of the same nucleotide positions within codons, instead of exposing additional sites to purifying selection. Next, we develop a new graph-based representation of the distribution of OvRFs among the reading frames of genomes in a given virus family. In the absence of an unambiguous partition of reading frames by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent reading frames are adjacent in one or more genomes, and (2) that the reading frames overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
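
The length and frameshift of an overlap between two annotated reading frames can be sketched as follows. This is a simplified illustration that assumes both frames lie on the same strand and are given as 1-based inclusive coordinates; the sign convention for the shift, the handling of opposite-strand overlaps, and all names are assumptions rather than the procedure used in the study.

```python
from typing import Optional, Tuple

Frame = Tuple[int, int]  # (start, end), 1-based inclusive, same strand assumed


def overlap_length_and_shift(a: Frame, b: Frame) -> Optional[Tuple[int, int]]:
    """Return (overlap length in nt, frame shift of b relative to a), or None."""
    a_start, a_end = a
    b_start, b_end = b
    length = min(a_end, b_end) - max(a_start, b_start) + 1
    if length <= 0:
        return None  # the two reading frames do not share any nucleotide
    # Difference in start position modulo 3: 0 means both frames read the
    # shared nucleotides in the same codon phase.
    shift = (b_start - a_start) % 3
    return length, shift


# Hypothetical coordinates for two pairs of reading frames.
partial = overlap_length_and_shift((1, 300), (290, 500))
nested = overlap_length_and_shift((10, 99), (40, 69))
disjoint = overlap_length_and_shift((1, 300), (301, 600))
```

With these toy coordinates the first pair shares 11 nucleotides with a shift of 1, the nested pair shares 30 nucleotides in the same phase (shift 0), and the third pair does not overlap.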


Theory of prokaryotic genome evolution

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1614083113 ◽

2016 ◽

Vol 113 (41) ◽

pp. 11399-11407 ◽

Cited By ~ 62

Author(s):

Itamar Sela ◽

Yuri I. Wolf ◽

Eugene V. Koonin

Keyword(s):

Genome Size ◽

Prokaryotic Genome ◽

Genetic Material ◽

Purifying Selection ◽

Synonymous Substitution ◽

Protein Coding ◽

Protein Coding Genes ◽

New Genes ◽

Extra Energy ◽

Prokaryotic Genomes

Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.


The Cowpea Kinome: Genomic and Transcriptomic Analysis Under Biotic and Abiotic Stresses

Frontiers in Plant Science ◽

10.3389/fpls.2021.667013 ◽

2021 ◽

Vol 12 ◽

Author(s):

José Ribamar Costa Ferreira-Neto ◽

Artemisa Nazaré da Costa Borges ◽

Manassés Daniel da Silva ◽

David Anderson de Lima Morais ◽

João Pacífico Bezerra-Neto ◽

...

Keyword(s):

Defense Mechanisms ◽

Structural Characteristics ◽

Purifying Selection ◽

Regulatory Elements ◽

Structural Features ◽

Wide Distribution ◽

Transcriptional Level ◽

Sequence Evolution ◽

Promoter Regions ◽

Almost All

The present work represents a pioneering effort, being the first to analyze genomic and transcriptomic data from Vigna unguiculata (cowpea) kinases. We evaluated the cowpea kinome considering its genome-wide distribution and structural characteristics (at the gene and protein levels), sequence evolution, conservation among Viridiplantae species, and gene expression in three cowpea genotypes under different stress situations, including biotic (injury followed by virus inoculation with CABMV or CPSMV) and abiotic (root dehydration). The structural features of cowpea kinases (VuPKs) indicated that 1,293 bona fide VuPKs covered 20 groups and 118 different families. The RLK-Pelle was the largest group, with 908 members. Insights on the mechanisms of VuPK genomic expansion and conservation among Viridiplantae species indicated dispersed and tandem duplications as major forces for VuPKs’ distribution pattern and high orthology indexes and synteny with other legume species, respectively. Ka/Ks ratios showed that almost all (91%) of the tandem duplication events were under purifying selection. Candidate cis-regulatory elements were associated with different transcription factors (TFs) in the promoter regions of the RLK-Pelle group. C2H2 TFs were closely associated with the promoter regions of almost all scrutinized families for the mentioned group. At the transcriptional level, it was suggested that VuPK up-regulation was stress, genotype, or tissue dependent (or a combination of them). The most prominent families in responding (up-regulation) to all the analyzed stresses were RLK-Pelle_DLSV and CAMK_CAMKL-CHK1. Concerning root dehydration, it was suggested that the up-regulated VuPKs are associated with ABA hormone signaling, auxin hormone transport, and potassium ion metabolism. Additionally, up-regulated VuPKs under root dehydration potentially assist in a critical physiological strategy of the studied cowpea genotype in this assay, with activation of defense mechanisms against biotic stress while responding to root dehydration. This study provides the foundation for further studies on the evolution and molecular function of VuPKs.


Paleozoic Protein Fossils Illuminate the Evolution of Vertebrate Genomes and Transposable Elements

10.1101/2021.11.26.470093 ◽

2021 ◽

Author(s):

Martin C Frith

Keyword(s):

Transposable Elements ◽

Common Ancestor ◽

Regulatory Elements ◽

Regulatory Function ◽

Last Common Ancestor ◽

Protein Coding ◽

New Genes ◽

Gene Regulatory ◽

Host Genes ◽

The Way

Genomes hold a treasure trove of protein fossils: fragments of formerly protein-coding DNA, which mainly come from transposable elements (TEs) or host genes. These fossils reveal ancient evolution of TEs and genomes, and many fossils have been exapted to perform diverse functions important for the host's fitness. However, old and highly-degraded fossils are hard to identify, and standard methods (e.g. BLAST) are not optimized for this task. Here, a recently optimized method is used to find protein fossils in vertebrate genomes. It finds Paleozoic fossils predating the amphibian/amniote divergence from most major TE categories, including virus-related Polinton and Gypsy elements. It finds 10 fossils in the human genome (8 from TEs and 2 from host genes) that predate the last common ancestor of all jawed vertebrates, probably from the Ordovician period. It also finds types of transposon and retrotransposon not found in human before. These fossils have extreme sequence conservation, indicating exaptation: some have evidence of gene-regulatory function, and they tend to lie nearest to developmental genes. Some ancient fossils suggest "genome tectonics", where two fragments of one TE have drifted apart by up to megabases, possibly explaining gene deserts and large introns. This paints a picture of great TE diversity in our aquatic ancestors, with patchy TE inheritance by later vertebrates, producing new genes and regulatory elements on the way. Host-gene fossils too have contributed anciently-conserved DNA segments. This paves the way to further studies of ancient protein fossils.


A codon model for associating phenotypic traits with altered selective patterns of sequence evolution

Systematic Biology ◽

10.1093/sysbio/syaa087 ◽

2020 ◽

Author(s):

Keren Halabi ◽

Eli Levy Karin ◽

Laurent Guéguen ◽

Itay Mayrose

Keyword(s):

Complex Traits ◽

Purifying Selection ◽

Phenotypic Traits ◽

Sequence Evolution ◽

Codon Model ◽

Protein Coding ◽

Coding Sequences ◽

Branch Site ◽

Signature Of Selection ◽

Bacterial Genes

Abstract Detecting the signature of selection in coding sequences and associating it with shifts in phenotypic states can unveil genes underlying complex traits. Of the various signatures of selection exhibited at the molecular level, changes in the pattern of selection at protein coding genes have been of main interest. To this end, phylogenetic branch-site codon models are routinely applied to detect changes in selective patterns along specific branches of the phylogeny. Many of these methods rely on a pre-specified partition of the phylogeny to branch categories, thus treating the course of trait evolution as fully resolved and assuming that phenotypic transitions have occurred only at speciation events. Here we present TraitRELAX, a new phylogenetic model that alleviates these strong assumptions by explicitly accounting for the uncertainty in the evolution of both trait and coding sequences. This joint statistical framework enables the detection of changes in selection intensity upon repeated trait transitions. We evaluated the performance of TraitRELAX using simulations and then applied it to two case studies. Using TraitRELAX, we found an intensification of selection in the primate SEMG2 gene in polygynandrous species compared to species of other mating forms, as well as changes in the intensity of purifying selection operating on sixteen bacterial genes upon transitioning from a free-living to an endosymbiotic lifestyle.


Adaptive evolution of animal proteins over development: support for the Darwin selection opportunity hypothesis of Evo-Devo

10.1101/161711 ◽

2017 ◽

Cited By ~ 1

Author(s):

Jialin Liu ◽

Marc Robinson-Rechavi

Keyword(s):

Positive Selection ◽

Adaptive Evolution ◽

Morphological Diversity ◽

Evolutionary Divergence ◽

Late Development ◽

Developmental Constraints ◽

Protein Coding ◽

Development Support ◽

Evo Devo ◽

Cumulative Evidence

Abstract A driving hypothesis of Evo-Devo is that animal morphological diversity is shaped both by adaptation and by developmental constraints. Here we have tested Darwin’s “selection opportunity” hypothesis, according to which high evolutionary divergence in late development is due to strong positive selection. We contrasted it to a “developmental constraint” hypothesis, according to which late development is under relaxed negative selection. Indeed, the highest divergence between species, both at the morphological and molecular levels, is observed late in embryogenesis and post-embryonically. To distinguish between adaptation and relaxation hypotheses, we investigated the evidence of positive selection on protein-coding genes in relation to their expression over development, in fly Drosophila melanogaster, zebrafish Danio rerio, and mouse Mus musculus. First, we found that genes specifically expressed in late development have stronger signals of positive selection. Second, over the full transcriptome, genes with evidence for positive selection tend to be expressed in late development. Finally, genes involved in pathways with cumulative evidence of positive selection have higher expression in late development. Overall, there is a consistent signal that positive selection mainly affects genes and pathways expressed in late embryonic development and in adults. Our results imply that the evolution of embryogenesis is mostly conservative, with most adaptive evolution affecting some stages of post-embryonic gene expression, and thus post-embryonic phenotypes. This is consistent with the diversity of environmental challenges to which juveniles and adults are exposed.


Faculty Opinions recommendation of Widespread purifying selection at polymorphic sites in human protein-coding loci.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1016701.201233 ◽

2003 ◽

Cited By ~ 1

Author(s):

Thomas Mitchell-Olds

Keyword(s):

Purifying Selection ◽

Human Protein ◽

Protein Coding


Analysis of Stop Codons within Prokaryotic Protein-Coding Genes Suggests Frequent Readthrough Events

International Journal of Molecular Sciences ◽

10.3390/ijms22041876 ◽

2021 ◽

Vol 22 (4) ◽

pp. 1876

Author(s):

Frida Belinky ◽

Ishan Ganguly ◽

Eugenia Poliakov ◽

Vyacheslav Yurchenko ◽

Igor B. Rogozin

Keyword(s):

Stop Codon ◽

Purifying Selection ◽

Protein Product ◽

Intermediate Step ◽

Protein Coding ◽

Stop Codons ◽

Protein Coding Genes ◽

Synonymous Sites ◽

Prokaryotic Protein ◽

Sense Codon

Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1 (this is a signature of purifying selection in protein-coding regions). We also found that double substitutions in codons—where an intermediate step is a nonsense substitution—show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: Such codons are likely to cause a substantial decrease of protein expression levels.
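
The dN/dS reasoning in this abstract (a ratio significantly below 1 is read as a signature of purifying selection) can be sketched as a small helper. The function names, the tolerance band around 1, and the example rates are hypothetical; a fixed tolerance is only a stand-in for the statistical test that a real analysis would require.

```python
def dn_ds_ratio(dn: float, ds: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    return dn / ds


def classify_selection(dn: float, ds: float, tolerance: float = 0.05) -> str:
    """Label a gene by its dN/dS ratio using an illustrative tolerance around 1."""
    omega = dn_ds_ratio(dn, ds)
    if omega < 1.0 - tolerance:
        return "purifying"  # non-synonymous changes are removed more often
    if omega > 1.0 + tolerance:
        return "positive"   # non-synonymous changes are fixed more often
    return "neutral"        # ratio close to 1


# Hypothetical substitution rates for three genes.
labels = [
    classify_selection(0.02, 0.2),
    classify_selection(0.2, 0.2),
    classify_selection(0.3, 0.1),
]
```

Under this reading, a gene that carries an in-frame stop codon but still has a ratio well below 1 behaves like a constrained coding sequence rather than a decaying pseudogene.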
