A codon model for associating phenotypic traits with altered selective patterns of sequence evolution

Systematic Biology ◽

10.1093/sysbio/syaa087 ◽

2020 ◽

Author(s):

Keren Halabi ◽

Eli Levy Karin ◽

Laurent Guéguen ◽

Itay Mayrose

Keyword(s):

Complex Traits ◽

Purifying Selection ◽

Phenotypic Traits ◽

Sequence Evolution ◽

Codon Model ◽

Protein Coding ◽

Coding Sequences ◽

Branch Site ◽

Signature Of Selection ◽

Bacterial Genes

Abstract Detecting the signature of selection in coding sequences and associating it with shifts in phenotypic states can unveil genes underlying complex traits. Of the various signatures of selection exhibited at the molecular level, changes in the pattern of selection at protein coding genes have been of main interest. To this end, phylogenetic branch-site codon models are routinely applied to detect changes in selective patterns along specific branches of the phylogeny. Many of these methods rely on a pre-specified partition of the phylogeny into branch categories, thus treating the course of trait evolution as fully resolved and assuming that phenotypic transitions have occurred only at speciation events. Here we present TraitRELAX, a new phylogenetic model that alleviates these strong assumptions by explicitly accounting for the uncertainty in the evolution of both trait and coding sequences. This joint statistical framework enables the detection of changes in selection intensity upon repeated trait transitions. We evaluated the performance of TraitRELAX using simulations and then applied it to two case studies. Using TraitRELAX, we found an intensification of selection in the primate SEMG2 gene in polygynandrous species compared to species of other mating forms, as well as changes in the intensity of purifying selection operating on sixteen bacterial genes upon transitioning from a free-living to an endosymbiotic lifestyle.


A codon model for associating phenotypic traits with altered selective patterns of sequence evolution

10.1101/2020.03.04.974584 ◽

2020 ◽

Author(s):

Keren Halabi ◽

Eli Levy Karin ◽

Laurent Guéguen ◽

Itay Mayrose

Keyword(s):

Complex Traits ◽

Purifying Selection ◽

Phenotypic Traits ◽

Sequence Evolution ◽

Loss Of Function ◽

Codon Model ◽

Coding Sequences ◽

Genomic Changes ◽

Signature Of Selection ◽

Bacterial Genes

Abstract Changes in complex phenotypes, such as pathogenicity levels, trophic lifestyle, and habitat shifts, are brought on by multiple genomic changes: sub- and neofunctionalization, loss of function, and levels of gene expression. Thus, detecting the signature of selection in coding sequences and associating it with shifts in phenotypic state can unveil the genes underlying complex traits. Phylogenetic branch-site codon models are routinely applied to detect changes in selective patterns along specific branches of the phylogeny. These methods rely on a pre-specified partition of the phylogeny into branch categories, thus treating the course of trait evolution as fully resolved and assuming that transitions in phenotypic states have occurred only at speciation events. Here we present TraitRELAX, a new phylogenetic model that alleviates these strong assumptions by explicitly accounting for the uncertainty in the evolution of both trait and coding sequences. This joint statistical framework enables the detection of changes in selection intensity upon repeated trait transitions. We evaluated the performance of TraitRELAX using simulations and then applied it to two case studies. Using TraitRELAX, we found an intensification of selection in the SEMG2 gene in polygynandrous species of primates compared to species of other mating forms, as well as changes in the intensity of purifying selection operating on sixteen bacterial genes upon transitioning from a free-living to an endosymbiotic lifestyle.


Developmental constraints on genome evolution in four bilaterian model species

10.1101/161679 ◽

2017 ◽

Author(s):

Jialin Liu ◽

Marc Robinson-Rechavi

Keyword(s):

Genome Evolution ◽

Purifying Selection ◽

Regulatory Elements ◽

Sequence Evolution ◽

Late Development ◽

Developmental Constraints ◽

Protein Coding ◽

New Genes ◽

Hourglass Model ◽

Conservation Model

Abstract Developmental constraints on genome evolution have been suggested to follow either an early conservation model or an “hourglass” model. Both models agree that late development strongly diverges between species, but disagree on which developmental period is the most conserved. Here, based on a modified “Transcriptome Age Index” approach, i.e. weighting trait measures by expression level, we analyzed the constraints acting on three evolutionary traits of protein coding genes (strength of purifying selection on protein sequences, phyletic age, and duplicability) in four species: nematode worm Caenorhabditis elegans, fly Drosophila melanogaster, zebrafish Danio rerio, and mouse Mus musculus. In general, we found that both models can be supported by different genomic properties. Sequence evolution follows an hourglass model, but the evolution of phyletic age and of duplicability follows an early conservation model. Further analyses indicate that stronger purifying selection on sequences in middle development is driven by temporal pleiotropy of these genes. In addition, we report evidence that expression in late development is enriched with retrogenes, which usually lack efficient regulatory elements. This implies that expression in late development could facilitate transcription of new genes, and provide opportunities for acquisition of function. Finally, in C. elegans, we suggest that dosage imbalance could be one of the main factors that cause depleted expression of high duplicability genes in early development.
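
The expression weighting mentioned in this abstract can be sketched as a plain weighted mean of a per-gene trait, with each gene's expression at a given stage as its weight. This is only an illustration under that assumption: the abstract does not spell out how the authors modified the index, and the function name and toy values below are hypothetical.

```python
def expression_weighted_index(trait, expression):
    """Expression-weighted mean of a per-gene trait for one developmental stage.

    trait      -- one trait value per gene (e.g. a purifying-selection measure)
    expression -- expression level of the same genes at this stage
    """
    if len(trait) != len(expression):
        raise ValueError("trait and expression must list the same genes")
    total = sum(expression)
    if total <= 0:
        raise ValueError("total expression must be positive")
    return sum(t * e for t, e in zip(trait, expression)) / total


# Hypothetical toy data: three genes, three stages (not taken from the paper).
trait_per_gene = [0.05, 0.20, 0.60]
expression_by_stage = {
    "early": [10.0, 5.0, 1.0],
    "middle": [20.0, 2.0, 0.5],
    "late": [5.0, 5.0, 10.0],
}

index_by_stage = {
    stage: expression_weighted_index(trait_per_gene, expr)
    for stage, expr in expression_by_stage.items()
}

for stage, value in index_by_stage.items():
    print(f"{stage}: {value:.4f}")
```

With these toy numbers the index is lowest in the middle stage, which is the shape an hourglass pattern would produce for a trait where lower values mean stronger constraint.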


Are Nonsynonymous Transversions Generally More Deleterious than Nonsynonymous Transitions?

Molecular Biology and Evolution ◽

10.1093/molbev/msaa200 ◽

2020 ◽

Vol 38 (1) ◽

pp. 181-191

Author(s):

Zhengting Zou ◽

Jianzhi Zhang

Keyword(s):

Amino Acid ◽

Dna Sequences ◽

Sequence Evolution ◽

Codon Model ◽

Protein Coding ◽

Fitness Effects ◽

Genome Wide ◽

Species Pairs ◽

Species Specific ◽

Evolutionary Lineages

Abstract It has been suggested that, due to the structure of the genetic code, nonsynonymous transitions are less likely than transversions to cause radical changes in amino acid physicochemical properties and so are on average less deleterious. This view was supported by some but not all mutagenesis experiments. Because laboratory measures of fitness effects have limited sensitivities and relative frequencies of different mutations in mutagenesis studies may not match those in nature, we here revisit this issue using comparative genomics. We extend the standard codon model of sequence evolution by adding the parameter η that quantifies the ratio of the fixation probability of transitional nonsynonymous mutations to that of transversional nonsynonymous mutations. We then estimate η from the concatenated alignment of all protein-coding DNA sequences of two closely related genomes. Surprisingly, η ranges from 0.13 to 2.0 across 90 species pairs sampled from the tree of life, with 51 incidences of η < 1 and 30 incidences of η > 1 that are statistically significant. Hence, whether nonsynonymous transversions are overall more deleterious than nonsynonymous transitions is species-dependent. Because the corresponding groups of amino acid replacements differ between nonsynonymous transitions and transversions, η is influenced by the relative exchangeabilities of amino acid pairs. Indeed, an extensive search reveals that the large variation in η is primarily explainable by the recently reported among-species disparity in amino acid exchangeabilities. These findings demonstrate that genome-wide nucleotide substitution patterns in coding sequences have species-specific features and are more variable among evolutionary lineages than are currently thought.
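
One way to picture where η could enter a codon model is sketched below. It assumes the common parameterization in which a single-nucleotide change gets a factor κ if it is a transition and a factor ω if it is nonsynonymous, and it places η as an extra factor on nonsynonymous transitions only. That placement is an assumption drawn from the definition in the abstract, not the authors' published rate matrix; target codon frequencies are omitted and all names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return x != y and ((x in PURINES) == (y in PURINES))


def relative_rate(codon_i, codon_j, kappa, omega, eta):
    """Relative rate of codon_i -> codon_j (target codon frequency omitted)."""
    diffs = [(a, b) for a, b in zip(codon_i, codon_j) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes are allowed
    if CODON_TABLE[codon_i] == "*" or CODON_TABLE[codon_j] == "*":
        return 0.0  # stop codons are excluded from the state space
    a, b = diffs[0]
    transition = is_transition(a, b)
    rate = kappa if transition else 1.0
    if CODON_TABLE[codon_i] != CODON_TABLE[codon_j]:
        rate *= omega
        if transition:
            rate *= eta  # assumed: eta scales nonsynonymous transitions only
    return rate


# Illustrative parameter values, not estimates from the paper.
params = dict(kappa=2.0, omega=0.1, eta=0.5)
for target in ("GAC", "AAT", "GAA"):
    print("GAT ->", target, relative_rate("GAT", target, **params))
```

Under this sketch, η < 1 lowers the rate of nonsynonymous transitions relative to what κ and ω alone would give, and η > 1 raises it.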


How do we transition from non-coding to coding?

10.7287/peerj.preprints.3031v1 ◽

2017 ◽

Author(s):

Jorge Ruiz-Orera ◽

José Luis Villanueva-Cañas ◽

William Blevins ◽

M.Mar Albà

Keyword(s):

De Novo ◽

Gene Evolution ◽

Purifying Selection ◽

Neutral Evolution ◽

Functional Protein ◽

Protein Coding ◽

Coding Sequences ◽

Sequence Composition ◽

Protein Coding Genes ◽

Small Proteins

Recent years have witnessed the discovery of protein-coding genes which appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, there are still many open questions regarding how new protein-coding sequences can arise from non-genic DNA. Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that it is also translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes, and which are usually annotated as long non-coding RNAs (lncRNA). Recently, sequencing of ribosome protected fragments (Ribo-Seq) has provided evidence that many of these transcripts actually translate small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs are likely to actually be true protein-coding genes (and thus previously misclassified), the bulk of lncRNAs code for proteins which show variation patterns consistent with neutral evolution. We also show that the ORFs that have a more favorable, coding-like, sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.


Rapid protein sequence evolution via compensatory frameshift is widespread in RNA virus genomes

BMC Bioinformatics ◽

10.1186/s12859-021-04182-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Dongbin Park ◽

Yoonsoo Hahn

Keyword(s):

Amino Acid ◽

Large Scale ◽

Rna Viruses ◽

Rna Virus ◽

Phylogenetic Analyses ◽

Sequence Evolution ◽

Protein Coding ◽

Coding Sequences ◽

Reading Frame ◽

Nucleotide Insertions

Abstract Background: RNA viruses possess remarkable evolutionary versatility driven by the high mutability of their genomes. Frameshifting nucleotide insertions or deletions (indels), which cause the premature termination of proteins, are frequently observed in the coding sequences of various viral genomes. When a secondary indel occurs near the primary indel site, the open reading frame can be restored to produce functional proteins, a phenomenon known as the compensatory frameshift. Results: In this study, we systematically analyzed publicly available viral genome sequences and identified compensatory frameshift events in hundreds of viral protein-coding sequences. Compensatory frameshift events resulted in large-scale amino acid differences between the compensatory frameshift form and the wild type even though their nucleotide sequences were almost identical. Phylogenetic analyses revealed that the evolutionary distances between proteins with and without a compensatory frameshift were significantly overestimated because amino acid mismatches caused by compensatory frameshifts were counted as substitutions. Further, this could cause compensatory frameshift forms to branch in different locations in the protein and nucleotide trees, which may obscure the correct interpretation of phylogenetic relationships between variant viruses. Conclusions: Our results imply that the compensatory frameshift is one of the mechanisms driving the rapid protein evolution of RNA viruses and potentially assisting their host-range expansion and adaptation.
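
The compensatory frameshift described in this abstract can be illustrated with a toy sequence: one deletion shifts the reading frame and exposes a premature stop, and a nearby insertion restores the frame while leaving the residues in between changed. The sequence and the indel positions are invented for illustration and do not come from the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate from position 0, stopping at the first stop codon (not included)."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 12-codon open reading frame.
wild = "ATGGCTGAAACCGGTCTCAAACGTTTAGATCCGTAA"

# Primary indel: delete one base in the third codon, shifting the frame.
frameshifted = wild[:6] + wild[7:]

# Secondary indel: insert one base 12 nt downstream, restoring the frame.
compensated = frameshifted[:18] + "G" + frameshifted[18:]

wild_protein = translate(wild)
compensated_protein = translate(compensated)
mismatches = sum(a != b for a, b in zip(wild_protein, compensated_protein))

print("wild type   :", wild_protein)
print("frameshifted:", translate(frameshifted))
print("compensated :", compensated_protein)
print("amino acid mismatches:", mismatches)
```

The two nucleotide sequences differ by only two single-base indels, yet every residue between the indels changes, which is why counting those mismatches as independent substitutions inflates protein distances.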


The relationship between base composition and codon usage in bacterial genes and its use for the simple and reliable identification of protein-coding sequences

Gene ◽

10.1016/0378-1119(84)90116-1 ◽

1984 ◽

Vol 30 (1-3) ◽

pp. 157-166 ◽

Cited By ~ 486

Author(s):

M.J. Bibb ◽

P.R. Findlay ◽

M.W. Johnson

Keyword(s):

Codon Usage ◽

Base Composition ◽

Protein Coding ◽

Coding Sequences ◽

Reliable Identification ◽

Bacterial Genes ◽

The Relationship


Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes

Science ◽

10.1126/science.1219240 ◽

2012 ◽

Vol 337 (6090) ◽

pp. 64-69 ◽

Cited By ~ 1186

Author(s):

Jacob A. Tennessen ◽

Abigail W. Bigham ◽

Timothy D. O’Connor ◽

Wenqing Fu ◽

Eimear E. Kenny ◽

...

Keyword(s):

Protein Function ◽

Complex Traits ◽

Rare Variants ◽

Purifying Selection ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Protein Coding ◽

Protein Coding Genes ◽

Functional Variants ◽

A Minor

As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ~313 genes per genome, and ~95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits.
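
The per-person figures quoted in this abstract can be checked with a few lines of arithmetic. The snippet uses only numbers stated above; the variable names are illustrative.

```python
# Figures as quoted in the abstract above.
snvs_per_person = 13_595
fraction_affecting_function = 0.023  # "2.3%"
european, african, stated_total = 1351, 1088, 2440

# 2.3% of the SNVs each person carries.
functional_snvs = snvs_per_person * fraction_affecting_function

# Sum of the two ancestry subgroups as quoted.
subgroup_sum = european + african

print(f"predicted functional SNVs per person: {functional_snvs:.1f}")
print(f"ancestry subgroup sum: {subgroup_sum} (stated total: {stated_total})")
```

The product rounds to 313, in line with the "~313" quoted in the abstract. The two ancestry counts as quoted sum to 2,439, one fewer than the stated 2,440; the snippet only surfaces that difference and does not try to resolve it.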


Faculty Opinions recommendation of Widespread purifying selection at polymorphic sites in human protein-coding loci.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1016701.201233 ◽

2003 ◽

Cited By ~ 1

Author(s):

Thomas Mitchell-Olds

Keyword(s):

Purifying Selection ◽

Human Protein ◽

Protein Coding


Faculty Opinions recommendation of Role of low-complexity sequences in the formation of novel protein coding sequences.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718030532.793494763 ◽

2014 ◽

Author(s):

Erich Bornberg-Bauer ◽

Magdalena Heberlein

Keyword(s):

Low Complexity ◽

Protein Coding ◽

Coding Sequences ◽

Novel Protein


Draft Genome Sequence of Urease-Producing Pseudorhodobacter sp. Strain E13, Isolated from the Yellow Sea in Gunsan, South Korea

Microbiology Resource Announcements ◽

10.1128/mra.00189-19 ◽

2019 ◽

Vol 8 (23) ◽

Author(s):

Si Chul Kim ◽

Hyo Jung Lee

Keyword(s):

South Korea ◽

Genome Sequence ◽

Yellow Sea ◽

Draft Genome ◽

The Yellow Sea ◽

Draft Genome Sequence ◽

Protein Coding ◽

Coding Sequences ◽

Gram Negative ◽

Content Type

Here, we report the draft genome sequence of Pseudorhodobacter sp. strain E13, a Gram-negative, aerobic, nonflagellated, and rod-shaped bacterium which was isolated from the Yellow Sea in South Korea. The assembled genome sequence is 3,878,578 bp long with 3,646 protein-coding sequences in 159 contigs.
