Readthrough errors purge deleterious cryptic sequences, facilitating the birth of coding sequences

10.1101/737452 ◽

2019 ◽

Author(s):

Luke Kosinski ◽

Joanna Masel

Keyword(s):

Saccharomyces Cerevisiae ◽

De Novo ◽

Stop Codon ◽

Spillover Effects ◽

Structural Disorder ◽

Ribosome Profiling ◽

Protein Coding ◽

Coding Sequences ◽

Selection Hypothesis ◽

Low Level

Abstract De novo protein-coding innovations sometimes emerge from ancestrally non-coding DNA, despite the expectation that translating random sequences is overwhelmingly likely to be deleterious. The “pre-adapting selection” hypothesis claims that emergence is facilitated by prior, low-level translation of non-coding sequences via molecular errors. It predicts that selection on polypeptides translated only in error is strong enough to matter, and is strongest when erroneous expression is high. To test this hypothesis, we examined non-coding sequences located downstream of stop codons (i.e. those potentially translated by readthrough errors) in Saccharomyces cerevisiae genes. We identified a class of “fragile” proteins under strong selection to reduce readthrough, which are unlikely substrates for co-option. Among the remainder, sequences showing evidence of readthrough translation, as assessed by ribosome profiling, encoded C-terminal extensions with higher intrinsic structural disorder, supporting the pre-adapting selection hypothesis. The cryptic sequences beyond the stop codon, rather than spillover effects from the regular C-termini, are primarily responsible for the higher disorder. Results are robust to controlling for the fact that stronger selection also reduces the length of C-terminal extensions. These findings indicate that selection acts on 3′ UTRs in S. cerevisiae to purge potentially deleterious variants of cryptic polypeptides, acting more strongly in genes that experience more readthrough errors.

Readthrough Errors Purge Deleterious Cryptic Sequences, Facilitating the Birth of Coding Sequences

Molecular Biology and Evolution ◽

10.1093/molbev/msaa046 ◽

2020 ◽

Vol 37 (6) ◽

pp. 1761-1774 ◽

Cited By ~ 2

Author(s):

Luke J Kosinski ◽

Joanna Masel

Keyword(s):

Saccharomyces Cerevisiae ◽

De Novo ◽

Stop Codon ◽

Spillover Effects ◽

Structural Disorder ◽

Ribosome Profiling ◽

Noncoding DNA ◽

Protein Coding ◽

Selection Hypothesis ◽

Noncoding Sequences

Abstract De novo protein-coding innovations sometimes emerge from ancestrally noncoding DNA, despite the expectation that translating random sequences is overwhelmingly likely to be deleterious. The “preadapting selection” hypothesis claims that emergence is facilitated by prior, low-level translation of noncoding sequences via molecular errors. It predicts that selection on polypeptides translated only in error is strong enough to matter and is strongest when erroneous expression is high. To test this hypothesis, we examined noncoding sequences located downstream of stop codons (i.e., those potentially translated by readthrough errors) in Saccharomyces cerevisiae genes. We identified a class of “fragile” proteins under strong selection to reduce readthrough, which are unlikely substrates for co-option. Among the remainder, sequences showing evidence of readthrough translation, as assessed by ribosome profiling, encoded C-terminal extensions with higher intrinsic structural disorder, supporting the preadapting selection hypothesis. The cryptic sequences beyond the stop codon, rather than spillover effects from the regular C-termini, are primarily responsible for the higher disorder. Results are robust to controlling for the fact that stronger selection also reduces the length of C-terminal extensions. These findings indicate that selection acts on 3′ UTRs in Saccharomyces cerevisiae to purge potentially deleterious variants of cryptic polypeptides, acting more strongly in genes that experience more readthrough errors.
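
The C-terminal extensions discussed in this abstract are the codons between a gene's annotated stop codon and the next in-frame stop in the 3′ UTR. The following is a minimal sketch of how such an extension could be located, assuming the standard genetic code, a DNA-alphabet sequence, and a known stop codon position. The function name `readthrough_extension` and its parameters are hypothetical and are not taken from the paper.

```python
# Standard-code stop codons, written in the DNA alphabet (assumption).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def readthrough_extension(seq: str, stop_index: int) -> str:
    """Return the nucleotides translated after reading through a stop codon.

    seq        -- coding sequence followed by its 3' UTR (hypothetical input)
    stop_index -- 0-based index of the first base of the annotated stop codon

    The result runs from the codon after the annotated stop up to, but not
    including, the next in-frame stop codon. The annotated stop codon itself
    is excluded in this sketch.
    """
    seq = seq.upper()
    if seq[stop_index:stop_index + 3] not in STOP_CODONS:
        raise ValueError("stop_index does not point at a stop codon")

    codons = []
    # Walk the 3' UTR codon by codon in the frame of the annotated stop.
    for i in range(stop_index + 3, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return "".join(codons)


# Toy example: ATG GCC | TAA | GGG CCC | TGA | AA
toy = "ATGGCC" + "TAA" + "GGGCCC" + "TGA" + "AA"
extension = readthrough_extension(toy, stop_index=6)
```

In this toy sequence the extension is two codons long. Downstream measures such as predicted structural disorder would be computed on the translated extension, which this sketch does not attempt.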

A Depletion of Stop Codons in lincRNA is Owing to Transfer of Selective Constraint from Coding Sequences

Molecular Biology and Evolution ◽

10.1093/molbev/msz299 ◽

2019 ◽

Vol 37 (4) ◽

pp. 1148-1164

Author(s):

Liam Abrahams ◽

Laurence D Hurst

Keyword(s):

De Novo ◽

Stop Codon ◽

Noncoding RNAs ◽

Selective Constraint ◽

Protein Coding ◽

Coding Sequences ◽

Reading Frame ◽

Stop Codons ◽

Protein Coding Genes ◽

Exonic Splice

Abstract Although the constraints on a gene’s sequence are often assumed to reflect the functioning of that gene, here we propose transfer selection, a constraint operating on one class of genes transferred to another, mediated by shared binding factors. We show that such transfer can explain an otherwise paradoxical depletion of stop codons in long intergenic noncoding RNAs (lincRNAs). Serine/arginine-rich proteins direct the splicing machinery by binding exonic splice enhancers (ESEs) in immature mRNA. As coding exons cannot contain stop codons in one reading frame, stop codons should be rare within ESEs. We confirm that the stop codon density (SCD) in ESE motifs is low, even accounting for nucleotide biases. Given that serine/arginine-rich proteins binding ESEs also facilitate lincRNA splicing, a low SCD could transfer to lincRNAs. As predicted, multiexon lincRNA exons are depleted in stop codons, a result not explained by open reading frame (ORF) contamination. Consistent with transfer selection, stop codon depletion in lincRNAs is most acute in exonic regions with the highest ESE density, disappears when ESEs are masked, is consistent with stop codon usage skews in ESEs, and is diminished in both single-exon lincRNAs and introns. Owing to low SCD, the maximum lengths of pseudo-ORFs frequently exceed null expectations. This has implications for ORF annotation and the evolution of de novo protein-coding genes from lincRNAs. We conclude that not all constraints operating on genes need be explained by the functioning of the gene but may instead be transferred owing to shared binding factors.
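
As a rough illustration of the stop codon density (SCD) idea in this abstract, the sketch below counts stop trinucleotides over all overlapping positions of a sequence and divides by the number of trinucleotide positions. This definition, and the name `stop_codon_density`, are assumptions made for illustration. The paper's exact SCD definition and its nucleotide-bias controls are not reproduced here.

```python
# Standard-code stop codons, written in the DNA alphabet (assumption).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_density(seq: str) -> float:
    """Fraction of overlapping trinucleotides in seq that are stop codons.

    All three reading frames are pooled by sliding one base at a time.
    Sequences shorter than three bases have density 0.0 by convention here.
    """
    seq = seq.upper()
    positions = len(seq) - 2
    if positions <= 0:
        return 0.0
    stops = sum(1 for i in range(positions) if seq[i:i + 3] in STOP_CODONS)
    return stops / positions


# "TAATGA" has trinucleotides TAA, AAT, ATG, TGA: two of the four are stops.
density = stop_codon_density("TAATGA")
```

Comparing such a density between motif sets (for example, ESE motifs against shuffled controls) would require a null model, which is outside the scope of this sketch.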

Draft Genome Sequence of Rheinheimera sp. F8, a Biofilm-Forming Strain Which Produces Large Amounts of Extracellular DNA

Genome Announcements ◽

10.1128/genomea.00082-16 ◽

2016 ◽

Vol 4 (2) ◽

Cited By ~ 3

Author(s):

Anna-Kathrin Schuster ◽

Ulrich Szewzyk

Keyword(s):

Genome Sequence ◽

De Novo ◽

Draft Genome ◽

Extracellular DNA ◽

Draft Genome Sequence ◽

Protein Coding ◽

Coding Sequences

Rheinheimera sp. strain F8 is a biofilm-forming gammaproteobacterium that has been found to produce large amounts of filamentous extracellular DNA. Here, we announce the de novo assembly of its genome. It is estimated to be 4,464,511 bp in length, with 3,970 protein-coding sequences and 92 RNA-coding sequences.

A de novo evolved gene in the house mouse regulates female pregnancy cycles

eLife ◽

10.7554/elife.44392 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 4

Author(s):

Chen Xie ◽

Cemalettin Bekpen ◽

Sven Künzel ◽

Maryam Keshavarz ◽

Rebecca Krebs-Wheaton ◽

...

Keyword(s):

House Mouse ◽

De Novo ◽

Specific Protein ◽

Ribosome Profiling ◽

Mass Spectrometry Data ◽

Preimplantation Embryos ◽

Protein Coding ◽

Reading Frame ◽

Protein Coding Genes ◽

New Genes

The de novo emergence of new genes has been well documented through genomic analyses. However, a functional analysis, especially of very young protein-coding genes, is still largely lacking. Here, we identify a set of house mouse-specific protein-coding genes and assess their translation by ribosome profiling and mass spectrometry data. We functionally analyze one of them, Gm13030, which is specifically expressed in females in the oviduct. The interruption of the reading frame affects the transcriptional network in the oviducts at a specific stage of the estrous cycle. This includes the upregulation of Dcpp genes, which are known to stimulate the growth of preimplantation embryos. As a consequence, knockout females have their second litters after shorter times and have a higher infanticide rate. Given that Gm13030 shows no signs of positive selection, our findings support the hypothesis that a de novo evolved gene can directly adopt a function without much sequence adaptation.

Improved annotation of protein-coding genes boundaries in metazoan mitochondrial genomes

Nucleic Acids Research ◽

10.1093/nar/gkz833 ◽

2019 ◽

Vol 47 (20) ◽

pp. 10543-10552 ◽

Cited By ~ 8

Author(s):

Alexander Donath ◽

Frank Jühling ◽

Marwa Al-Arab ◽

Stephan H Bernhart ◽

Franziska Reinhardt ◽

...

Keyword(s):

De Novo ◽

Stop Codon ◽

Difficult Problem ◽

Mitochondrial Genomes ◽

Protein Coding ◽

Protein Coding Genes ◽

Genetic Codes ◽

Annotation Server ◽

Codon Positions ◽

Mitochondrial Transcripts

Abstract With the rapid increase of sequenced metazoan mitochondrial genomes, a detailed manual annotation is becoming more and more infeasible. While it is easy to identify the approximate location of protein-coding genes within mitogenomes, the peculiar processing of mitochondrial transcripts makes the determination of precise gene boundaries a surprisingly difficult problem. We have analyzed the properties of annotated start and stop codon positions in detail, and use the inferred patterns to devise a new method for predicting gene boundaries in de novo annotations. Our method benefits from empirically observed prevalences of start/stop codons and gene lengths, and considers the dependence of these features on variations of genetic codes. Albeit not being perfect, our new approach yields a drastic improvement in the accuracy of gene boundaries and upgrades the mitochondrial genome annotation server MITOS to an even more sophisticated tool for fully automatic annotation of metazoan mitochondrial genomes.

De Novo Origination of a New Protein-Coding Gene in Saccharomyces cerevisiae

Genetics ◽

10.1534/genetics.107.084491 ◽

2008 ◽

Vol 179 (1) ◽

pp. 487-496 ◽

Cited By ~ 132

Author(s):

Jing Cai ◽

Ruoping Zhao ◽

Huifeng Jiang ◽

Wen Wang

Keyword(s):

Saccharomyces Cerevisiae ◽

De Novo ◽

Protein Coding ◽

New Protein

Novel Intronic Non-Coding RNAs Contribute to Maintenance of Phenotype in Saccharomyces cerevisiae

10.1101/033076 ◽

2015 ◽

Author(s):

Katarzyna B Hooks ◽

Samina Naseeb ◽

Sam Griffiths-Jones ◽

Daniela Delneri

Keyword(s):

Saccharomyces Cerevisiae ◽

RNA Structure ◽

Structure Prediction ◽

De Novo ◽

Intron Retention ◽

Intron Loss ◽

Common Belief ◽

RNA Structures ◽

Protein Coding ◽

Non Coding RNA

The Saccharomyces cerevisiae genome has undergone extensive intron loss during its evolutionary history. It has been suggested that the few remaining introns (in only 5% of protein-coding genes) are retained because of their impact on function under stress conditions. Here, we explore the possibility that novel non-coding RNA structures (ncRNAs) are embedded within intronic sequences and are contributing to phenotype and intron retention in yeast. We employed de novo RNA structure prediction tools to screen intronic sequences in S. cerevisiae and 36 other fungi. We identified and validated 19 new intronic RNAs via RNAseq and RT-PCR. Contrary to the common belief that excised introns are rapidly degraded, we found that, in six cases, the excised introns were maintained intact in the cells. In two other cases, we showed that the ncRNAs were further processed from their introns. RNAseq analysis confirmed higher expression of introns in the ribosomal protein genes containing predicted RNA structures. We deleted the novel intronic RNA structure within the GLC7 intron and showed that this predicted ncRNA, rather than the intron itself, is responsible for the cell's ability to respond to salt stress. We also showed a direct association between the presence of the intronic ncRNA and GLC7 expression. Overall, these data support the notion that some introns may have been maintained in the genome because they harbour functional ncRNAs.

A spectral analysis approach to detect actively translated open reading frames in high-resolution ribosome profiling data

10.1101/031625 ◽

2015 ◽

Author(s):

Lorenzo Calviello ◽

Neelanjan Mukherjee ◽

Emanuel Wyler ◽

Henrik Zauber ◽

Antje Hirsekorn ◽

...

Keyword(s):

Spectral Analysis ◽

Gene Expression Regulation ◽

De Novo ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Mass Spectrometry Data ◽

HEK293 Cells ◽

Protein Coding ◽

Reading Frame ◽

Reading Frames

RNA sequencing protocols allow for quantifying gene expression regulation at each individual step, from transcription to protein synthesis. Ribosome Profiling (Ribo-seq) maps the positions of translating ribosomes over the entire transcriptome. Despite its great potential, a rigorous statistical approach to identify translated regions by means of the characteristic three-nucleotide periodicity of Ribo-seq data is not yet available. To fill this gap, we developed RiboTaper, which quantifies the significance of periodic Ribo-seq reads via spectral analysis methods. We applied RiboTaper on newly generated, deep Ribo-seq data in HEK293 cells, to derive an extensive map of translation that covers Open Reading Frame (ORF) annotations for more than 11,000 protein-coding genes. We also find distinct ribosomal signatures for several hundred detected upstream ORFs and ORFs in annotated non-coding genes (ncORFs). Mass spectrometry data confirms that RiboTaper achieves excellent coverage of the cellular proteome and validates dozens of novel peptide products. Collectively, RiboTaper (available at https://ohlerlab.mdc-berlin.de/software/ ) is a powerful method for comprehensive de novo identification of actively used ORFs in the human genome.
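
The three-nucleotide periodicity mentioned in this abstract can be illustrated with a plain discrete Fourier coefficient at frequency 1/3. This is a simplified stand-in and not RiboTaper's own procedure, which the abstract describes only as spectral analysis. The function name `periodicity_score`, the per-nucleotide count input, and the normalisation are assumptions made for this sketch.

```python
import cmath


def periodicity_score(counts):
    """Share of signal variance carried by a period-3 component.

    counts -- per-nucleotide read counts along a candidate ORF
              (hypothetical input; its length is assumed to be a multiple of 3)

    Returns a value near 1.0 for a perfectly period-3 signal and near 0.0
    when there is no period-3 component. A flat or empty signal scores 0.0.
    """
    n = len(counts)
    if n == 0:
        return 0.0
    mean = sum(counts) / n
    centered = [c - mean for c in counts]
    total = sum(x * x for x in centered)
    if total == 0:
        return 0.0
    # Fourier coefficient at frequency 1/3 cycles per nucleotide.
    coef = sum(x * cmath.exp(-2j * cmath.pi * k / 3)
               for k, x in enumerate(centered))
    # Factor 2 accounts for the mirrored coefficient at frequency 2/3.
    return 2 * abs(coef) ** 2 / (n * total)


in_frame = periodicity_score([10, 0, 0] * 20)   # reads pile up on one frame
period_two = periodicity_score([5, 0] * 30)     # periodic, but not period 3
```

A significance test would still be needed to decide whether a score on real, noisy counts differs from chance. This sketch only computes the descriptive score.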

Rapid Prototyping Platform for Saccharomyces cerevisiae Using Computer-Aided Genetic Design Enabled by Parallel Software and Workcell Platform Development

SLAS Technology: Translating Life Sciences Innovation ◽

10.1177/2472630318798304 ◽

2018 ◽

Vol 24 (3) ◽

pp. 291-297 ◽

Cited By ~ 4

Author(s):

P. D. Rajakumar ◽

G-O. F. Gowers ◽

L. Suckling ◽

A. Foster ◽

T. Ellis ◽

...

Keyword(s):

Saccharomyces Cerevisiae ◽

Computer Aided Design ◽

Software Tool ◽

Protein Coding ◽

Coding Sequences ◽

Liquid Handling ◽

Design Cycle ◽

Computer Aided ◽

Aided Design ◽

Genetic Constructs

Biofoundries have enabled the ability to automate the construction of genetic constructs using computer-aided design. In this study, we have developed the methodology required to abstract and automate the construction of yeast-compatible designs. We demonstrate the use of our in-house software tool, AMOS, to coordinate with design software, JMP, and robotic liquid handling platforms to successfully manage the construction of a library of 88 yeast expression plasmids. In this proof-of-principle study, we used three fluorescent genes as proxy for three enzyme coding sequences. Our platform has been designed to quickly iterate around a design cycle of four protein coding sequences per plasmid, with larger numbers possible with multiplexed genome integrations in Saccharomyces cerevisiae. This work highlights how developing scalable new biotechnology applications requires a close integration between software development, liquid handling robotics, and protocol development.

De novo emergence of adaptive membrane proteins from thymine-rich intergenic sequences

10.1101/621532 ◽

2019 ◽

Author(s):

Nikolaos Vakirlis ◽

Omer Acar ◽

Brian Hsu ◽

Nelson Castilho Coelho ◽

S. Branden Van Oss ◽

...

Keyword(s):

De Novo ◽

Transmembrane Proteins ◽

Protein Coding ◽

Coding Sequences ◽

Beneficial Effects ◽

Protein Coding Genes ◽

Evolutionary Innovation ◽

Intergenic Sequences ◽

Intergenic Regions ◽

Novel Protein

Summary Recent evidence demonstrates that novel protein-coding genes can arise de novo from intergenic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of intergenic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Do intergenic translation events yield polypeptides with useful biochemical capacities? The answer to this question remains controversial. Here, we systematically characterized how de novo emerging coding sequences impact fitness. In budding yeast, overexpression of these sequences was enriched in beneficial effects, while their disruption was generally inconsequential. We found that beneficial emerging sequences have a strong tendency to encode putative transmembrane proteins, which appears to stem from a cryptic propensity for transmembrane signals throughout thymine-rich intergenic regions of the genome. These findings suggest that novel genes with useful biochemical capacities, such as transmembrane domains, tend to evolve de novo within intergenic loci that already harbored a blueprint for these capacities.
