Analysis of Stop Codons within Prokaryotic Protein-Coding Genes Suggests Frequent Readthrough Events

Frida Belinky; Ishan Ganguly; Eugenia Poliakov; Vyacheslav Yurchenko; Igor B. Rogozin

doi:10.3390/ijms22041876

International Journal of Molecular Sciences ◽

10.3390/ijms22041876 ◽

2021 ◽

Vol 22 (4) ◽

pp. 1876

Author(s):

Frida Belinky ◽

Ishan Ganguly ◽

Eugenia Poliakov ◽

Vyacheslav Yurchenko ◽

Igor B. Rogozin

Keyword(s):

Stop Codon ◽

Purifying Selection ◽

Protein Product ◽

Intermediate Step ◽

Protein Coding ◽

Stop Codons ◽

Protein Coding Genes ◽

Synonymous Sites ◽

Prokaryotic Protein ◽

Sense Codon

Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1 (this is a signature of purifying selection in protein-coding regions). We also found that double substitutions in codons—where an intermediate step is a nonsense substitution—show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: Such codons are likely to cause a substantial decrease of protein expression levels.
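The notion of an in-frame stop codon inside a coding sequence can be illustrated with a short sketch. This is not the authors' pipeline; the function name `internal_stop_positions` is hypothetical, and the code assumes a DNA coding sequence that starts in frame and uses the stop codons TAA, TAG and TGA (the stops of the standard code and of the common bacterial translation table; genomes that reassign TGA would need a different set).

```python
# Stop codons assumed here: TAA, TAG, TGA (an assumption; some lineages reassign TGA).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> list[int]:
    """Return 0-based nucleotide offsets of in-frame stop codons,
    ignoring the final codon (the expected terminator)."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop any trailing partial codon
    last_codon_start = usable - 3         # the final complete codon is not "internal"
    return [
        i
        for i in range(0, last_codon_start, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    # Codons: ATG AAA TGA CCC TAA -> one internal stop at offset 6
    print(internal_stop_positions("ATGAAATGACCCTAA"))
```

A gene flagged by such a scan would still need evolutionary evidence, such as the dN/dS signal described above, before being treated as functional rather than as a pseudogene.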

A Depletion of Stop Codons in lincRNA is Owing to Transfer of Selective Constraint from Coding Sequences

Molecular Biology and Evolution ◽

10.1093/molbev/msz299 ◽

2019 ◽

Vol 37 (4) ◽

pp. 1148-1164

Author(s):

Liam Abrahams ◽

Laurence D Hurst

Keyword(s):

De Novo ◽

Stop Codon ◽

Noncoding Rnas ◽

Selective Constraint ◽

Protein Coding ◽

Coding Sequences ◽

Reading Frame ◽

Stop Codons ◽

Protein Coding Genes ◽

Exonic Splice

Although the constraints on a gene’s sequence are often assumed to reflect the functioning of that gene, here we propose transfer selection, a constraint operating on one class of genes transferred to another, mediated by shared binding factors. We show that such transfer can explain an otherwise paradoxical depletion of stop codons in long intergenic noncoding RNAs (lincRNAs). Serine/arginine-rich proteins direct the splicing machinery by binding exonic splice enhancers (ESEs) in immature mRNA. As coding exons cannot contain stop codons in one reading frame, stop codons should be rare within ESEs. We confirm that the stop codon density (SCD) in ESE motifs is low, even accounting for nucleotide biases. Given that serine/arginine-rich proteins binding ESEs also facilitate lincRNA splicing, a low SCD could transfer to lincRNAs. As predicted, multiexon lincRNA exons are depleted in stop codons, a result not explained by open reading frame (ORF) contamination. Consistent with transfer selection, stop codon depletion in lincRNAs is most acute in exonic regions with the highest ESE density, disappears when ESEs are masked, is consistent with stop codon usage skews in ESEs, and is diminished in both single-exon lincRNAs and introns. Owing to low SCD, the maximum lengths of pseudo-ORFs frequently exceed null expectations. This has implications for ORF annotation and the evolution of de novo protein-coding genes from lincRNAs. We conclude that not all constraints operating on genes need be explained by the functioning of the gene but may instead be transferred owing to shared binding factors.
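One simple way to picture a stop codon density is sketched below. The abstract does not give the exact SCD formula, so the definition used here (stop trinucleotides counted at every offset, divided by the number of trinucleotide windows) is an assumption for illustration, and `stop_codon_density` is a hypothetical name.

```python
STOPS = {"TAA", "TAG", "TGA"}


def stop_codon_density(seq: str) -> float:
    """Fraction of overlapping trinucleotide windows that spell a stop codon.

    Assumed definition for illustration only; RNA input (U) is converted to DNA (T).
    """
    dna = seq.upper().replace("U", "T")
    windows = len(dna) - 2                # number of overlapping 3-mers
    if windows <= 0:
        return 0.0
    hits = sum(dna[i:i + 3] in STOPS for i in range(windows))
    return hits / windows


if __name__ == "__main__":
    # Windows: TAA, AAT, ATA, TAG -> 2 stops out of 4 windows
    print(stop_codon_density("TAATAG"))
```

Comparing this value between a set of motifs and nucleotide-matched controls is the kind of contrast the abstract describes, although the published analysis controls for composition in ways this sketch does not.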

Complete overview of protein-inactivating sequence variations in 36 sequenced mouse inbred strains

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1706168114 ◽

2017 ◽

Vol 114 (34) ◽

pp. 9158-9163 ◽

Cited By ~ 14

Author(s):

Steven Timmermans ◽

Marc Van Montagu ◽

Claude Libert

Keyword(s):

Mouse Genome ◽

Inbred Strains ◽

Nucleotide Polymorphisms ◽

Genome Sequences ◽

Single Nucleotide ◽

Protein Coding ◽

Stop Codons ◽

Protein Coding Genes ◽

Sequence Variations ◽

Genetic Background Effects

Mouse inbred strains remain essential in science. We have analyzed the publicly available genome sequences of 36 popular inbred strains and provide lists for each strain of protein-coding genes that acquired sequence variations that cause premature STOP codons, loss of STOP codons and single nucleotide polymorphisms, and short in-frame insertions and deletions. Our data give an overview of predicted defective proteins, including predicted impact scores, of all these strains compared with the reference mouse genome of C57BL/6J. These data can also be retrieved via a searchable website (mousepost.be) and allow a global, better interpretation of genetic background effects and a source of naturally defective alleles in these 36 sequenced classical and high-priority mouse inbred strains.

Complete mitochondrial genomes of five raptors and implications for the phylogenetic relationships between owls and nightjars

10.7287/peerj.preprints.27478v1 ◽

2019 ◽

Author(s):

Gang Liu ◽

Lizhi Zhou ◽

Guanghong Zhao

Keyword(s):

Control Region ◽

Phylogenetic Relationships ◽

Phylogenetic Trees ◽

Stop Codon ◽

Mitochondrial Genomes ◽

Closely Related Species ◽

Protein Coding ◽

Protein Coding Genes ◽

Independent Families ◽

Complete Mitochondrial Genomes

The phylogenetic relationships between owls and nightjars are rather complex and controversial. To clarify these relationships, we determined the complete mitochondrial genomes of Glaucidium cuculoides, Otus scops, Glaucidium brodiei, Caprimulgus indicus, and Strix leptogrammica, and estimated phylogenetic trees based on the complete mitochondrial genomes and aligned sequences from closely related species that were obtained from GenBank. The complete mitochondrial genomes were 17392, 17317, 17549, 17536, and 16307 bp in length. All mitochondrial genomes contained 13 protein-coding genes, two rRNAs, 22 tRNAs, and a putative control region. All mitochondrial genomes except for that of Strix leptogrammica contained a pseudo-control region. ATG, GTG, and ATA are generally start codons, whereas TAA is the most frequent stop codon. All tRNAs in the new mtDNAs could be folded into canonical cloverleaf secondary structures except for tRNASer (AGY) and tRNALeu (CUN), which lack the “DHU” arm. The phylogenetic relationships demonstrated that Strigiformes and Caprimulgiformes are independent orders, and Aegothelidae is a family within Caprimulgiformes. The results also revealed that Accipitriformes is an independent order, and Pandionidae and Sagittariidae are independent families. The results also supported that Apodiformes is polyphyletic, and hummingbirds (family Trochilidae) belong to Apodiformes. Piciformes was most distantly related to all other analyzed orders.

Improved annotation of protein-coding genes boundaries in metazoan mitochondrial genomes

Nucleic Acids Research ◽

10.1093/nar/gkz833 ◽

2019 ◽

Vol 47 (20) ◽

pp. 10543-10552 ◽

Cited By ~ 8

Author(s):

Alexander Donath ◽

Frank Jühling ◽

Marwa Al-Arab ◽

Stephan H Bernhart ◽

Franziska Reinhardt ◽

...

Keyword(s):

De Novo ◽

Stop Codon ◽

Difficult Problem ◽

Mitochondrial Genomes ◽

Protein Coding ◽

Protein Coding Genes ◽

Genetic Codes ◽

Annotation Server ◽

Codon Positions ◽

Mitochondrial Transcripts

With the rapid increase of sequenced metazoan mitochondrial genomes, a detailed manual annotation is becoming more and more infeasible. While it is easy to identify the approximate location of protein-coding genes within mitogenomes, the peculiar processing of mitochondrial transcripts makes the determination of precise gene boundaries a surprisingly difficult problem. We have analyzed the properties of annotated start and stop codon positions in detail, and use the inferred patterns to devise a new method for predicting gene boundaries in de novo annotations. Our method benefits from empirically observed prevalences of start/stop codons and gene lengths, and considers the dependence of these features on variations of genetic codes. Although not perfect, our new approach yields a drastic improvement in the accuracy of gene boundaries and upgrades the mitochondrial genome annotation server MITOS to an even more sophisticated tool for fully automatic annotation of metazoan mitochondrial genomes.

Protein-Coding Genes of Helicobacter pylori Predominantly Present Purifying Selection though Many Membrane Proteins Suffer from Selection Pressure: A Proposal to Analyze Bacterial Pangenomes

Genes ◽

10.3390/genes12030377 ◽

2021 ◽

Vol 12 (3) ◽

pp. 377

Author(s):

Alejandro Rubio ◽

Antonio Pérez-Pulido

Keyword(s):

Helicobacter Pylori ◽

Membrane Proteins ◽

Selection Pressure ◽

Purifying Selection ◽

Protein Coding ◽

Evolutionary Selection ◽

Protein Coding Genes ◽

The Core ◽

Genes Encoding ◽

Selection For

The current availability of complete genome sequences has revealed that bacterial genomes can bear genes not present in the genomes of all the strains of a specific species. The genes shared by all the strains comprise the core of the species, but the pangenome can be much greater and usually includes genes appearing in only one strain. Once the pangenome of a species is estimated, other studies can be undertaken to generate new knowledge, such as the study of the evolutionary selection for protein-coding genes. Most of the genes of a pangenome are expected to be subject to purifying selection that assures the conservation of function, especially those in the core group. However, some genes can be subject to selection pressure, such as genes involved in virulence that need to escape the host immune system, which is more common in the accessory group of the pangenome. We analyzed 180 strains of Helicobacter pylori, a bacterium that colonizes the gastric mucosa of half the world population and presents a low number of genes (around 1500 in a strain and 3000 in the pangenome). After the estimation of the pangenome, the evolutionary selection for each gene was calculated, and we found that 85% of them are subject to purifying selection and the remaining genes present some grade of selection pressure. As expected, the latter group is enriched with genes encoding membrane proteins putatively involved in interaction with host tissues. In addition, this group also presents a high number of uncharacterized genes and genes encoding putative spurious proteins, which suggests that they could be false positives from the gene finders used for identifying them. All these results suggest that this kind of analysis can be useful to validate gene predictions and functionally characterize proteins in complete genomes.

The landscape of somatic mutations in protein coding genes in apparently benign human tissues carries signatures of relaxed purifying selection

Nucleic Acids Research ◽

10.1093/nar/gkw086 ◽

2016 ◽

Vol 44 (5) ◽

pp. 2075-2084 ◽

Cited By ~ 39

Author(s):

Vinod Kumar Yadav ◽

James DeGregori ◽

Subhajyoti De

Keyword(s):

Somatic Mutations ◽

Purifying Selection ◽

Human Tissues ◽

Protein Coding ◽

Protein Coding Genes

Repurposing tRNAs for nonsense suppression

Nature Communications ◽

10.1038/s41467-021-24076-x ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Suki Albers ◽

Bertrand Beckert ◽

Marco C. Matthies ◽

Chandra Sekhar Mandava ◽

Raphael Schuster ◽

...

Keyword(s):

De Novo ◽

Stop Codon ◽

Premature Stop Codon ◽

Nonsense Suppression ◽

Potent Effect ◽

Systematic Analysis ◽

Stop Codons ◽

Release Factors ◽

A Site ◽

Sense Codon

Three stop codons (UAA, UAG and UGA) terminate protein synthesis and are almost exclusively recognized by release factors. Here, we design de novo transfer RNAs (tRNAs) that efficiently decode UGA stop codons in Escherichia coli. The tRNA designs harness various functionally conserved aspects of sense-codon decoding tRNAs. Optimization within the TΨC-stem to stabilize binding to the elongation factor displays the most potent effect in enhancing suppression activity. We determine the structure of the ribosome in a complex with the designed tRNA bound to a UGA stop codon in the A site at 2.9 Å resolution. In the context of the suppressor tRNA, the conformation of the UGA codon resembles that of a sense codon rather than the conformation seen when canonical translation termination release factors are bound, suggesting conformational flexibility of the stop codons dependent on the nature of the A-site ligand. The systematic analysis, combined with structural insights, provides a rationale for targeted repurposing of tRNAs to correct devastating nonsense mutations that introduce a premature stop codon.

Extreme purifying selection against point mutations in the human genome

10.1101/2021.08.23.457339 ◽

2021 ◽

Author(s):

Noah Dukler ◽

Mehreen R Mughal ◽

Ritika Ramani ◽

Yi-Fei Huang ◽

Adam Siepel

Keyword(s):

Human Genome ◽

De Novo ◽

Point Mutations ◽

Purifying Selection ◽

Selection Coefficient ◽

Sequencing Data ◽

Protein Coding ◽

Coding Regions ◽

Protein Coding Genes ◽

Selective Effects

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious. Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
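The idea of a fractional depletion of rare variants can be pictured with a naive ratio estimator. This is only a sketch: the method described above is model-based and controls for local and neighbor-dependent mutation rates, which this code does not attempt. The function name and the counts in the example are hypothetical.

```python
def fractional_depletion(
    target_rare: int,
    target_sites: int,
    neutral_rare: int,
    neutral_sites: int,
) -> float:
    """Naive depletion estimate: 1 - (rare-variant rate at target sites /
    rare-variant rate at matched neutral sites).

    Illustrative assumption only; it ignores mutation-rate heterogeneity.
    """
    if target_sites <= 0 or neutral_sites <= 0 or neutral_rare <= 0:
        raise ValueError("site counts and the neutral variant count must be positive")
    target_rate = target_rare / target_sites
    neutral_rate = neutral_rare / neutral_sites
    return 1.0 - target_rate / neutral_rate


if __name__ == "__main__":
    # Hypothetical counts: half as many rare variants per site in the target set
    print(fractional_depletion(50, 1000, 100, 1000))
```

A value near 0 would indicate no depletion relative to the neutral sites, and a value near 1 would indicate that almost all rare variants are missing from the target set.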

The shiftability of protein coding genes: the genetic code was optimized for frameshift tolerating

10.7287/peerj.preprints.806 ◽

2015 ◽

Cited By ~ 1

Author(s):

Xiaolong Wang ◽

Xuxiang Wang ◽

Gang Chen ◽

Jianye Zhang ◽

Yongqiang Liu ◽

...

Keyword(s):

Genetic Code ◽

Model Organisms ◽

Large Dataset ◽

Protein Coding ◽

E Coli ◽

Protein Coding Genes ◽

New Gene ◽

Sense Codon ◽

The Relationship ◽

Reading Frames

The genetic code defines the relationship between a protein and its coding DNA sequence. It was presumed that most frameshifts would yield non-functional, truncated or cytotoxic products. In this study, we report that in E. coli, a frameshifted β-lactamase (bla) gene is still functional if all of the inner stop codons are read through or replaced by a sense codon. By analyzing a large dataset including all available protein-coding genes in major model organisms, it is demonstrated that in any species, and in any protein-coding gene, the three translational products from the three different reading frames are always similar to each other, with constant ~50% similarity and ~100% coverage, and that the similarity is predefined by the genetic code rather than by the sequences themselves. It is likely that a coding gene can be translated into three isoforms, one from each of the three reading frames; we propose a new gene expression paradigm, “one transcript, three translations”, which is an amendment to the traditional “one gene, one/multiple peptides” hypotheses. Finally, we conclude that the genetic code was optimized for frameshift tolerance in early evolution, which endows every protein-coding gene with a character of shiftability, an inherent and everlasting ability to tolerate frameshift mutations, and serves as an innate mechanism for cells to deal with the frameshift problem.
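The three forward reading frames mentioned above can be made concrete with a small sketch that translates one sequence from offsets 0, 1 and 2. It assumes the standard genetic code and marks stop codons with `*`; the function name `translate_three_frames` is hypothetical, and the code does not reproduce the similarity analysis reported in the abstract.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_three_frames(seq: str) -> list[str]:
    """Translate a DNA/RNA sequence in forward frames 0, 1 and 2.

    Stops are written as '*'; codons with unknown bases become 'X'.
    """
    dna = seq.upper().replace("U", "T")
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


if __name__ == "__main__":
    # Frame 0: ATG GCC TAA; frame 1: TGG CCT; frame 2: GGC CTA
    print(translate_three_frames("ATGGCCTAA"))
```

Counting the `*` characters in the shifted frames shows how many internal stops a frameshifted product would have to read through or replace.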

Characterization, Comparison of Four New Mitogenomes of Centrotinae (Hemiptera: Membracidae) and Phylogenetic Implications Supports New Synonymy

Life ◽

10.3390/life12010061 ◽

2022 ◽

Vol 12 (1) ◽

pp. 61

Author(s):

Ruitao Yu ◽

Leining Feng ◽

Christopher H. Dietrich ◽

Xiangqun Yuan

Keyword(s):

Secondary Structure ◽

Mitochondrial Genome ◽

Stop Codon ◽

Phylogenetic Analyses ◽

Sliding Window ◽

Protein Coding ◽

Protein Coding Genes ◽

Genome Data ◽

Phylogenetic Implications ◽

New Synonymy

To explore the phylogenetic relationships of the subfamily Centrotinae using mitochondrial genome data, four complete mitogenomes (Anchon lineatus, Anchon yunnanensis, Gargara genistae and Tricentrus longivalvulatus) were sequenced and analyzed. All the newly sequenced mitogenomes contain 37 genes. Among the 13 protein-coding genes (PCGs) of the Centrotinae mitogenomes, a sliding window analysis and the ratio of Ka/Ks suggest that atp8 is a relatively fast evolving gene, while cox1 is the slowest. All PCGs start with ATN, except for nad5 (which starts with TTG), and stop with TAA or the incomplete stop codon T, except for nad2 and cytb (which terminate with TAG). All tRNAs can fold into the typical cloverleaf secondary structure, except for trnS1, which lacks the dihydrouridine (DHU) arm. The BI and ML phylogenetic analyses of concatenated alignments of 13 mitochondrial PCGs among the major lineages produce a well-resolved framework. Phylogenetic analyses show that Membracoidea, Smiliinae and Centrotinae, together with tribes Centrotypini and Leptobelini, are recovered as well-supported monophyletic groups. The tribe Gargarini (sensu Wallace et al.) and its monophyly are supported.
