Deep transcriptome annotation suggests that small and large proteins encoded in the same genes often cooperate

2017 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V. Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Abstract: Recent studies in eukaryotes have demonstrated the translation of alternative open reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by altORFs. The putative alternative proteins translated from altORFs have orthologs in many species, and evolutionary patterns indicate that altORFs are particularly constrained in CDSs that evolve slowly. Thousands of predicted alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. Protein domains and co-conservation analyses suggest a potential functional relationship between small and large proteins encoded in the same genes. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many coding genes code for more than one protein and that these proteins are often functionally related.

eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins.


2020 ◽  
Vol 40 (6) ◽  
Author(s):  
Corrine Corrina R. Hartford ◽  
Ashish Lal

Abstract: Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.


1997 ◽  
Vol 17 (3) ◽  
pp. 1666-1673 ◽  
Author(s):  
R Bishop ◽  
A Musoke ◽  
S Morzaria ◽  
B Sohanpal ◽  
E Gobright

Concerted evolution of multicopy gene families in vertebrates is recognized as an important force in the generation of biological novelty but has not been documented for the multicopy genes of protozoa. A multicopy locus, Tpr, which consists of tandemly arrayed open reading frames (ORFs) containing several repeated elements has been described for Theileria parva. Herein we show that probes derived from the 5'/N-terminal ends of ORFs in the genomic DNAs of T. parva Uganda (1,108 codons) and Boleni (699 codons) hybridized with multicopy sequences in homologous DNA but did not detect similar sequences in the DNA of 14 heterologous T. parva stocks and clones. The probe sequences were, however, protein coding according to predictive algorithms and codon usage. The 3'/C-terminal ends of the Uganda and Boleni ORFs exhibited 75% similarity and identity, respectively, to the previously identified Tpr1 and Tpr2 repetitive elements of T. parva Muguga. Tpr1-homologous sequences were detected in two additional species of Theileria. Eight different Tpr1-homologous transcripts were present in piroplasm mRNA from a single T. parva Muguga-infected animal. The Tpr1 and Tpr2 amino acid sequences contained six predicted membrane-associated segments. The ratio of synonymous to nonsynonymous substitutions indicates that Tpr1 evolves like protein-encoding DNA. The previously determined nucleotide sequence of the gene encoding the p67 antigen is completely identical in T. parva Muguga, Boleni, and Uganda, including the third base in codons. The data suggest that concerted evolution can lead to the radical divergence of coding sequences and that this can be a mechanism for the generation of novel genes.


Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 982
Author(s):  
Maksim Makarenko ◽  
Alexander Usatov ◽  
Tatiana Tatarinova ◽  
Kirill Azarin ◽  
Alexey Kovalevich ◽  
...  

The genus Helianthus is a diverse taxonomic group with approximately 50 species. Most sunflower genomic investigations are devoted to economically valuable species, e.g., H. annuus, while other Helianthus species, especially perennial, are predominantly a blind spot. In the current study, we have assembled the complete mitogenomes of two perennial species: H. grosseserratus (273,543 bp) and H. strumosus (281,055 bp). We analyzed their sequences and gene profiles in comparison to the available complete mitogenomes of H. annuus. Except for sdh4 and trnA-UGC, both perennial sunflower species had the same gene content and almost identical protein-coding sequences when compared with each other and with annual sunflowers (H. annuus). Common mitochondrial open reading frames (ORFs) (orf117, orf139, and orf334) in sunflowers and unique ORFs for H. grosseserratus (orf633) and H. strumosus (orf126, orf184, orf207) were identified. The maintenance of plastid-derived coding sequences in the mitogenomes of both annual and perennial sunflowers and the low frequency of nonsynonymous mutations point at an extremely low variability of mitochondrial DNA (mtDNA) coding sequences in the Helianthus genus.


2015 ◽  
Author(s):  
Anil Raj ◽  
Sidney H. Wang ◽  
Heejung Shim ◽  
Arbel Harpak ◽  
Yang I. Li ◽  
...  

Abstract: Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into biological functions. Here we describe riboHMM, a new method that uses ribosome footprint data along with gene expression and sequence information to accurately infer translated sequences. We applied our method to human lymphoblastoid cell lines and identified 7,273 previously unannotated coding sequences, including 2,442 translated upstream open reading frames. We observed an enrichment of harringtonine-treated ribosome footprints at the inferred initiation sites, validating many of the novel coding sequences. The novel sequences exhibit significant signatures of selective constraint in the reading frames of the inferred proteins, suggesting that many of these are functional. Nearly 40% of bicistronic transcripts showed significant negative correlation in the levels of translation of their two coding sequences, suggesting a key regulatory role for these novel translated sequences. Our work significantly expands the set of known coding regions in humans.


eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Anil Raj ◽  
Sidney H Wang ◽  
Heejung Shim ◽  
Arbel Harpak ◽  
Yang I Li ◽  
...  

Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite the known limitations of mass spectrometry in detecting proteins expressed at low levels, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.


eLife ◽  
2014 ◽  
Vol 3 ◽  
Author(s):  
Jorge Ruiz-Orera ◽  
Xavier Messeguer ◽  
Juan Antonio Subirana ◽  
M Mar Alba

Deep transcriptome sequencing has revealed the existence of many transcripts that lack long or conserved open reading frames (ORFs) and which have been termed long non-coding RNAs (lncRNAs). The vast majority of lncRNAs are lineage-specific and do not yet have a known function. In this study, we test the hypothesis that they may act as a repository for the synthesis of new peptides. We find that a large fraction of the lncRNAs expressed in cells from six different species is associated with ribosomes. The patterns of ribosome protection are consistent with the translation of short peptides. lncRNAs show coding potential and sequence constraints similar to those of evolutionarily young protein coding sequences, indicating that they play an important role in de novo protein evolution.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Robin-Lee Troskie ◽  
Yohaann Jafrani ◽  
Tim R. Mercer ◽  
Adam D. Ewing ◽  
Geoffrey J. Faulkner ◽  
...  

Abstract: Pseudogenes are gene copies presumed to be mainly functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify here hundreds of novel transcribed pseudogenes expressed in tissue-specific patterns. Some pseudogene transcripts have intact open reading frames and are translated in cultured cells, representing unannotated protein-coding genes. To assess the biological impact of noncoding pseudogenes, we use CRISPR-Cas9 to delete the nucleus-enriched pseudogene PDCL3P4 and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the human transcriptional landscape.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
David S. M. Lee ◽  
Joseph Park ◽  
Andrew Kromer ◽  
Aris Baras ◽  
Daniel J. Rader ◽  
...  

Abstract: Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5′ UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.


Insects ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 326
Author(s):  
Yu-Jun Wang ◽  
Hua-Ling Wang ◽  
Xiao-Wei Wang ◽  
Shu-Sheng Liu

Females and males often differ obviously in morphology and behavior, and the differences between sexes are the result of natural selection and/or sexual selection. To a great extent, the differences between the two sexes are the result of differential gene expression. In haplodiploid insects, this phenomenon is obvious, since males develop from unfertilized zygotes and females develop from fertilized zygotes. Whiteflies of the Bemisia tabaci species complex are typical haplodiploid insects, and some species of this complex are important pests of many crops worldwide. Here, we report the transcriptome profiles of males and females in three species of this whitefly complex. Between-species comparisons revealed that non-sex-biased genes display higher variation than male-biased or female-biased genes. Sex-biased genes evolve at a slow rate in protein coding sequences and gene expression and have a pattern of evolution that differs from those of social haplodiploid insects and diploid animals. Genes with high evolutionary rates are more related to non-sex-biased traits—such as nutrition, immune system, and detoxification—than to sex-biased traits, indicating that the evolution of protein coding sequences and gene expression has been mainly driven by non-sex-biased traits.

