Comprehensive Annotations of Human Herpesvirus 6A and 6B Genomes Reveal Novel and Conserved Genomic Features

10.1101/730028 ◽

2019 ◽

Author(s):

Yaara Finkel ◽

Dominik Schmiedel ◽

Julie Tai-Schmiedel ◽

Aharon Nachshon ◽

Michal Schwartz ◽

...

Keyword(s):

Human Herpesvirus ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Temporal Expression ◽

Protein Coding ◽

Functional Studies ◽

Viral Genes ◽

Non-Coding RNAs ◽

Coding Potential ◽

Reading Frames

Human herpesvirus 6 (HHV-6) A and B are highly ubiquitous betaherpesviruses, infecting the majority of the human population. Like other herpesviruses, they encompass large genomes and our understanding of their protein coding potential is far from complete. Here we employ ribosome profiling and systematic transcript analysis to experimentally define the HHV-6 translation products and to follow their temporal expression. We identify hundreds of new open reading frames (ORFs), including many upstream ORFs (uORFs) and internal ORFs (iORFs), generating a complete unbiased atlas of HHV-6 proteome. Furthermore, by integrating systematic data from the prototypic betaherpesvirus, human cytomegalovirus, we uncover numerous uORFs and iORFs that are conserved across betaherpesviruses and we show that uORFs are specifically enriched in late viral genes. Using our transcriptome measurements, we identified three highly abundant HHV-6 encoded long non-coding RNAs (lncRNAs), one of which generates a non-polyadenylated stable intron that appears to be a conserved feature of betaherpesviruses. Overall, our work reveals the complexity of HHV-6 genomes and highlights novel features that are conserved between betaherpesviruses, providing a rich resource for future functional studies.

Comprehensive annotations of human herpesvirus 6A and 6B genomes reveal novel and conserved genomic features

eLife ◽

10.7554/elife.50960 ◽

2020 ◽

Vol 9 ◽

Cited By ~ 10

Author(s):

Yaara Finkel ◽

Dominik Schmiedel ◽

Julie Tai-Schmiedel ◽

Aharon Nachshon ◽

Roni Winkler ◽

...

Keyword(s):

Human Herpesvirus ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Protein Coding ◽

Genomic Features ◽

Functional Studies ◽

Viral Genes ◽

Non-Coding RNAs ◽

Coding Potential ◽

Reading Frames

Human herpesvirus-6 (HHV-6) A and B are ubiquitous betaherpesviruses, infecting the majority of the human population. They encompass large genomes and our understanding of their protein coding potential is far from complete. Here, we employ ribosome-profiling and systematic transcript-analysis to experimentally define HHV-6 translation products. We identify hundreds of new open reading frames (ORFs), including upstream ORFs (uORFs) and internal ORFs (iORFs), generating a complete unbiased atlas of HHV-6 proteome. By integrating systematic data from the prototypic betaherpesvirus, human cytomegalovirus, we uncover numerous uORFs and iORFs conserved across betaherpesviruses and we show uORFs are enriched in late viral genes. We identified three highly abundant HHV-6 encoded long non-coding RNAs, one of which generates a non-polyadenylated stable intron appearing to be a conserved feature of betaherpesviruses. Overall, our work reveals the complexity of HHV-6 genomes and highlights novel features conserved between betaherpesviruses, providing a rich resource for future functional studies.

A community-driven roadmap to advance research on translated open reading frames detected by Ribo-seq

10.1101/2021.06.10.447896 ◽

2021 ◽

Author(s):

Jonathan M Mudge ◽

Jorge Ruiz-Orera ◽

John R Prensner ◽

Marie A Brunet ◽

Jose Manuel Gonzalez ◽

...

Keyword(s):

Gene Annotation ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Untranslated Regions ◽

Biological Databases ◽

Protein Coding ◽

Circular Problem ◽

Advance Research ◽

Non-Coding RNAs ◽

Reading Frames

Ribosome profiling (Ribo-seq) has catalyzed a paradigm shift in our understanding of the translational vocabulary of the human genome, discovering thousands of translated open reading frames (ORFs) within long non-coding RNAs and presumed untranslated regions of protein-coding genes. However, reference gene annotation projects have been circumspect in their incorporation of these ORFs due to uncertainties about their experimental reproducibility and physiological roles. Yet, it is indisputable that certain Ribo-seq ORFs make stable proteins, others mediate gene regulation, and many have medical implications. Ultimately, the absence of standardized ORF annotation has created a circular problem: while Ribo-seq ORFs remain unannotated by reference biological databases, this lack of characterisation will thwart research efforts examining their roles. Here, we outline the initial stages of a community-led effort supported by GENCODE / Ensembl, HGNC and UniProt to produce a consolidated catalog of human Ribo-seq ORFs.

Long non-coding RNAs as a source of new peptides

eLife ◽

10.7554/elife.03523 ◽

2014 ◽

Vol 3 ◽

Cited By ~ 241

Author(s):

Jorge Ruiz-Orera ◽

Xavier Messeguer ◽

Juan Antonio Subirana ◽

M Mar Alba

Keyword(s):

Transcriptome Sequencing ◽

De Novo ◽

Large Fraction ◽

Open Reading Frames ◽

Protein Coding ◽

Coding Sequences ◽

Non-Coding RNAs ◽

Sequence Constraints ◽

Coding Potential ◽

Reading Frames

Deep transcriptome sequencing has revealed the existence of many transcripts that lack long or conserved open reading frames (ORFs) and which have been termed long non-coding RNAs (lncRNAs). The vast majority of lncRNAs are lineage-specific and do not yet have a known function. In this study, we test the hypothesis that they may act as a repository for the synthesis of new peptides. We find that a large fraction of the lncRNAs expressed in cells from six different species is associated with ribosomes. The patterns of ribosome protection are consistent with the translation of short peptides. lncRNAs show coding potential and sequence constraints similar to those of evolutionarily young protein-coding sequences, indicating that they play an important role in de novo protein evolution.

Disrupting upstream translation in mRNAs is associated with human disease

Nature Communications ◽

10.1038/s41467-021-21812-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

David S. M. Lee ◽

Joseph Park ◽

Andrew Kromer ◽

Aris Baras ◽

Daniel J. Rader ◽

...

Keyword(s):

Protein Expression ◽

Biological Significance ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Protein Coding ◽

Stop Codons ◽

Human Genes ◽

Strong Negative Selection ◽

Disease Associations ◽

Reading Frames

Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’ UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.

Overexpression-based detection of translatable circular RNAs is vulnerable to coexistent linear RNA byproducts

10.1101/2021.03.23.433163 ◽

2021 ◽

Author(s):

Yanyi Jiang ◽

Xiaofan Chen ◽

Wei Zhang

Keyword(s):

Open Reading Frames ◽

Systematic Evaluation ◽

Circular RNAs ◽

Protein Coding ◽

Rolling Circle ◽

Functional Investigation ◽

Overexpression System ◽

Translation Signals ◽

Coding Potential ◽

Reading Frames

In the RNA field, the demarcation between coding and non-coding has been renegotiated by the recent discovery of occasionally translated circular RNAs (circRNAs). Although they lack a 5’ cap structure, circRNAs can be translated cap-independently. Complementary intron-mediated overexpression is one of the most widely used methods in circRNA research, but it has drawn persistent skepticism for its poorly defined mechanism and latent coexistent side products. In this study, leveraging this circRNA overexpression system, we have interrogated the protein-coding potential of 30 human circRNAs containing infinite open reading frames in HEK293T cells. Surprisingly, pervasive translation signals are detected by immunoblotting. However, intensive mutagenesis reveals that numerous translation signals are generated independently of circRNA synthesis. We have developed a dual tag strategy to isolate translation noise and directly demonstrate that the fallacious translation signals originate from cryptically spliced linear transcripts. The concomitant linear RNA byproducts, presumably concatemers, can be translated to produce pseudo rolling circle translation signals, and can contain the backsplicing junction (BSJ), which disqualifies BSJ-based evidence for circRNA translation. We also find that non-AUG start codons may engage in the translation initiation of circRNAs. Taken together, our systematic evaluation sheds light on the heterogeneous translational outputs of circRNA overexpression vectors and comes with a caveat: the ectopic overexpression technique necessitates extremely rigorous controls in circRNA translation and functional investigation.

When Long Noncoding Becomes Protein Coding

Molecular and Cellular Biology ◽

10.1128/mcb.00528-19 ◽

2020 ◽

Vol 40 (6) ◽

Cited By ~ 14

Author(s):

Corrine Corrina R. Hartford ◽

Ashish Lal

Keyword(s):

Cell Division ◽

Cell Signaling ◽

Transcription Regulation ◽

Noncoding RNAs ◽

Long Noncoding RNAs ◽

Open Reading Frames ◽

Protein Coding ◽

Small Proteins ◽

Coding Potential ◽

Reading Frames

Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.

Identification of Proteins Associated with Murine Cytomegalovirus Virions

Journal of Virology ◽

10.1128/jvi.78.20.11187-11197.2004 ◽

2004 ◽

Vol 78 (20) ◽

pp. 11187-11197 ◽

Cited By ~ 105

Author(s):

Lisa M. Kattenhorn ◽

Ryan Mills ◽

Markus Wagner ◽

Alexandre Lomsadze ◽

Vsevolod Makeev ◽

...

Keyword(s):

Gene Prediction ◽

Polyacrylamide Gel Electrophoresis ◽

Sodium Dodecyl ◽

Open Reading Frames ◽

Murine Cytomegalovirus ◽

Prediction Algorithm ◽

Sequencing Analysis ◽

Protein Coding ◽

Coding Potential ◽

Reading Frames

Proteins associated with the murine cytomegalovirus (MCMV) viral particle were identified by a combined approach of proteomic and genomic methods. Purified MCMV virions were dissociated by complete denaturation and subjected to either separation by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and in-gel digestion or treated directly by in-solution tryptic digestion. Peptides were separated by nanoflow liquid chromatography and analyzed by tandem mass spectrometry (LC-MS/MS). The MS/MS spectra obtained were searched against a database of MCMV open reading frames (ORFs) predicted to be protein coding by an MCMV-specific version of the gene prediction algorithm GeneMarkS. We identified 38 proteins from the capsid, tegument, glycoprotein, replication, and immunomodulatory protein families, as well as 20 genes of unknown function. Observed irregularities in coding potential suggested possible sequence errors in the 3′-proximal ends of m20 and M31. These errors were experimentally confirmed by sequencing analysis. The MS data further indicated the presence of peptides derived from the unannotated ORFs ORFc225441-226898 (m166.5) and ORF105932-106072. Immunoblot experiments confirmed expression of m166.5 during viral infection.

uORF-Tools – Workflow for the determination of translation-regulatory upstream open reading frames

10.1101/415018 ◽

2018 ◽

Cited By ~ 1

Author(s):

Anica Scholz ◽

Florian Eggenhofer ◽

Rick Gelhausen ◽

Björn Grüning ◽

Kathi Zarnack ◽

...

Keyword(s):

Ribosome Profiling ◽

Open Reading Frames ◽

Annotation File ◽

Inhibitory Effects ◽

Protein Coding ◽

Reading Frame ◽

Upstream Open Reading Frames ◽

Induced Changes ◽

Reading Frames

Ribosome profiling (ribo-seq) provides a means to analyze active translation by determining ribosome occupancy in a transcriptome-wide manner. The vast majority of ribosome protected fragments (RPFs) reside within the protein-coding sequence of mRNAs. However, reads are also commonly found within the transcript leader sequence (TLS) (aka 5’ untranslated region) preceding the main open reading frame (ORF), indicating the translation of regulatory upstream ORFs (uORFs). Here, we present a workflow for the identification of translation-regulatory uORFs. Specifically, uORF-Tools identifies uORFs within a given dataset and generates a uORF annotation file. In addition, a comprehensive human uORF annotation file, based on 35 ribo-seq files, is provided, which can serve as an alternative input file for the workflow. To assess the translation-regulatory activity of the uORFs, stimulus-induced changes in the ratio of the RPFs residing in the main ORFs relative to those found in the associated uORFs are determined. The resulting output file allows for the easy identification of candidate uORFs, which have translation-inhibitory effects on their associated main ORFs. uORF-Tools is available as a free and open Snakemake workflow at https://github.com/Biochemistry1-FFM/uORF-Tools. It is easily installed and all necessary tools are provided in a version-controlled manner, which also ensures lasting usability. uORF-Tools is designed for intuitive use and requires only limited computing times and resources.
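
The ratio-based assessment described in this abstract can be illustrated with a minimal Python sketch. It is not the uORF-Tools implementation: the function names, the pseudocount, and the absence of any library-size normalization are assumptions made here for illustration only. The sketch computes the ratio of RPF counts in a main ORF to those in its associated uORF, then the log2 change of that ratio between a control and a stimulated condition.

```python
import math


def main_to_uorf_ratio(main_orf_rpfs, uorf_rpfs, pseudocount=1.0):
    """Ratio of RPF counts in the main ORF to those in its uORF.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when a uORF has no footprints.
    """
    return (main_orf_rpfs + pseudocount) / (uorf_rpfs + pseudocount)


def log2_ratio_change(control, treated, pseudocount=1.0):
    """log2 change of the main-ORF/uORF ratio between two conditions.

    control and treated are (main_orf_rpfs, uorf_rpfs) tuples of raw counts.
    """
    r_control = main_to_uorf_ratio(*control, pseudocount=pseudocount)
    r_treated = main_to_uorf_ratio(*treated, pseudocount=pseudocount)
    return math.log2(r_treated / r_control)
```

For example, `log2_ratio_change((399, 99), (199, 199))` compares a control ratio of 4 with a treated ratio of 1 and returns -2.0. A negative value means the main ORF lost footprints relative to its uORF under the stimulus, which is the kind of shift one would look at when screening for candidate translation-inhibitory uORFs.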

RNA G-quadruplexes mark repressive upstream open reading frames in human mRNAs

10.1101/223073 ◽

2017 ◽

Cited By ~ 1

Author(s):

Pierre Murat ◽

Giovanni Marsico ◽

Barbara Herdy ◽

Avazeh Ghanbarian ◽

Guillem Portella ◽

...

Keyword(s):

Secondary Structures ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Untranslated Regions ◽

Translation Regulation ◽

Physical Interaction ◽

Protein Coding ◽

Upstream Open Reading Frames ◽

Nucleotide Resolution ◽

Reading Frames

RNA secondary structures in the 5’ untranslated regions (UTRs) of mRNAs have been characterised as key determinants of translation initiation. However, the role of non-canonical secondary structures, such as RNA G-quadruplexes (rG4s), in modulating translation of human mRNAs and the associated mechanisms remain largely unappreciated. Here we use a ribosome profiling strategy to investigate the translational landscape of human mRNAs with structured 5’ untranslated regions (5’-UTRs). We found that inefficiently translated mRNAs, containing rG4-forming sequences in their 5’-UTRs, have an accumulation of ribosome footprints in their 5’-UTRs. We show that rG4-forming sequences are determinants of 5’-UTR translation, suggesting that the folding of rG4 structures thwarts the translation of protein coding sequences (CDS) by stimulating the translation of repressive upstream open reading frames (uORFs). To support our model, we demonstrate that depletion of two rG4s-specialised DEAH-box helicases, DHX36 and DHX9, shifts translation towards rG4-containing uORFs, reducing the translation of selected transcripts comprising proto-oncogenes, transcription factors and epigenetic regulators. Transcriptome-wide identification of DHX9 binding sites using individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) demonstrates that translation regulation is mediated through direct physical interaction between the helicase and its rG4 substrate. Our findings unveil a previously unknown role for non-canonical structures in governing 5’-UTR translation and suggest that the interaction of helicases with rG4s could be considered as a target for future therapeutic intervention.

Thousands of novel unannotated proteins expand the MHC I immunopeptidome in cancer

10.1101/2020.02.12.945840 ◽

2020 ◽

Cited By ~ 6

Author(s):

Tamara Ouspenskaia ◽

Travis Law ◽

Karl R. Clauser ◽

Susan Klaeger ◽

Siranush Sarkizova ◽

...

Keyword(s):

Somatic Mutations ◽

Tumor Antigens ◽

Ribosome Profiling ◽

Lymphocytic Leukemia ◽

Open Reading Frames ◽

Specific Expression ◽

Protein Coding ◽

MHC I ◽

Coding Regions ◽

Reading Frames

Tumor epitopes, peptides that are presented on surface-bound MHC I proteins, provide targets for cancer immunotherapy and have been identified extensively in the annotated protein-coding regions of the genome. Motivated by the recent discovery of translated novel unannotated open reading frames (nuORFs) using ribosome profiling (Ribo-seq), we hypothesized that cancer-associated processes could generate nuORFs that can serve as a new source of tumor antigens that harbor somatic mutations or show tumor-specific expression. To identify cancer-specific nuORFs, we generated Ribo-seq profiles for 29 malignant and healthy samples, developed a sensitive analytic approach for hierarchical ORF prediction, and constructed a high-confidence database of translated nuORFs across tissues. Peptides from 3,555 unique translated nuORFs were presented on MHC I, based on analysis of an extensive dataset of MHC I-bound peptides detected by mass spectrometry, with >20-fold more nuORF peptides detected in the MHC I immunopeptidomes compared to whole proteomes. We further detected somatic mutations in nuORFs of cancer samples and identified nuORFs with tumor-specific translation in melanoma, chronic lymphocytic leukemia and glioblastoma. NuORFs thus expand the pool of MHC I-presented, tumor-specific peptides, targetable by immunotherapies.
