Methods for RNA Modification Mapping Using Deep Sequencing: Established and New Emerging Technologies

Yuri Motorin; Mark Helm

doi:10.3390/genes10010035


Genes, 2019, Vol 10 (1), pp. 35

doi:10.3390/genes10010035

Cited By ~ 35

Author(s): Yuri Motorin; Mark Helm

Keyword(s): Deep Sequencing; Global Scale; High Rate; RNA Modification; Transcriptome Data; RNA-Seq; RNA Modifications; Accuracy and Precision; A Cell; Mapping Techniques

New analytics of post-transcriptional RNA modifications have paved the way for a tremendous upswing of biological and biomedical research in this field. This especially applies to methods that include RNA-Seq techniques, which typically result in what is termed global-scale modification mapping. In this process, positions inside a cell's transcriptome are assigned the status of potential modification sites (so-called modification calling), typically based on a score of some kind that derives from the particular method applied. The resulting data are thought to represent information beyond what is contained in typical transcriptome data, and hence the field has adopted the term "epitranscriptome". Owing to the high rate at which new mapping techniques are published, a significant number of chemically distinct RNA modifications have become amenable to mapping, albeit with varying accuracy and precision, depending on the nature of the technique. This review gives a brief overview of known techniques and how they have been applied to modification calling.
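Modification calling, as described here, comes down to giving each transcript position a method-specific score and keeping the positions that pass a threshold. The Python sketch below illustrates only that final step. The `Position` record, the score values, and both cutoffs (`score_cutoff`, `min_coverage`) are hypothetical and not taken from any particular method; each real method calibrates its own thresholds.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """One transcript position with a method-specific score (hypothetical record)."""
    transcript: str
    pos: int        # 1-based position within the transcript
    score: float    # score produced by the mapping method
    coverage: int   # number of reads covering the position


def call_modifications(positions, score_cutoff, min_coverage):
    """Keep positions whose score reaches the cutoff at sufficient coverage,
    ranked from strongest to weakest signal. Thresholds are illustrative."""
    calls = [
        p for p in positions
        if p.coverage >= min_coverage and p.score >= score_cutoff
    ]
    return sorted(calls, key=lambda p: p.score, reverse=True)


if __name__ == "__main__":
    data = [
        Position("tx1", 10, 0.92, 250),
        Position("tx1", 55, 0.40, 300),
        Position("tx2", 7, 0.95, 12),    # high score, but too few reads
        Position("tx2", 88, 0.81, 180),
    ]
    for p in call_modifications(data, score_cutoff=0.8, min_coverage=50):
        print(p.transcript, p.pos, p.score)
```

The coverage filter keeps the sketch from calling a site on a handful of reads, where a score mostly reflects sampling noise.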


RNA Modification Level Estimation with pulseR

Genes, 2018, Vol 9 (12), pp. 619

doi:10.3390/genes9120619

Author(s): Etienne Boileau; Christoph Dieterich

Keyword(s): Experimental Approach; High Efficiency; RNA Modification; Model Parameters; Metabolic Labeling; RNA-Seq; RNA Modifications; Log Odds; Pros and Cons; Wide Scale

RNA modifications regulate the complex life of transcripts. An experimental approach called LAIC-seq was developed to characterize modification levels on a transcriptome-wide scale. In this method, modified and unmodified molecules are separated using antibodies specific for a given RNA modification (e.g., m6A). The biochemical separation yields three fractions (input, eluate, and supernatant), each of which is subjected to RNA-seq. In this work, we present a bioinformatics workflow that starts from RNA-seq data and uses a statistical model to infer gene-specific modification levels on a transcriptome-wide scale. Our workflow centers on the pulseR package, which was originally developed for the analysis of metabolic labeling experiments. We demonstrate how to analyze data without external normalization (i.e., in the absence of spike-ins), given high separation efficiency, and how, alternatively, scaling factors can be derived from unmodified spike-ins. Importantly, our workflow estimates the uncertainty of modification levels as confidence intervals for model parameters such as gene expression and RNA modification levels. We also compare two alternative model parametrizations, the log-odds and the proportion of modified molecules, and discuss the pros and cons of each representation. In summary, our workflow is a versatile approach to RNA modification level estimation that is open to any read-count-based experimental approach.
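To make the two parametrizations concrete, the Python sketch below computes a naive gene-level modification level from eluate and supernatant read counts and converts between proportion and log-odds. It assumes perfect separation (all modified molecules in the eluate, all unmodified ones in the supernatant) and optional per-fraction scaling factors, for example derived from unmodified spike-ins. This is not the pulseR model, which fits a statistical model to the counts and reports confidence intervals; the function names are hypothetical.

```python
import math


def modification_level(eluate_count, supernatant_count,
                       eluate_scale=1.0, supernatant_scale=1.0):
    """Naive proportion of modified molecules for one gene.

    Assumes perfect separation; the scale factors (hypothetical, e.g. from
    unmodified spike-ins) put both fractions on a common scale.
    """
    e = eluate_count * eluate_scale
    s = supernatant_count * supernatant_scale
    if e + s == 0:
        raise ValueError("no reads in either fraction")
    return e / (e + s)


def to_log_odds(p):
    """Proportion -> log-odds; undefined at exactly 0 or 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def from_log_odds(x):
    """Log-odds -> proportion (logistic function)."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    p = modification_level(300, 900)
    print(p, to_log_odds(p), from_log_odds(to_log_odds(p)))
```

For example, 300 eluate reads and 900 supernatant reads on a common scale give a modification level of 0.25, i.e. a log-odds of ln(1/3), about -1.10. The log-odds scale is unbounded, which is convenient for model fitting, whereas the proportion is confined to [0, 1] and easier to read directly.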


READemption - A tool for the computational analysis of deep-sequencing-based transcriptome data

2014

doi:10.1101/003723

Cited By ~ 5

Author(s): Konrad Ulrich Förstner; Jörg Vogel; Cynthia Mira Sharma

Keyword(s): Data Processing; Deep Sequencing; Computational Analysis; Command Line; Transcriptome Data; RNA-Seq; Command Line Interface; Parallel Data; Full Power; Computationally Intensive

Summary: RNA-Seq has become a potent and widely used method to study transcriptomes qualitatively and quantitatively. To draw biological conclusions from RNA-Seq data, several steps, some of which are computationally intensive, have to be taken. Our READemption pipeline takes care of these individual tasks and integrates them into an easy-to-use tool with a command-line interface. To leverage the full power of modern computers, most subcommands of READemption offer parallel data processing. While READemption was mainly developed for the analysis of bacterial primary transcriptomes, we have successfully applied it to RNA-Seq reads from other sample types, including whole transcriptomes and RNA immunoprecipitated with proteins, not only from bacteria but also from eukaryotes and archaea. Availability and Implementation: READemption is implemented in Python and is published under the ISC open source license. The tool and its documentation are hosted at http://pythonhosted.org/READemption (DOI:10.6084/m9.figshare.977849).


The prevalent RNA modification on SARS-CoV-2 RNAs may confound the SNP profile and evolutionary patterns revealed by previous studies

2020

doi:10.21203/rs.3.rs-41421/v1

Author(s): Yan Wang; Yanhong Gai; Yuefan Li; Chunxiao Li; Ziliang Li; ...

Keyword(s): RNA Virus; Variant Calling; Virus Genome; RNA Modification; High Signal; RNA-Seq; RNA Modifications; Severe Damage; Evolutionary Patterns; Recent Outbreak

Background: The recent outbreak of SARS-CoV-2 has caused severe damage worldwide. Papers on the evolutionary patterns of SARS-CoV-2 are continuously emerging. Studies have used publicly available RNA-seq data to identify so-called SNPs in the virus genome and analyzed their selection patterns. Methods: We downloaded a set of RNA-seq data and ran a well-established variant calling pipeline, modified to allow the identification of multiple clustered mutations. Results: We found prevalent, "putative" but reliably detected A-to-G RNA modifications with high signal-to-noise ratios in the RNA-seq data of SARS-CoV-2, presumably caused by the host's deamination enzymes. Importantly, because SARS-CoV-2 is an RNA virus, it is technically impossible to truly distinguish SNPs from RNA modifications using RNA-seq data alone. Conclusions: Because RNA modifications and SNPs in SARS-CoV-2 are technically indistinguishable, efforts to unveil the evolutionary patterns behind its mutation spectrum are complicated. This is not a problem for DNA organisms but should be seriously considered when investigating RNA viruses.
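One simple way to flag clustered A-to-G changes like those described above is to sort same-type substitutions by position and group those lying close together. The Python sketch below does this for a list of `(position, ref, alt)` calls. The `window` and `min_size` parameters are hypothetical, and this is not the authors' modified pipeline. As the abstract stresses, such a flag cannot by itself tell an RNA modification from a genuine SNP.

```python
def find_clusters(variants, window=50, min_size=3, ref="A", alt="G"):
    """Group ref->alt substitutions lying within `window` nt of the previous
    one; return clusters with at least `min_size` members.

    variants: iterable of (position, ref_base, alt_base) tuples.
    window and min_size are illustrative defaults, not published values.
    """
    positions = sorted(p for p, r, a in variants if r == ref and a == alt)
    clusters, current = [], []
    for pos in positions:
        # A gap larger than the window closes the current cluster.
        if current and pos - current[-1] > window:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    calls = [(100, "A", "G"), (120, "A", "G"), (140, "A", "G"),
             (500, "C", "T"), (900, "A", "G"), (1200, "A", "G")]
    print(find_clusters(calls))
```

With the defaults, only the three A-to-G calls at 100, 120 and 140 form a cluster; the isolated C-to-T call is ignored because only the requested substitution type is considered.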


Analysis of RNA Modifications by Second- and Third-Generation Deep Sequencing: 2020 Update

Genes, 2021, Vol 12 (2), pp. 278

doi:10.3390/genes12020278

Author(s): Yuri Motorin; Virginie Marchand

Keyword(s): Single Molecule; Deep Sequencing; RNA Modification; RNA Modifications; Single Molecule Sequencing; Abasic Sites; Phosphate Chain; Adapter Ligation; Ribose Phosphate; Precise Mapping

The precise mapping and quantification of the numerous RNA modifications present in tRNAs, rRNAs, ncRNAs/miRNAs, and mRNAs remain a major challenge and a top priority of the epitranscriptomics field. After the keystone discoveries of massive m6A methylation in mRNAs, dozens of deep-sequencing-based methods and protocols were proposed for the analysis of various RNA modifications, allowing us to considerably extend the list of detectable modified residues. Many of the currently used methods rely on particular reverse transcription signatures left by RNA modifications in cDNA; these signatures may be naturally present or induced by an appropriate enzymatic or chemical treatment. The newest approaches also include labeling at RNA abasic sites that result from the selective removal of an RNA modification, or at sites of enhanced cleavage (or perhaps protection from cleavage) of the RNA ribose-phosphate chain, followed by specific adapter ligation. Classical affinity/immunoprecipitation-based protocols use either antibodies against modified RNA bases or proteins/enzymes that recognize RNA modifications. In this survey, we review the most recent achievements in this highly dynamic field, including promising attempts to map RNA modifications by direct single-molecule sequencing of RNA with nanopores.
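Many of the reverse transcription signatures mentioned above are RT stops, scored as the fraction of reads that end at a position. The Python sketch below computes a per-position stop rate (reads ending at a position divided by all reads covering it) and the difference between a treated and an untreated library, as used for chemically induced signatures. Inputs and function names are hypothetical. Practical pipelines typically also account for the stop usually mapping one nucleotide 3' of the modified residue, and add normalization and statistics rather than using raw differences.

```python
def stop_rates(read_ends, coverage):
    """Per-position RT-stop rate.

    read_ends[i]: reads whose cDNA 3' end maps to position i (hypothetical input).
    coverage[i]: all reads covering position i, ending there or reading through.
    Positions without coverage get a rate of 0.0.
    """
    if len(read_ends) != len(coverage):
        raise ValueError("inputs must have equal length")
    return [e / c if c else 0.0 for e, c in zip(read_ends, coverage)]


def stop_rate_difference(treated_ends, treated_cov, control_ends, control_cov):
    """Treated minus untreated stop rate per position. A treatment-induced
    signature should raise the rate only in the treated library."""
    treated = stop_rates(treated_ends, treated_cov)
    control = stop_rates(control_ends, control_cov)
    return [t - c for t, c in zip(treated, control)]


if __name__ == "__main__":
    diff = stop_rate_difference([2, 40, 3], [100, 100, 100],
                                [1, 5, 3], [100, 100, 100])
    print([round(d, 3) for d in diff])
```

Here only the middle position shows a large treatment-dependent increase in stops, which is the pattern a score of this kind is meant to highlight.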


Pseudouridylation defect due to DKC1 and NOP10 mutations causes nephrotic syndrome with cataracts, hearing impairment, and enterocolitis

Proceedings of the National Academy of Sciences, 2020, Vol 117 (26), pp. 15137-15147

doi:10.1073/pnas.2002328117

Cited By ~ 1

Author(s): Eszter Balogh; Jennifer C. Chandler; Máté Varga; Mona Tahoun; Dóra K. Menyhárd; ...

Keyword(s): Nephrotic Syndrome; Sensorineural Deafness; Dyskeratosis Congenita; RNA Modification; RNA Modifications; Human Phenotype; Telomere Attrition; Binding Interface; Core Proteins; A Cell

RNA modifications play a fundamental role in cellular function. Pseudouridylation, the most abundant RNA modification, is catalyzed by the H/ACA small nucleolar ribonucleoprotein (snoRNP) complex, which shares four core proteins: dyskerin (DKC1), NOP10, NHP2, and GAR1. Mutations in DKC1, NOP10, or NHP2 cause dyskeratosis congenita (DC), a disorder characterized by telomere attrition. Here, we report a phenotype comprising nephrotic syndrome, cataracts, sensorineural deafness, enterocolitis, and early lethality in two pedigrees: males with DKC1 p.Glu206Lys and two children with homozygous NOP10 p.Thr16Met. Females with heterozygous DKC1 p.Glu206Lys developed cataracts and sensorineural deafness, but nephrotic syndrome in only one case of skewed X-inactivation. We found telomere attrition in both pedigrees, but no mucocutaneous abnormalities suggestive of DC. Both mutations fall at the dyskerin–NOP10 binding interface in a region distinct from those implicated in DC, impair the dyskerin–NOP10 interaction, and disrupt the catalytic pseudouridylation site. Accordingly, we found reduced pseudouridine levels in the ribosomal RNA (rRNA) of the patients. Zebrafish dkc1 mutants recapitulate the human phenotype and show reduced 18S pseudouridylation, ribosomal dysregulation, and a cell-cycle defect in the absence of telomere attrition. We therefore propose that this human disorder is the consequence of defective snoRNP pseudouridylation and ribosomal dysfunction.


RNA-combine: a toolkit for comprehensive analyses on transcriptome data from different sequencing platforms

BMC Bioinformatics, 2022, Vol 23 (1)

doi:10.1186/s12859-021-04549-y

Author(s): Xuemin Dong; Shanshan Dong; Shengkai Pan; Xiangjiang Zhan

Keyword(s): Biological Function; Transcriptome Data; RNA-Seq; Illumina Platform; Sequencing Platform; Source Codes; A Cell; Downstream Analysis; Sequencing Platforms; Result Interpretation

Background: Understanding the transcriptome has become an essential step toward the full interpretation of the biological function of a cell, a tissue, or even an organ. Many tools are available for processing or analyzing transcriptome data, or for visualizing analysis results. However, most existing tools are limited to data from a single sequencing platform, and only a few of them can handle more than one analysis module, which falls far short of users' requirements, especially for users without advanced programming skills. Hence, we still lack an open-source toolkit that enables both bioinformaticians and non-bioinformaticians to process and analyze large transcriptome datasets from different sequencing platforms and to visualize the results. Results: We present RNA-combine, a Linux-based toolkit that automatically performs quality assessment, downstream analysis, and visualization of transcriptome data generated on different sequencing platforms, including bulk RNA-seq (Illumina platform), single-cell RNA-seq (10x Genomics), and Iso-Seq (PacBio). In addition, this toolkit implements at least 10 more analysis modules than the other toolkits examined in this study. The source code of RNA-combine is available on GitHub: https://github.com/dongxuemin666/RNA-combine. Conclusion: Our results suggest that RNA-combine is a reliable tool for transcriptome data processing and result interpretation for both bioinformaticians and non-bioinformaticians.


PRMdb: A Repository of Predicted RNA Modifications in Plants

Plant and Cell Physiology, 2020, Vol 61 (6), pp. 1213-1222

doi:10.1093/pcp/pcaa042

Author(s): Xuan Ma; Fuyan Si; Xiaonan Liu; Weijiang Luan

Keyword(s): Plant Species; Posttranscriptional Regulation; Regulation of Gene Expression; RNA Modification; RNA-Seq; RNA Modifications; High Throughput Analysis; Functional Studies; Web Resource; Wide Range

Evidence is mounting that RNA modifications play essential roles in the posttranscriptional regulation of gene expression. So far, over 150 RNA modifications catalyzed by distinct enzymes have been documented. In plants, genome-wide identification of RNA modifications is largely limited to the model species Arabidopsis thaliana and is lacking in diverse non-model plants. Here, we present PRMdb, a plant RNA modification database based on the analysis of thousands of RNA-seq, degradome-seq, and small RNA-seq datasets from a wide range of plant species using the well-documented tool HAMR (high-throughput analysis of modified ribonucleotides). PRMdb provides a user-friendly interface that enables easy browsing and searching of the tRNA and mRNA modification data. We show that PRMdb collects high-confidence RNA modifications, including novel RNA modification sites that can be validated by genomic PCR and reverse transcription PCR. In summary, PRMdb provides a valuable web resource for deciphering the epitranscriptomes of diverse plant species and will facilitate functional studies of RNA modifications in plants. PRMdb is available via http://www.biosequencing.cn/PRMdb/.


Integrated Quantitative Analysis of the Phosphoproteome and Transcriptome in Tamoxifen-resistant Breast Cancer

2020

doi:10.31234/osf.io/wtxu7

Author(s): Lungwani Muungo

Keyword(s): Breast Cancer; High Rate; System Level; Clinical Samples; Transcriptome Data; Data Set; Reporter Assays; Treated Breast; MCF-7; Resistant Cells

Quantitative phosphoproteome and transcriptome analysis of ligand-stimulated MCF-7 human breast cancer cells was performed to understand the mechanisms of tamoxifen resistance at a system level. Phosphoproteome data revealed that WT cells were more enriched with phospho-proteins than tamoxifen-resistant cells after stimulation with ligands. Surprisingly, decreased phosphorylation after ligand perturbation was more common than increased phosphorylation. In particular, 17β-estradiol induced down-regulation in WT cells at a very high rate. 17β-Estradiol and the ErbB ligand heregulin induced almost equal numbers of up-regulated phospho-proteins in WT cells. Pathway and motif activity analyses using transcriptome data additionally suggested that deregulated activation of GSK3β (glycogen synthase kinase 3β) and MAPK1/3 signaling might be associated with altered activation of cAMP-responsive element-binding protein and AP-1 transcription factors in tamoxifen-resistant cells, and this hypothesis was validated by reporter assays. An examination of clinical samples revealed that inhibitory phosphorylation of GSK3β at serine 9 was significantly lower in tamoxifen-treated breast cancer patients that eventually had relapses, implying that activation of GSK3β may be associated with the tamoxifen-resistant phenotype. Thus, the combined phosphoproteome and transcriptome data set analyses revealed distinct signal


Integrated genomic analysis reveals regulatory pathways and dynamic landscapes of the tRNA transcriptome

Scientific Reports, 2021, Vol 11 (1)

doi:10.1038/s41598-021-83469-6

Author(s): Zefang Sun; Jia Tan; Minqiong Zhao; Qiyao Peng; Mingqing Zhou; ...

Keyword(s): Expression Profiles; Genomic Analysis; RNA-Seq; Regulatory Pathways; Cellular Processes; Dynamic Landscapes; RNA Fragments; A Cell; Considerable Impact; New Algorithms

tRNAs and tRNA-derived RNA fragments (tRFs) play various roles in many cellular processes outside of protein synthesis. However, comprehensive investigations of tRNA/tRF regulation are rare. In this study, we used new algorithms to extensively analyze the publicly available data from 1332 ChIP-Seq and 42 small-RNA-Seq experiments in human cell lines and tissues to investigate the transcriptional and posttranscriptional regulatory mechanisms of tRNAs. We found that histone acetylation, cAMP, and pluripotency pathways play important roles in the regulation of tRNA gene transcription in a cell-specific manner. Analysis of RNA-Seq data identified 950 high-confidence tRFs, and the results suggested that tRNA pools are dramatically distinct across the samples in terms of expression profiles and tRF composition. The mismatch analysis identified new potential modification sites and specific modification patterns in tRNA families. The results also show that RNA library preparation technologies have a considerable impact on tRNA profiling and need to be optimized in the future.
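Mismatch analysis of this kind rests on a simple quantity: the fraction of reads that disagree with the reference base at each position, since many tRNA modifications cause misincorporation during reverse transcription. The Python sketch below computes it from gap-free aligned reads given as `(start, sequence)` pairs. The input format and the `min_freq` threshold are simplifying assumptions, not the algorithms used in the study, and the sketch models neither sequencing errors nor genomic variants.

```python
from collections import Counter


def mismatch_frequency(reference, aligned_reads):
    """Fraction of non-reference bases at each reference position.

    aligned_reads: list of (start, sequence) with a 0-based, non-negative
    start; reads are assumed gap-free (real aligners also report indels).
    """
    counts = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            i = start + offset
            if i < len(reference):       # ignore bases past the reference end
                counts[i][base] += 1
    freqs = []
    for ref_base, c in zip(reference, counts):
        total = sum(c.values())
        freqs.append((total - c[ref_base]) / total if total else 0.0)
    return freqs


def candidate_sites(freqs, min_freq=0.1):
    """0-based positions whose mismatch frequency reaches min_freq (hypothetical cutoff)."""
    return [i for i, f in enumerate(freqs) if f >= min_freq]


if __name__ == "__main__":
    reads = [(0, "ACGGA"), (0, "ACTGA"), (1, "CTGA"), (0, "ACGGA")]
    freqs = mismatch_frequency("ACGGA", reads)
    print(freqs, candidate_sites(freqs))
```

In this toy input, half of the reads carry a T over the reference G at position 2, so that position is the only candidate site.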


Annotation of snoRNA abundance across human tissues reveals complex snoRNA-host gene relationships

Genome Biology, 2021, Vol 22 (1)

doi:10.1186/s13059-021-02391-2

Author(s): Étienne Fafard-Couture; Danny Bergeron; Sonia Couture; Sherif Abou-Elela; Michelle S. Scott

Keyword(s): Housekeeping Genes; Host Gene; RNA Modification; Human Tissues; RNA-Seq; Healthy Human; Protein Coding; Conservation Level; Nucleolar RNAs; Host Genes

Background: Small nucleolar RNAs (snoRNAs) are mid-size non-coding RNAs required for ribosomal RNA modification, implying a ubiquitous tissue distribution linked to ribosome synthesis. However, a growing number of studies identify extra-ribosomal roles of snoRNAs in modulating gene expression, suggesting more complex snoRNA abundance patterns. There is therefore a great need to map the snoRNome in different human tissues as the blueprint for snoRNA functions. Results: We used a low-structure-bias RNA-Seq approach to accurately quantify snoRNAs and compare them to the entire transcriptome in seven healthy human tissues (breast, ovary, prostate, testis, skeletal muscle, liver, and brain). We identify 475 expressed snoRNAs categorized into two abundance classes that differ significantly in their function, conservation level, and correlation with their host gene: 390 snoRNAs are uniformly expressed and 85 are enriched in the brain or reproductive tissues. Most tissue-enriched snoRNAs are embedded in lncRNAs and display a strong correlation of abundance with them, whereas uniformly expressed snoRNAs are mostly embedded in protein-coding host genes and are mainly non- or anticorrelated with them. Fifty-nine percent of the non-correlated or anticorrelated protein-coding host gene/snoRNA pairs feature dual-initiation promoters, compared to only 16% of the correlated non-coding host gene/snoRNA pairs. Conclusions: Our results demonstrate that snoRNAs are not a single homogeneous group of housekeeping genes but include highly regulated tissue-enriched RNAs. Indeed, our work indicates that the architecture of snoRNA host genes varies to uncouple host and snoRNA expression in order to meet the different snoRNA abundance levels and functional needs of human tissues.
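The host-gene relationships above are defined by how snoRNA abundance co-varies with host-gene abundance across the seven tissues. The Python sketch below computes a Pearson correlation for one snoRNA/host pair and labels it. The cutoff of 0.5 and the function names are hypothetical, and the abstract does not state which correlation measure or threshold the authors used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant input has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def classify_pair(sno_abundance, host_abundance, cutoff=0.5):
    """Label a snoRNA/host-gene pair from abundances across tissues.
    The cutoff is an illustrative assumption."""
    r = pearson(sno_abundance, host_abundance)
    if r >= cutoff:
        return "correlated", r
    if r <= -cutoff:
        return "anticorrelated", r
    return "non-correlated", r


if __name__ == "__main__":
    # Hypothetical abundances in seven tissues.
    sno = [1, 2, 3, 4, 5, 6, 7]
    print(classify_pair(sno, [2, 4, 6, 8, 10, 12, 14]))
    print(classify_pair(sno, [3, 1, 4, 1, 5, 9, 2]))
```

Abundances such as TPM are often log-transformed before correlating, so that a single highly expressed tissue does not dominate the coefficient.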
