Combined analysis of genome sequencing and RNA-motifs reveals novel damaging non-coding mutations in human tumors

2017 ◽  
Author(s):  
Babita Singh ◽  
Juan L. Trincado ◽  
PJ Tatlow ◽  
Stephen R. Piccolo ◽  
Eduardo Eyras

Abstract A major challenge in cancer research is to determine the biological and clinical significance of somatic mutations in non-coding regions. This has been studied in terms of recurrence, functional impact, and association with individual regulatory sites, but the combinatorial contribution of mutations to common RNA regulatory motifs has not been explored. We developed a new method, MIRA, to perform the first comprehensive study of significantly mutated regions (SMRs) affecting binding sites for RNA-binding proteins (RBPs) in cancer. By extracting signals related to RNA-related selection processes and using RNA sequencing data from the same samples, we identified alterations in RNA expression and splicing linked to mutations on RBP binding sites. We found SRSF10 and MBNL1 motifs in introns, HNRPLL motifs at 5’ UTRs, as well as 5’ and 3’ splice-site motifs, among others, with specific mutational patterns that disrupt the motif and impact RNA processing. MIRA facilitates the integrative analysis of multiple genome sites that operate collectively through common RBPs and can aid in the interpretation of non-coding variants in cancer. MIRA is available at https://github.com/comprna/mira.
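The idea of asking whether all instances of one RBP motif are, collectively, more mutated than expected can be illustrated with a one-sided binomial test. The sketch below is an assumption for illustration, not MIRA's actual statistical model: it pools every instance of a motif into one region and compares the number of mutated positions with a uniform background mutation rate. The function names are hypothetical.

```python
from math import comb


def binomial_sf(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def motif_set_pvalue(sites: list[tuple[int, int]], background_rate: float) -> float:
    """One-sided test for excess mutations pooled over all instances of one motif.

    sites: one (site_length, n_mutated_positions) pair per motif instance.
    background_rate: assumed per-base probability that a position is mutated.
    """
    total_length = sum(length for length, _ in sites)
    total_mutated = sum(mutated for _, mutated in sites)
    return binomial_sf(total_mutated, total_length, background_rate)


if __name__ == "__main__":
    # Hypothetical example: three 8-nt instances of one motif, 3 mutated positions in total.
    print(motif_set_pvalue([(8, 2), (8, 1), (8, 0)], background_rate=0.001))
```

Pooling lets weak per-site signals add up across sites that act through a common RBP. A realistic analysis would also need a position-specific background (local mutation rates, mutational signatures) and multiple-testing correction, neither of which is modeled here.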

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Zhengfeng Wang ◽  
Xiujuan Lei

Abstract Background Circular RNAs (circRNAs) are widely expressed in cells and tissues and are involved in biological processes and human diseases. Recent studies have demonstrated that circRNAs can interact with RNA-binding proteins (RBPs), which is considered an important aspect for investigating the function of circRNAs. Results In this study, we design a slight variant of the capsule network, called circRB, to identify the sequence specificities of circRNAs binding to RBPs. In this model, the sequence features of circRNAs are extracted by convolution operations, and then two dynamic routing algorithms in a capsule network are employed to discriminate between different binding sites by analysing the convolution features of binding sites. The experimental results show that the circRB method outperforms the existing computational methods. Afterwards, the trained models are applied to detect the sequence motifs on the seven circRNA-RBP bound sequence datasets, and the motifs are matched to known human RNA motifs. Some motifs on circular RNAs overlap with those on linear RNAs. Finally, we also predict binding sites on the reported full-length sequences of circRNAs interacting with RBPs, to assist current studies. We hope that our model will contribute to a better understanding of the mechanisms of the interactions between RBPs and circRNAs. Conclusion In view of the scarcity of studies on the sequence specificities of circRNA-binding proteins, we designed a classification framework called circRB based on the capsule network. The results show that the circRB method is effective and achieves higher prediction accuracy than other methods.
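The abstract says circRB extracts sequence features by convolution before the capsule layers. That first stage can be sketched in plain Python: one-hot encode the sequence, slide a filter along it, and apply a ReLU. This is a generic illustration of a 1D convolution over nucleotides, assuming a hand-set filter; circRB's learned filters, layer sizes, and dynamic-routing capsule layers are not shown.

```python
BASES = "ACGU"


def one_hot(seq: str) -> list[list[float]]:
    """Encode an RNA/DNA string as an L x 4 one-hot matrix (columns A, C, G, U)."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def conv1d_relu(x: list[list[float]], kernel: list[list[float]], bias: float = 0.0) -> list[float]:
    """Slide a K x 4 kernel along an L x 4 matrix ('valid' mode, no padding) and apply ReLU."""
    k = len(kernel)
    scores = []
    for start in range(len(x) - k + 1):
        s = bias
        for i in range(k):
            for j in range(len(BASES)):
                s += x[start + i][j] * kernel[i][j]
        scores.append(max(0.0, s))  # ReLU keeps only positive filter responses
    return scores


if __name__ == "__main__":
    # A hand-set kernel equal to the one-hot encoding of "GGA" acts as a motif detector.
    print(conv1d_relu(one_hot("AGGAC"), one_hot("GGA")))  # [1.0, 3.0, 1.0]
```

The highest response (3.0) falls at the window that exactly matches the filter; in a trained network the filter weights are learned rather than set by hand.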


GigaScience ◽  
2021 ◽  
Vol 10 (6) ◽  
Author(s):  
Florian Heyl ◽  
Rolf Backofen

Abstract Background The prediction of binding sites (peak-calling) is a common task in the data analysis of methods such as cross-linking immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq). The predicted binding sites are often further analyzed to predict sequence motifs or structure patterns. In a typical result of such high-throughput experiments, the peak profiles obtained differ widely across the genome. However, no tool exists that evaluates and classifies the predicted peaks on the basis of their shapes. Here we present StoatyDive, a tool that can be used to filter for specific peak profile shapes of sequencing data such as CLIP. Findings With StoatyDive we are able to classify peak profile shapes from CLIP-seq data of the histone stem-loop-binding protein (SLBP). We compare the results to existing tools and show that StoatyDive finds more distinct peak shape clusters for CLIP data. Furthermore, we present StoatyDive’s capabilities as a quality control tool and as a filter to pick different shapes based on biological or technical questions for other CLIP data from different RNA binding proteins with different biological functions and numbers of RNA recognition motifs. We finally show that proteins involved in splicing, such as RBM22 and U2AF1, have potentially sharper-shaped peaks than other RNA binding proteins. Conclusion StoatyDive fills the need for a peak shape clustering tool for CLIP-Seq data that fine-tunes downstream analysis steps such as structure or sequence motif predictions and that acts as a quality control.
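To make "classifying peaks by shape" concrete, the sketch below scores each coverage profile with a simple, hypothetical sharpness measure: one minus the mean of the profile after scaling it to a maximum of 1. A narrow spike scores close to 1 and a flat plateau scores 0. This is not StoatyDive's own metric or clustering procedure; the function names and the 0.5 threshold are assumptions for illustration.

```python
def sharpness(profile: list[float]) -> float:
    """Return 1 - mean(profile / max(profile)); higher values mean a narrower peak."""
    peak = max(profile)
    if peak <= 0:
        raise ValueError("profile must contain positive coverage")
    return 1.0 - sum(v / peak for v in profile) / len(profile)


def classify_peaks(profiles: list[list[float]], threshold: float = 0.5) -> list[str]:
    """Label each profile 'sharp' or 'broad' using a fixed, hypothetical threshold."""
    return ["sharp" if sharpness(p) >= threshold else "broad" for p in profiles]


if __name__ == "__main__":
    spike = [0, 1, 10, 1, 0]
    plateau = [4, 5, 5, 5, 4]
    print(classify_peaks([spike, plateau]))  # ['sharp', 'broad']
```

The score depends on how much flanking sequence is included around each peak, so profiles should be extracted with a common window size before they are compared.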


2021 ◽  
Vol 15 ◽  
Author(s):  
Lichao Zhang ◽  
Zihong Huang ◽  
Liang Kong

Background: RNA-binding proteins establish posttranscriptional gene regulation by coordinating the maturation, editing, transport, stability, and translation of cellular RNAs. Immunoprecipitation experiments can identify interactions between RNA and proteins, but they are limited by the experimental environment and materials. Therefore, it is essential to construct computational models to identify the functional sites. Objective: Although some computational methods have been proposed to predict RNA binding sites, their accuracy could be further improved. Moreover, it is necessary to construct a dataset with more samples to design a reliable model. Here we present a computational model based on multi-information sources to identify RNA binding sites. Method: We construct an accurate computational model named CSBPI_Site, based on eXtreme gradient boosting (XGBoost). The specifically designed 15-dimensional feature vector captures four types of information (chemical shift, chemical bond, chemical properties and position information). Results: A satisfactory accuracy of 0.86 and an AUC of 0.89 were obtained by leave-one-out cross-validation. Meanwhile, the accuracies differed only slightly (ranging from 0.83 to 0.85) among three classifier algorithms, which showed that the novel features are stable and suitable for multiple classifiers. These results showed that the proposed method is effective and robust for identifying noncoding RNA binding sites. Conclusion: Our method based on multi-information sources effectively represents binding site information in ncRNAs. The satisfactory prediction results for the Diels-Alder ribozyme obtained with CSBPI_Site indicate that our model is valuable for identifying functional sites.
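Leave-one-out cross-validation, the protocol behind the reported accuracy, holds out each sample once while a model is trained on all the others. The sketch below shows that protocol in plain Python. To stay self-contained it swaps in a nearest-centroid classifier for XGBoost and uses toy two-dimensional features instead of the 15-dimensional CSBPI_Site vector; both substitutions are assumptions for illustration.

```python
def fit_centroids(X: list[list[float]], y: list[int]) -> dict[int, list[float]]:
    """Compute the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: dict[int, list[float]], x: list[float]) -> int:
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def loocv_accuracy(X: list[list[float]], y: list[int]) -> float:
    """Leave-one-out CV: train on n-1 samples, test on the held-out one, repeat n times."""
    correct = 0
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(loocv_accuracy(X, y))  # 1.0
```

LOOCV needs one model fit per sample, which is affordable for small datasets like those typical of binding-site studies but grows costly as n increases.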


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Mariana G. Ferrarini ◽  
Avantika Lal ◽  
Rita Rebollo ◽  
Andreas J. Gruber ◽  
Andrea Guarracino ◽  
...  

Abstract The novel betacoronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused a worldwide pandemic (COVID-19) after emerging in Wuhan, China. Here we analyzed public host and viral RNA sequencing data to better understand how SARS-CoV-2 interacts with human respiratory cells. We identified genes, isoforms and transposable element families that are specifically altered in SARS-CoV-2-infected respiratory cells. Well-known immunoregulatory genes including CSF2, IL32, IL-6 and SERPINA3 were differentially expressed, while immunoregulatory transposable element families were upregulated. We predicted conserved interactions between the SARS-CoV-2 genome and human RNA-binding proteins such as the heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) and eukaryotic initiation factor 4B (eIF4B). We also identified a viral sequence variant with a statistically significant skew associated with age of infection that may contribute to intracellular host–pathogen interactions. These findings can help identify host mechanisms that can be targeted by prophylactics and/or therapeutics to reduce the severity of COVID-19.


Author(s):  
Tao Wang ◽  
Xiaojun Li ◽  
Xiaojing Zhang ◽  
Qing Wang ◽  
Wenqian Liu ◽  
...  

A large number of RNA molecules have been found in the phloem of higher plants, and they can be transported to distant organs through the phloem. Long-distance transport of RNA signals appears to be an important part of the strategies plants use when facing various physiological challenges. The mechanism by which RNAs are selectively transported through phloem cells is still under investigation. Evidence to date shows that several RNA motifs, including polypyrimidine (poly-CU) sequences, transfer RNA (tRNA)-related sequences, and single-nucleotide mutations, bind specific RNA-binding proteins to form ribonucleoprotein (RNP) complexes that can facilitate RNA mobility in plants. Furthermore, some RNA secondary structures, such as tRNA-like structures (TLS), untranslated regions (UTRs) of mRNAs, and the stem-loop structures of pre-miRNAs, also contribute to the mobility of RNAs. Recent research has found that RNA methylation, such as 5-methylcytosine (m5C), plays an important role in RNA transport and function. These studies lay a theoretical foundation for uncovering the mechanism of RNA transport. We aim to provide ideas and clues to inspire future research on the function of RNA motifs in long-distance RNA transport, and to further explore the mechanisms underlying systemic RNA signaling.


2018 ◽  
Author(s):  
Alina Munteanu ◽  
Neelanjan Mukherjee ◽  
Uwe Ohler

Abstract Motivation RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. Results We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3’UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/.
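The abstract does not spell out SSMART's extended alphabet, so the sketch below uses a hypothetical encoding to show what matching a sequence-structure consensus involves: a standard IUPAC sequence consensus paired with a separate structure consensus (p = paired, u = unpaired, n = either), checked against a sequence and its dot-bracket secondary structure. It only scans for exact consensus matches; SSMART's motif discovery and scoring are not reproduced.

```python
# Standard IUPAC nucleotide ambiguity codes (RNA form: U in place of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}
# Hypothetical structure codes for this sketch (not SSMART's own alphabet).
STRUCTURE = {"p": "()", "u": ".", "n": "()."}


def find_motif(seq: str, dot_bracket: str, seq_motif: str, struct_motif: str) -> list[int]:
    """Return start positions where both the sequence and structure consensus match."""
    seq = seq.upper().replace("T", "U")
    if len(seq) != len(dot_bracket) or len(seq_motif) != len(struct_motif):
        raise ValueError("sequence/structure lengths or motif lengths do not agree")
    k = len(seq_motif)
    hits = []
    for i in range(len(seq) - k + 1):
        if all(
            seq[i + j] in IUPAC[seq_motif[j]] and dot_bracket[i + j] in STRUCTURE[struct_motif[j]]
            for j in range(k)
        ):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hairpin with a 6-nt loop; look for an unpaired "UAY".
    print(find_motif("GGUACUAUCC", "((......))", "UAY", "uuu"))  # [2, 5]
```

Requiring the last motif position to be paired ("uup") removes both hits in this example, which is the kind of distinction a sequence-only motif cannot express.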


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0250592
Author(s):  
Hiren Banerjee ◽  
Ravinder Singh

Background Downstream targets for a large number of RNA-binding proteins remain to be identified. The Drosophila master sex-switch protein Sex-lethal (SXL) is an RNA-binding protein that controls splicing, polyadenylation, or translation of certain mRNAs to mediate female-specific sexual differentiation. Whereas some targets of SXL are known, previous studies indicate that additional targets of SXL have escaped genetic screens. Methodology/Principal findings Here, we have used an alternative molecular approach of GEnomic Selective Enrichment of Ligands by Exponential enrichment (GESELEX), using both genomic DNA and cDNA pools from several Drosophila developmental stages, to identify new potential targets of SXL. Our systematic analysis provides a comprehensive view of the Drosophila transcriptome for potential SXL-binding sites. Conclusion/Significance We have successfully identified new SXL-binding sites in the Drosophila transcriptome. We discuss the significance of our analysis and suggest that the newly identified binding sites and sequences could serve as a useful resource for the research community. This approach should also be applicable to other RNA-binding proteins for which downstream targets are unknown.
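In SELEX-style experiments, candidate binding sequences are those over-represented in the selected pool relative to the starting pool. One common way to summarize this, assumed here rather than taken from the GESELEX analysis, is a log2 enrichment of k-mer frequencies with a pseudocount. The toy pools below are invented; they only illustrate a U-rich k-mer rising to the top, consistent with SXL's known preference for uridine-rich RNA.

```python
from collections import Counter
from math import log2


def kmer_counts(seqs: list[str], k: int) -> Counter:
    """Count all overlapping k-mers across a pool of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment(selected: list[str], initial: list[str], k: int, pseudo: float = 1.0) -> dict[str, float]:
    """log2 ratio of pseudocounted k-mer frequencies, selected pool vs. initial pool."""
    sel, ini = kmer_counts(selected, k), kmer_counts(initial, k)
    kmers = set(sel) | set(ini)
    n_sel = sum(sel.values()) + pseudo * len(kmers)
    n_ini = sum(ini.values()) + pseudo * len(kmers)
    return {km: log2(((sel[km] + pseudo) / n_sel) / ((ini[km] + pseudo) / n_ini)) for km in kmers}


if __name__ == "__main__":
    scores = enrichment(["UUUUUUUU", "AUUUUUUUG"], ["ACGUACGU", "GCAUGCAU"], k=4)
    print(max(scores, key=scores.get))  # UUUU
```

The pseudocount keeps k-mers absent from one pool from producing infinite scores; real genomic SELEX data would also need correction for sequencing depth and for k-mer composition bias in the starting library.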


2016 ◽  
Author(s):  
Natasha G Caminsky ◽  
Eliseos J Mucaki ◽  
Ami M. Perri ◽  
Ruipeng Lu ◽  
Joan H.M. Knoll ◽  
...  

BRCA1 and BRCA2 testing for hereditary breast and ovarian cancer (HBOC) does not identify all pathogenic variants. Sequencing of 20 complete genes in HBOC patients with uninformative test results (N=287), including non-coding and flanking sequences of ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, EPCAM, MLH1, MRE11A, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51B, STK11, TP53, and XRCC2, identified 38,372 unique variants. We apply information theory (IT) to predict novel functions for, and prioritize, non-coding variants of uncertain significance (VUS) in regulatory, coding, and intronic regions of these genes, based on changes in binding sites. Besides mRNA splicing, IT provides a common framework to evaluate potential affinity changes at mutated transcription factor (TFBS), splicing regulatory (SRBS), and RNA-binding protein (RBBS) binding sites. We prioritized 10 variants affecting the strengths of splice sites (4 natural, 6 cryptic), as well as 148 SRBS, 36 TFBS, and 31 RBBS binding-strength-affecting variants. Three variants were also prioritized based on their predicted effects on mRNA secondary (2°) structure, and 17 for pseudoexon activation. Additionally, 4 frameshift mutations, 2 in-frame deletions, and 5 stop-gain mutations were identified. When combined with pedigree information, complete gene sequence analysis can focus attention on a limited set of variants in a wide spectrum of functional mutation types for downstream functional and co-segregation analysis.
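In the information-theory framework referred to here, each binding-site sequence is scored by its individual information, R_i, in bits, computed from the base frequencies observed at each position of a set of known sites; a variant's effect is the change ΔR_i between the variant and wild-type sequences. The sketch below implements the basic formula R_i = Σ_l [2 + log2 f(b_l, l)] for DNA, using a pseudocount in place of the small-sample correction used in full implementations. The training sites and pseudocount value are hypothetical.

```python
from math import log2


def frequency_matrix(sites: list[str], pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Per-position base frequencies f(b, l) from aligned, equal-length binding sites."""
    matrix = []
    for l in range(len(sites[0])):
        counts = {b: pseudocount for b in "ACGT"}
        for s in sites:
            counts[s[l]] += 1
        total = sum(counts.values())
        matrix.append({b: c / total for b, c in counts.items()})
    return matrix


def individual_information(seq: str, matrix: list[dict[str, float]]) -> float:
    """R_i in bits: sum over positions of 2 + log2 f(b_l, l) (no small-sample correction)."""
    return sum(2 + log2(matrix[l][b]) for l, b in enumerate(seq))


def delta_ri(wild_type: str, variant: str, matrix: list[dict[str, float]]) -> float:
    """Change in R_i caused by a variant; negative values predict weaker binding."""
    return individual_information(variant, matrix) - individual_information(wild_type, matrix)


if __name__ == "__main__":
    # Hypothetical donor-like training sites; mutate the conserved first G.
    m = frequency_matrix(["GTAAGT", "GTAAGT", "GTGAGT", "GTAAGA"])
    print(delta_ri("GTAAGT", "CTAAGT", m))
```

A negative ΔR_i indicates a predicted loss of binding strength and a positive one a gain; the thresholds used to call a change significant are study-specific and are not given in this abstract.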


2020 ◽  
Author(s):  
Shaoyi Ji ◽  
Ze Yang ◽  
Leonardi Gozali ◽  
Thomas Kenney ◽  
Arif Kocabas ◽  
...  

Abstract Mature mRNA molecules are typically considered to consist of a 5’UTR, a 3’UTR and a coding region (CDS), all attached until degradation. Unexpectedly, however, there have been multiple recent reports of widespread differential expression of mRNA 3’UTRs and their cognate coding regions, resulting in the expression of isolated 3’UTRs (i3’UTRs); these i3’UTRs can be highly expressed, often in reciprocal patterns to their cognate CDS. Similar to other lncRNAs, isolated 3’UTRs are likely to play an important role in gene regulation, but little is known about the contexts in which they are deployed. To begin to parse the functions of i3’UTRs, here we carry out in vitro, in vivo and in silico analyses of differential 3’UTR/CDS mRNA ratio usage across tissues, development and cell state changes, both for a select list of developmentally important genes and through unbiased transcriptome-wide analyses. Across two developmental paradigms we find a distinct switch in i3’UTR expression of stem cell-related genes, which is high in proliferating cells compared with newly differentiated cells. Our unbiased transcriptome analysis across multiple gene sets shows that, regardless of tissue, genes with high 3’UTR to CDS ratios belong predominantly to gene ontology categories related to cell-type-specific functions, while in contrast, the gene ontology categories of genes with low 3’UTR to CDS ratios are similar and relate to common cellular functions. In addition to these specific findings, our data provide critical information from which detailed hypotheses for individual i3’UTRs can be tested, with a common theme that i3’UTRs appear poised to regulate cell-specific gene expression and state. Significance Statement The widespread existence and expression of mRNA 3’ untranslated sequences in the absence of their cognate coding regions (called isolated 3’UTRs or i3’UTRs) opens up considerable avenues for gene regulation not previously envisioned.
Each isolated 3’UTR may still bind and interact with microRNAs, RNA binding proteins and other nucleic acid sequences, all in the absence of, or with low levels of, cognate protein production. Here we document the expression, localization and regulation of i3’UTRs both within particular biological systems and across the transcriptome. As this is an entirely new area of experimental investigation, these early studies are seminal to this burgeoning field.
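The 3’UTR-to-CDS ratio at the center of this analysis can be estimated from per-base read coverage. The sketch below is an assumption about how such a ratio might be computed, not the authors' pipeline: it takes the mean coverage over each region (which normalizes for region length), adds a pseudocount, and ranks genes by the log2 ratio. The gene names and coverage values are invented.

```python
from math import log2


def utr_cds_log_ratio(utr_cov: list[float], cds_cov: list[float], pseudocount: float = 1.0) -> float:
    """log2((mean 3'UTR coverage + pc) / (mean CDS coverage + pc))."""
    mean_utr = sum(utr_cov) / len(utr_cov)
    mean_cds = sum(cds_cov) / len(cds_cov)
    return log2((mean_utr + pseudocount) / (mean_cds + pseudocount))


def rank_genes(coverage: dict[str, tuple[list[float], list[float]]]) -> list[str]:
    """Sort genes from highest to lowest 3'UTR/CDS ratio."""
    scores = {gene: utr_cds_log_ratio(u, c) for gene, (u, c) in coverage.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    coverage = {
        "geneA": ([1, 1, 1], [7, 7, 7]),     # mostly CDS signal
        "geneB": ([15, 15, 15], [0, 0, 0]),  # UTR signal without CDS: candidate i3'UTR
    }
    print(rank_genes(coverage))  # ['geneB', 'geneA']
```

Using mean rather than summed coverage keeps long 3’UTRs from inflating the ratio simply because they contain more bases.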


2019 ◽  
Vol 4 (Spring 2019) ◽  
Author(s):  
Alexa Vandenburg

The Norris lab recently identified two RNA binding proteins required for proper neuron-specific splicing. The lab conducted touch-response behavioral assays to assess the function of these proteins in touch-sensing neurons. After isolating C. elegans worms with specific phenotypes, the lab used automated computer tracking and video analysis to record the worms’ behavior. The behavior of mutant worms differed from that of wild-type worms. The Norris lab also discovered two possible RNA binding protein sites in SAD-1, a neuronal gene implicated in the neuronal development of C. elegans¹. These two binding sites may control the splicing of SAD-1. The lab transferred mutated DNA into the genome of wild-type worms by injecting a mutated plasmid. The newly transformed worms fluoresced green, indicating that the two binding sites control SAD-1 splicing.

