Empowering the annotation and discovery of structured RNAs with scalable and accessible integrative clustering

2019 ◽  
Author(s):  
Milad Miladi ◽  
Eteri Sokhoyan ◽  
Torsten Houwaart ◽  
Steffen Heyne ◽  
Fabrizio Costa ◽  
...  

ABSTRACT RNA plays essential regulatory roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 provides an integrative solution by incorporating diverse types of experimental and genomic data in an accessible fashion via the Galaxy framework. We demonstrate that the tasks of clustering and annotation of structured RNAs can be considerably improved through a scalable methodology that also supports structure probing data. Based on this, we further introduce an off-the-shelf procedure to identify locally conserved structure candidates in long RNAs. In this way, we suggest the presence and the sparsity of phylogenetically conserved local structures in some long non-coding RNAs. Furthermore, we demonstrate the advantage of scalable clustering for discovering structured motifs under inherent and experimental biases and uncover prominent targets of the double-stranded RNA binding protein Roquin-1 that are evolutionarily conserved.

GigaScience ◽  
2019 ◽  
Vol 8 (12) ◽  
Author(s):  
Milad Miladi ◽  
Eteri Sokhoyan ◽  
Torsten Houwaart ◽  
Steffen Heyne ◽  
Fabrizio Costa ◽  
...  

Abstract. Background: RNA plays essential roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available. Results: Hundreds of thousands of non-coding RNAs have been detected; however, their annotation is lagging behind. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 bridges the gap between high-throughput sequencing and structural RNA analysis and provides an integrative solution by incorporating diverse experimental and genomic data in an accessible manner via the Galaxy framework. GraphClust2 can efficiently cluster and annotate large datasets of RNAs and supports structure-probing data. We demonstrate that the annotation performance of clustering functional RNAs can be considerably improved. Furthermore, an off-the-shelf procedure is introduced for identifying locally conserved structure candidates in long RNAs. We suggest the presence and the sparseness of phylogenetically conserved local structures for a collection of long non-coding RNAs. Conclusions: By clustering data from 2 cross-linking immunoprecipitation experiments, we demonstrate the benefits of GraphClust2 for motif discovery in the presence of biological and methodological biases. Finally, we uncover prominent targets of the double-stranded RNA binding protein Roquin-1, such as BCOR’s 3′ untranslated region, which contains multiple binding stem-loops that are evolutionarily conserved.


2010 ◽  
Vol 7 (12) ◽  
pp. 995-1001 ◽  
Author(s):  
Jason G Underwood ◽  
Andrew V Uzilov ◽  
Sol Katzman ◽  
Courtney S Onodera ◽  
Jacob E Mainzer ◽  
...  

2020 ◽  
Vol 48 (W1) ◽  
pp. W300-W306 ◽  
Author(s):  
Jae Y Hwang ◽  
Sungbo Jung ◽  
Tae L Kook ◽  
Eric C Rouchka ◽  
Jinwoong Bok ◽  
...  

Abstract The rMAPS2 (RNA Map Analysis and Plotting Server 2) web server, freely available at http://rmaps.cecsresearch.org/, has provided the high-throughput sequencing data research community with curated tools for the identification of RNA binding protein sites. rMAPS2 analyzes differential alternative splicing or CLIP peak data obtained from high-throughput sequencing data analysis tools like MISO, rMATS, Piranha, PIPE-CLIP and PARalyzer, and then graphically displays enriched RNA-binding protein target sites. The initial release of rMAPS focused only on the most common alternative splicing event, skipped exon (exon skipping). However, there was a high demand for the analysis of other major types of alternative splicing events, especially retained intron events, since this is the most common type of alternative splicing in plants such as Arabidopsis thaliana. Here, we expanded the implementation of rMAPS2 to facilitate analyses for all five major types of alternative splicing events: skipped exon, mutually exclusive exons, alternative 5′ splice site, alternative 3′ splice site and retained intron. In addition, by employing multi-threading, rMAPS2 has vastly improved the user experience with significant reductions in running time, ∼3.5 min for the analysis of all five major alternative splicing types at once.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Elaine Hong ◽  
Andrew Best ◽  
Hannah Gautrey ◽  
Jas Chin ◽  
Anshuli Razdan ◽  
...  

Scaffold attachment factor B1 (SAFB1) and SAFB2 proteins are oestrogen receptor (ER) corepressors that bind to and modulate ER activity through chromatin remodelling or interaction with the basal transcription machinery. SAFB proteins also have an internal RNA-recognition motif but little is known about the RNA-binding properties of SAFB1 or SAFB2. We utilised crosslinking and immunoprecipitation (iCLIP) coupled with high-throughput sequencing to enable a transcriptome-wide mapping of SAFB1 protein-RNA interactions in breast cancer MCF-7 cells. Analysis of crosslinking frequency mapped to transcript regions revealed that SAFB1 binds to coding and noncoding RNAs (ncRNAs). The highest proportion of SAFB1 crosslink sites mapped to ncRNAs, followed by intergenic regions, open reading frames (ORFs), introns, and 3′ or 5′ untranslated regions (UTR). Furthermore, we reveal that SAFB1 binds directly to RNA and its binding is particularly enriched at purine-rich sequences not dissimilar to the RNA-binding motifs for SR proteins. Using RNAi, we also show, for the first time, that single depletion of either SAFB1 or SAFB2 leads to an increase in expression of the other SAFB protein in both MCF-7 and MDA-MB-231 breast cancer cells.


2016 ◽  
Vol 113 (19) ◽  
pp. E2579-E2588 ◽  
Author(s):  
Katarzyna P. Adamala ◽  
Daniel A. Martin-Alarcon ◽  
Edward S. Boyden

The ability to monitor and perturb RNAs in living cells would benefit greatly from a modular protein architecture that targets unmodified RNA sequences in a programmable way. We report that the RNA-binding protein PumHD (Pumilio homology domain), which has been widely used in native and modified form for targeting RNA, can be engineered to yield a set of four canonical protein modules, each of which targets one RNA base. These modules (which we call Pumby, for Pumilio-based assembly) can be concatenated in chains of varying composition and length, to bind desired target RNAs. The specificity of such Pumby–RNA interactions was high, with undetectable binding of a Pumby chain to RNA sequences that bear three or more mismatches from the target sequence. We validate that the Pumby architecture can perform RNA-directed protein assembly and enhancement of translation of RNAs. We further demonstrate a new use of such RNA-binding proteins, measurement of RNA translation in living cells. Pumby may prove useful for many applications in the measurement, manipulation, and biotechnological utilization of unmodified RNAs in intact cells and systems.
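The specificity statement above (no detectable binding at three or more mismatches) can be illustrated with a simple mismatch count between a target and a candidate RNA. This is only a toy sketch: the function names, the example sequences, and the use of a hard cutoff are assumptions made for illustration, and the abstract does not say that sequences with fewer mismatches necessarily bind.

```python
def count_mismatches(target: str, candidate: str) -> int:
    """Count positions at which two equal-length RNA sequences differ."""
    if len(target) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(target.upper(), candidate.upper()))


def is_below_reported_cutoff(target: str, candidate: str, cutoff: int = 3) -> bool:
    """Return True if the candidate has fewer mismatches than the cutoff.

    The default cutoff of 3 mirrors the abstract's statement that binding was
    undetectable at three or more mismatches; treating it as a hard rule is
    an illustrative assumption, not a result from the paper.
    """
    return count_mismatches(target, candidate) < cutoff


if __name__ == "__main__":
    target = "UGUAUAUA"      # hypothetical 8-base target
    candidate = "AGUCUAUG"   # hypothetical candidate differing at 3 positions
    print(count_mismatches(target, candidate))
    print(is_below_reported_cutoff(target, candidate))
```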


Methods ◽  
2017 ◽  
Vol 118-119 ◽  
pp. 171-181 ◽  
Author(s):  
Tzu-Fang Lou ◽  
Chase A. Weidmann ◽  
Jordan Killingsworth ◽  
Traci M. Tanaka Hall ◽  
Aaron C. Goldstrohm ◽  
...  

2016 ◽  
Vol 17 (1) ◽  
Author(s):  
Nathan D. Berkowitz ◽  
Ian M. Silverman ◽  
Daniel M. Childress ◽  
Hilal Kazan ◽  
Li-San Wang ◽  
...  

1997 ◽  
Vol 17 (5) ◽  
pp. 2649-2657 ◽  
Author(s):  
H Shi ◽  
B E Hoffman ◽  
J T Lis

B52, also known as SRp55, is a member of the Drosophila melanogaster SR protein family, a group of nuclear proteins that are both essential splicing factors and specific splicing regulators. Like most SR proteins, B52 contains two RNA recognition motifs in the N terminus and a C-terminal domain rich in serine-arginine dipeptide repeats. Since B52 is an essential protein and is expected to play a role in splicing a subset of Drosophila pre-mRNAs, its function is likely to be mediated by specific interactions with RNA. To investigate the RNA-binding specificity of B52, we isolated B52-binding RNAs by selection and amplification from a pool of random RNA sequences by using full-length B52 protein as the target. These RNAs contained a conserved consensus motif that constitutes the core of a secondary structural element predicted by energy minimization. Deletion and substitution mutations defined the B52-binding site on these RNAs as a hairpin loop structure covering about 20 nucleotides, which was confirmed by structure-specific enzymatic probing. Finally, we demonstrated that both RNA recognition motifs of B52 are required for RNA binding, while the RS domain is not involved in this interaction.


2021 ◽  
Vol 22 (19) ◽  
pp. 10263
Author(s):  
Martin Panigaj ◽  
Michael P. Marino ◽  
Jakob Reiser

Lentiviral (LV) vectors have emerged as powerful tools for transgene delivery ex vivo but in vivo gene therapy applications involving LV vectors have faced a number of challenges, including the low efficiency of transgene delivery, a lack of tissue specificity, immunogenicity to both the product encoded by the transgene and the vector, and the inactivation of the vector by the human complement cascade. To mitigate these issues, several engineering approaches, involving the covalent modification of vector particles or the incorporation of specific protein domains into the vector’s envelope, have been tested. Short synthetic oligonucleotides, including aptamers bound to the surface of LV vectors, may provide a novel means with which to retarget LV vectors to specific cells and to shield these vectors from neutralization by sera. The purpose of this study was to develop strategies to tether nucleic acid sequences, including short RNA sequences, to LV vector particles in a specific and tight fashion. To bind short RNA sequences to LV vector particles, a bacteriophage lambda N protein-derived RNA binding domain (λN), fused to the measles virus hemagglutinin protein, was used. The λN protein bound RNA sequences bearing a boxB RNA hairpin. To test this approach, we used an RNA aptamer specific to the human epidermal growth factor receptor (EGFR), which was bound to LV vector particles via an RNA scaffold containing a boxB RNA motif. The results obtained confirmed that the EGFR-specific RNA aptamer bound to cells expressing EGFR and that the boxB-containing RNA scaffold was bound specifically to the λN RNA binding domain attached to the vector. These results show that LV vectors can be equipped with nucleic acid sequences to develop improved LV vectors for in vivo applications.


2018 ◽  
Author(s):  
Kaiming Zhang ◽  
Xiaoyong Pan ◽  
Yang Yang ◽  
Hong-Bin Shen

Abstract Circular RNAs (circRNAs), with their crucial roles in gene regulation and disease development, have become a rising star in the RNA world. Many previous wet-lab studies focused on the interaction mechanisms between circRNAs and RNA-binding proteins (RBPs), as knowledge of circRNA-RBP association is very important for understanding the functions of circRNAs. Recently, abundant CLIP-Seq experimental data have made the large-scale identification and analysis of circRNA-RBP interactions possible, but no computational tool based on machine learning has been developed yet. We present a new deep learning-based method, CRIP (CircRNAs Interact with Proteins), for the prediction of RBP binding sites on circRNAs, using only the RNA sequences. In order to fully exploit the sequence information, we propose a stacked codon-based encoding scheme and a hybrid deep learning architecture, in which a convolutional neural network (CNN) learns high-level abstract features and a recurrent neural network (RNN) learns long-range dependencies in the sequences. We construct 37 datasets including sequence fragments of binding sites on circRNAs, and each set corresponds to one RBP. The experimental results show that the new encoding scheme is superior to the existing feature representation methods for RNA sequences, and the hybrid network outperforms conventional classifiers by a large margin, where both the CNN and RNN components contribute to the performance improvement. To the best of our knowledge, CRIP is the first machine learning-based tool specialized in the prediction of circRNA-RBP interactions, which is expected to play an important role in large-scale functional analysis of circRNAs.
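The abstract does not spell out the codon-based encoding, so the following is only a rough sketch of one plausible reading: slide a window of three nucleotides along the sequence and one-hot encode each overlapping triplet over the 64 possible codons. The function name `codon_one_hot`, the stride of 1, and the handling of ambiguous bases are assumptions for illustration and are not taken from the CRIP paper.

```python
from itertools import product

ALPHABET = "ACGU"
# All 64 nucleotide triplets in a fixed, reproducible order.
CODONS = ["".join(p) for p in product(ALPHABET, repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_one_hot(seq: str, stride: int = 1) -> list[list[int]]:
    """Encode an RNA sequence as one-hot rows over overlapping triplets.

    Each row has 64 entries with a single 1 marking the triplet at that
    position. Triplets containing symbols outside ACGU (for example N)
    become all-zero rows; that choice is an assumption of this sketch.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    rows = []
    for start in range(0, len(seq) - 2, stride):
        row = [0] * len(CODONS)
        idx = CODON_INDEX.get(seq[start:start + 3])
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    encoded = codon_one_hot("AUGGC")  # triplets: AUG, UGG, GGC
    print(len(encoded), len(encoded[0]))
```

A matrix of this shape (number of triplets by 64) is the kind of input a convolutional layer followed by a recurrent layer could consume, which is the general architecture the abstract describes.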

