ssHMM: Extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

2016 ◽  
Author(s):  
David Heller ◽  
Martin Vingron ◽  
Ralf Krestel ◽  
Uwe Ohler ◽  
Annalisa Marsico

Abstract RNA-binding proteins (RBPs) play important roles in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. To what extent RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. RNA motif finders that produce informative motifs and simultaneously capture the relationship between primary sequence and different RNA secondary structures are missing. We developed ssHMM, an RNA motif finder that combines a hidden Markov model (HMM) with Gibbs sampling to learn the joint sequence and structure binding preferences of RBPs from high-throughput data, such as CLIP-Seq sequences, and visualizes them as a graph. Evaluations on synthetic data showed that ssHMM reliably recovers fuzzy sequence motifs in 80 to 100% of cases. It produces motifs with higher information content than existing tools and is faster than other methods on large datasets. Examples of new sequence-structure motifs identified by ssHMM for uncharacterized RBPs are also discussed. ssHMM is freely available on GitHub at https://github.molgen.mpg.de/heller/ssHMM.
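The abstract compares tools by the information content of their motifs but does not say how that quantity is computed. As a rough illustration only, the sketch below uses the standard definition for a sequence motif over four nucleotides with a uniform background (2 bits minus the Shannon entropy at each position). The function name `information_content` and the two example motifs are assumptions made for this sketch, not part of ssHMM.

```python
import math

def information_content(pwm):
    """Total information content in bits of a motif.

    `pwm` is a list of per-position dicts mapping A, C, G, U to probabilities.
    Assumes a uniform background, so each position contributes 2 - entropy.
    """
    total = 0.0
    for column in pwm:
        # Shannon entropy of this position; zero-probability terms contribute 0.
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        total += 2.0 - entropy
    return total

# Hypothetical motifs: one fully conserved, one completely uninformative.
sharp = [{"A": 1.0, "C": 0.0, "G": 0.0, "U": 0.0}] * 3
fuzzy = [{"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}] * 3

print(information_content(sharp))
print(information_content(fuzzy))
```

Under this definition a fully conserved position contributes 2 bits and a uniform position contributes none, so a motif with higher total information content is one whose positions are more sharply defined.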

2018 ◽  
Author(s):  
Peter K. Koo ◽  
Praveen Anand ◽  
Steffan B. Paul ◽  
Sean R. Eddy

Abstract To infer the sequence and RNA structure specificities of RNA-binding proteins (RBPs) from experiments that enrich for bound sequences, we introduce a convolutional residual network which we call ResidualBind. ResidualBind significantly outperforms previous methods on experimental data from many RBP families. We interrogate ResidualBind to identify what features it has learned from high-affinity sequences with saliency analysis along with 1st-order and 2nd-order in silico mutagenesis. We show that in addition to sequence motifs, ResidualBind learns a model that includes the number of motifs, their spacing, and both positive and negative effects of RNA structure context. Strikingly, ResidualBind learns RNA structure context, including detailed base-pairing relationships, directly from sequence data, which we confirm on synthetic data. ResidualBind is a powerful, flexible, and interpretable model that can uncover cis-recognition preferences across a broad spectrum of RBPs.
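First-order in silico mutagenesis, as mentioned in the abstract, can be sketched in a few lines: every single-nucleotide substitution is scored and compared with the unmutated sequence. The example below is a minimal illustration under stated assumptions. `toy_score` is a hypothetical stand-in that counts occurrences of one fixed motif; it is not ResidualBind, and the motif and input sequence are chosen only for the example.

```python
NUCLEOTIDES = "ACGU"

def toy_score(seq):
    # Hypothetical stand-in for a trained model: counts one fixed motif.
    return float(seq.count("UGCAUG"))

def first_order_mutagenesis(seq, score_fn):
    """Map (position, nucleotide) to the score change of that single substitution."""
    wild_type = score_fn(seq)
    effects = {}
    for i, ref in enumerate(seq):
        for nt in NUCLEOTIDES:
            if nt == ref:
                continue  # skip the unmutated nucleotide
            mutant = seq[:i] + nt + seq[i + 1:]
            effects[(i, nt)] = score_fn(mutant) - wild_type
    return effects

effects = first_order_mutagenesis("AAUGCAUGAA", toy_score)
# Substitutions inside the motif lower the score; flanking ones leave it unchanged.
print(effects[(2, "A")], effects[(0, "C")])
```

Second-order mutagenesis follows the same pattern with pairs of positions mutated together, which is what allows interactions such as base pairing to show up as non-additive effects.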


2018 ◽  
Author(s):  
Alina Munteanu ◽  
Neelanjan Mukherjee ◽  
Uwe Ohler

Abstract Motivation RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. Results We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3′UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/


GigaScience ◽  
2020 ◽  
Vol 9 (11) ◽  
Author(s):  
Florian Heyl ◽  
Daniel Maticzka ◽  
Michael Uhl ◽  
Rolf Backofen

Abstract Background Post-transcriptional regulation via RNA-binding proteins plays a fundamental role in every organism, but the regulatory mechanisms are still poorly understood. Nevertheless, they can be elucidated by cross-linking immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq). CLIP-Seq answers questions about the functional role of an RNA-binding protein and its targets by determining binding sites on a nucleotide level and associated sequence and structural binding patterns. In recent years the amount of CLIP-Seq data has skyrocketed, creating a need for automated data analysis that can deal with different experimental set-ups. However, noncanonical data, new protocols, and a huge variety of tools, especially for peak calling, have made it difficult to define a standard. Findings CLIP-Explorer is a flexible and reproducible data analysis pipeline for iCLIP data that supports, for the first time, eCLIP, FLASH, and uvCLAP data. Individual steps like peak calling can be changed to adapt to different experimental settings. We validate CLIP-Explorer on eCLIP data, finding similar or nearly identical motifs for various proteins in comparison with other databases. In addition, we detect new sequence motifs for PTBP1 and U2AF2. Finally, we optimize the peak calling with 3 different peak callers on RBFOX2 data, discuss the difficulty of the peak-calling step, and give advice for different experimental set-ups. Conclusion CLIP-Explorer meets the demand for a flexible CLIP-Seq data analysis pipeline that is applicable to up-to-date CLIP protocols. The article further shows the limitations of current peak-calling algorithms and the importance of robust peak detection.


Author(s):  
Jinkai Wang

Abstract Post-transcriptional processing of RNAs plays important roles in a variety of physiological and pathological processes. These processes can be precisely controlled by a series of RNA binding proteins and cotranscriptionally regulated by transcription factors as well as histone modifications. With the rapid development of high-throughput sequencing techniques, multiomics data have been broadly used to study the mechanisms underlying important biological processes. However, how to use these high-throughput sequencing data to elucidate the fundamental regulatory roles of post-transcriptional processes remains a great challenge. This review summarizes the regulatory mechanisms of post-transcriptional processes and the general principles and approaches to dissect these mechanisms by integrating multiomics data as well as public resources.


GigaScience ◽  
2021 ◽  
Vol 10 (6) ◽  
Author(s):  
Florian Heyl ◽  
Rolf Backofen

Abstract Background The prediction of binding sites (peak-calling) is a common task in the data analysis of methods such as cross-linking immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq). The predicted binding sites are often further analyzed to predict sequence motifs or structure patterns. When looking at a typical result of such high-throughput experiments, the obtained peak profiles differ largely on a genomic level. Thus, a tool is missing that evaluates and classifies the predicted peaks on the basis of their shapes. We hereby present StoatyDive, a tool that can be used to filter for specific peak profile shapes of sequencing data such as CLIP. Findings With StoatyDive we are able to classify peak profile shapes from CLIP-seq data of the histone stem-loop-binding protein (SLBP). We compare the results to existing tools and show that StoatyDive finds more distinct peak shape clusters for CLIP data. Furthermore, we present StoatyDive’s capabilities as a quality control tool and as a filter to pick different shapes based on biological or technical questions for other CLIP data from different RNA binding proteins with different biological functions and numbers of RNA recognition motifs. We finally show that proteins involved in splicing, such as RBM22 and U2AF1, have potentially sharper-shaped peaks than other RNA binding proteins. Conclusion StoatyDive finally fills the demand for a peak shape clustering tool for CLIP-Seq data that fine-tunes downstream analysis steps such as structure or sequence motif predictions and that acts as a quality control.


2018 ◽  
Author(s):  
Inga Jarmoskaite ◽  
Sarah K. Denny ◽  
Pavanapuresan P. Vaidyanathan ◽  
Winston R. Becker ◽  
Johan O.L. Andreasson ◽  
...  

Summary High-throughput methodologies have enabled routine generation of RNA target sets and sequence motifs for RNA-binding proteins (RBPs). Nevertheless, quantitative approaches are needed to capture the landscape of RNA/RBP interactions responsible for cellular regulation. We have used the RNA-MaP platform to directly measure equilibrium binding for thousands of designed RNAs and to construct a predictive model for RNA recognition by the human Pumilio proteins PUM1 and PUM2. Despite prior findings of linear sequence motifs, our measurements revealed widespread residue flipping and instances of positional coupling. Application of our thermodynamic model to published in vivo crosslinking data reveals quantitative agreement between predicted affinities and in vivo occupancies. Our analyses suggest a thermodynamically driven, continuous Pumilio binding landscape that is negligibly affected by RNA structure or kinetic factors, such as displacement by ribosomes. This work provides a quantitative foundation for dissecting the cellular behavior of RBPs and cellular features that impact their occupancies.


2017 ◽  
Author(s):  
Daniel Dominguez ◽  
Peter Freese ◽  
Maria Alexis ◽  
Amanda Su ◽  
Myles Hochman ◽  
...  

SUMMARY Production of functional cellular RNAs involves multiple processing and regulatory steps principally mediated by RNA binding proteins (RBPs). Here we present the affinity landscapes of 78 human RBPs using an unbiased assay that determines the sequence, structure, and context preferences of an RBP in vitro from deep sequencing of bound RNAs. Analyses of these data revealed several interesting patterns, including unexpectedly low diversity of RNA motifs, implying frequent convergent evolution of binding specificity toward a relatively small set of RNA motifs, many with low compositional complexity. Offsetting this trend, we observed extensive preferences for contextual features outside of core RNA motifs, including spaced “bipartite” motifs, biased flanking nucleotide context, and bias away from or towards RNA structure. These contextual features are likely to enable targeting of distinct subsets of transcripts by different RBPs that recognize the same core motif. Our results enable construction of “RNA maps” of RBP activity without requiring crosslinking-based assays, and provide unprecedented depth of information on the interaction of RBPs with RNA.


Molecules ◽  
2020 ◽  
Vol 25 (14) ◽  
pp. 3130 ◽  
Author(s):  
Siran Tian ◽  
Harrison A. Curnutte ◽  
Tatjana Trcek

RNA granules are ubiquitous. Composed of RNA-binding proteins and RNAs, they provide functional compartmentalization within cells. They are inextricably linked with RNA biology and as such are often referred to as the hubs for post-transcriptional regulation. Much of the attention has been given to the proteins that form these condensates and thus many fundamental questions about the biology of RNA granules remain poorly understood: How and which RNAs enrich in RNA granules, how are transcripts regulated in them, and how do granule-enriched mRNAs shape the biology of a cell? In this review, we discuss the imaging, genetic, and biochemical data, which have revealed that some aspects of the RNA biology within granules are carried out by the RNA itself rather than the granule proteins. Interestingly, the RNA structure has emerged as an important feature in the post-transcriptional control of granule transcripts. This review is part of the Special Issue in the Frontiers in RNA structure in the journal Molecules.


2018 ◽  
Author(s):  
Jing Zhang ◽  
Jason Liu ◽  
Donghoon Lee ◽  
Jo-Jo Feng ◽  
Lucas Lochovsky ◽  
...  

Abstract RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation and disease. Their binding sites cover more of the genome than coding exons; nevertheless, most noncoding variant-prioritization methods only focus on transcriptional regulation. Here, we integrate the portfolio of ENCODE-RBP experiments to develop RADAR, a variant-scoring framework. RADAR uses conservation, RNA structure, network centrality, and motifs to provide an overall impact score. Then it further incorporates tissue-specific inputs to highlight disease-specific variants. Our results demonstrate RADAR can successfully pinpoint variants, both somatic and germline, associated with RBP-function dysregulation, that cannot be found by most current prioritization methods, for example variants affecting splicing.


2021 ◽  
Author(s):  
Eun Seon Kim ◽  
Chang Geon Chung ◽  
Jeong Hyang Park ◽  
Byung Su Ko ◽  
Sung Soon Park ◽  
...  

Abstract RNA-binding proteins (RBPs) play essential roles in diverse cellular processes through post-transcriptional regulation of RNAs. The subcellular localization of RBPs is thus under tight control, the breakdown of which is associated with aberrant cytoplasmic accumulation of nuclear RBPs such as TDP-43 and FUS, well-known pathological markers for amyotrophic lateral sclerosis and frontotemporal dementia (ALS/FTD). Here, we report in a Drosophila model for ALS/FTD that nuclear accumulation of a cytoplasmic RBP, Staufen, may be a new pathological feature. We found that in Drosophila C4da neurons expressing PR36, one of the arginine-rich dipeptide repeat proteins (DPRs), Staufen accumulated in the nucleus in an Importin- and RNA-dependent manner. Notably, expressing Staufen with exogenous NLS—but not with mutated endogenous NLS—potentiated the PR-induced dendritic defect, suggesting that nuclear-accumulated Staufen can enhance PR toxicity. PR36 expression increased Fibrillarin staining in the nucleolus, which was enhanced by heterozygous mutation of stau (stau+/−), a gene that codes for Staufen. Furthermore, knockdown of fib, which codes for Fibrillarin, exacerbated retinal degeneration mediated by PR toxicity, suggesting that the increased amount of Fibrillarin caused by stau+/− is protective. Stau+/− also reduced the amount of PR-induced nuclear-accumulated Staufen, mitigated retinal degeneration, and rescued viability of flies expressing PR36. Taken together, our data show that nuclear accumulation of Staufen in neurons may be an important pathological feature contributing to the pathogenesis of ALS/FTD.

