DeepCLIP: Predicting the effect of mutations on protein-RNA binding with Deep Learning

2019 ◽  
Author(s):  
Alexander Gulliver Bjørnholt Grønning ◽  
Thomas Koed Doktor ◽  
Simon Jonas Larsen ◽  
Ulrika Simone Spangsberg Petersen ◽  
Lise Lolle Holm ◽  
...  

Abstract Nucleotide variants can cause functional changes by altering protein-RNA binding in various ways that are not easy to predict. This can affect processes such as splicing, nuclear shuttling, and stability of the transcript. Therefore, correct modelling of protein-RNA binding is critical when predicting the effects of sequence variations. Many RNA-binding proteins recognize a diverse set of motifs and binding is typically also dependent on the genomic context, making this task particularly challenging. Here, we present DeepCLIP, the first method for context-aware modeling and predicting protein binding to nucleic acids using exclusively sequence data as input. We show that DeepCLIP outperforms existing methods for modelling RNA-protein binding. Importantly, we demonstrate that DeepCLIP is able to reliably predict the functional effects of contextually dependent nucleotide variants in independent wet lab experiments. Furthermore, we show how DeepCLIP binding profiles can be used in the design of therapeutically relevant antisense oligonucleotides, and to uncover possible position-dependent regulation in a tissue-specific manner. DeepCLIP can be freely used at http://deepclip.compbio.sdu.dk.

Highlights We have designed DeepCLIP as a simple neural network that requires only CLIP binding sites as input. The architecture and parameter settings of DeepCLIP make it an efficient classifier that is robust to train, so high-performing models are easy to train and recreate. Using an extensive benchmark dataset, we demonstrate that DeepCLIP outperforms existing tools in classification. Furthermore, DeepCLIP provides direct information about the neural network’s decision process through visualization of binding motifs and a binding profile that directly indicates sequence elements contributing to the classification. To show that DeepCLIP models generalize to different datasets, we have demonstrated that predictions correlate with in vivo and in vitro experiments using quantitative binding assays and minigenes. Identifying the binding sites for regulatory RNA-binding proteins is fundamental for efficient design of (therapeutic) antisense oligonucleotides. Employing a reported disease-associated mutation, we demonstrate that DeepCLIP can be used for design of therapeutic antisense oligonucleotides that block regions important for binding of regulatory proteins and correct aberrant splicing. Using DeepCLIP binding profiles, we uncovered a possible position-dependent mechanism behind the reported tissue-specificity of a group of TDP-43 repressed pseudoexons. We have made DeepCLIP available as an online tool for training and application of protein-RNA binding deep learning models and prediction of the potential effects of clinically detected sequence variations (http://deepclip.compbio.sdu.dk/). We also provide DeepCLIP as a configurable stand-alone program (http://www.github.com/deepclip).

2019 ◽  
Author(s):  
Michael Uhl ◽  
Van Dinh Tran ◽  
Rolf Backofen

Abstract CLIP-seq is the state-of-the-art technique to experimentally determine transcriptome-wide binding sites of RNA-binding proteins (RBPs). However, it relies on gene expression, which can be highly variable between conditions, and thus cannot provide a complete picture of the RBP binding landscape. This necessitates the use of computational methods to predict missing binding sites. Here we present GraphProt2, a computational RBP binding site prediction method based on graph convolutional neural networks (GCN). In contrast to current CNN methods, GraphProt2 supports variable length input as well as the possibility to accurately predict nucleotide-wise binding profiles. We demonstrate its superior performance compared to GraphProt and a CNN-based method on single as well as combined CLIP-seq datasets.


2018 ◽  
Author(s):  
Alina Munteanu ◽  
Neelanjan Mukherjee ◽  
Uwe Ohler

Abstract Motivation RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. Results We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3′UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/.
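
To make the idea of a consensus string over a degenerate alphabet concrete, the following is a minimal sketch that scans an RNA sequence for matches to a motif written in the standard IUPAC nucleotide codes. It does not reproduce SSMART: the structure-aware extension of the alphabet is not specified in the abstract, and the function name `find_consensus` and the example motif are hypothetical choices made for illustration.

```python
# Standard IUPAC nucleotide codes for the RNA alphabet.
# Assumption: only sequence codes are modelled here; SSMART's extended,
# structure-aware symbols are not described in the abstract and are omitted.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}


def find_consensus(sequence, consensus):
    """Return 0-based start positions where `consensus` matches `sequence`.

    `find_consensus` is a hypothetical helper, not part of SSMART.
    DNA input is accepted by converting T to U first.
    """
    seq = sequence.upper().replace("T", "U")
    k = len(consensus)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Every base must belong to the set allowed by its IUPAC code.
        if all(base in IUPAC[code] for base, code in zip(window, consensus)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # "Y" stands for a pyrimidine (C or U), so both AUGC and AUGU match.
    print(find_consensus("AUGCAUGU", "AUGY"))
```

A dictionary lookup per position keeps the sketch readable; a real motif finder would also need to score partial matches and, in SSMART's case, the structural context of each position.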


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0250592
Author(s):  
Hiren Banerjee ◽  
Ravinder Singh

Background Downstream targets for a large number of RNA-binding proteins remain to be identified. The Drosophila master sex-switch protein Sex-lethal (SXL) is an RNA-binding protein that controls splicing, polyadenylation, or translation of certain mRNAs to mediate female-specific sexual differentiation. Whereas some targets of SXL are known, previous studies indicate that additional targets of SXL have escaped genetic screens. Methodology/Principal findings Here, we have used an alternative molecular approach of GEnomic Selective Enrichment of Ligands by Exponential enrichment (GESELEX) using both the genomic DNA and cDNA pools from several Drosophila developmental stages to identify new potential targets of SXL. Our systematic analysis provides a comprehensive view of the Drosophila transcriptome for potential SXL-binding sites. Conclusion/Significance We have successfully identified new SXL-binding sites in the Drosophila transcriptome. We discuss the significance of our analysis and how the newly identified binding sites and sequences could serve as a useful resource for the research community. This approach should also be applicable to other RNA-binding proteins for which downstream targets are unknown.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Jordy Homing Lam ◽  
Yu Li ◽  
Lizhe Zhu ◽  
Ramzan Umarov ◽  
Hanlun Jiang ◽  
...  

Abstract Protein-RNA interaction plays important roles in post-transcriptional regulation. However, the task of predicting these interactions given a protein structure is difficult. Here we show that, by leveraging a deep learning model NucleicNet, attributes such as binding preference of RNA backbone constituents and different bases can be predicted from local physicochemical characteristics of protein structure surface. On a diverse set of challenging RNA-binding proteins, including Fem-3-binding-factor 2, Argonaute 2 and Ribonuclease III, NucleicNet can accurately recover interaction modes discovered by structural biology experiments. Furthermore, we show that, without seeing any in vitro or in vivo assay data, NucleicNet can still achieve consistency with experiments, including RNAcompete, Immunoprecipitation Assay, and siRNA Knockdown Benchmark. NucleicNet can thus serve to provide quantitative fitness of RNA sequences for given binding pockets or to predict potential binding pockets and binding RNAs for previously unknown RNA binding proteins.


2020 ◽  
Author(s):  
Alexander Gulliver Bjørnholt Grønning ◽  
Thomas Koed Doktor ◽  
Simon Jonas Larsen ◽  
Ulrika Simone Spangsberg Petersen ◽  
Lise Lolle Holm ◽  
...  

Abstract Nucleotide variants can cause functional changes by altering protein–RNA binding in various ways that are not easy to predict. This can affect processes such as splicing, nuclear shuttling, and stability of the transcript. Therefore, correct modeling of protein–RNA binding is critical when predicting the effects of sequence variations. Many RNA-binding proteins recognize a diverse set of motifs and binding is typically also dependent on the genomic context, making this task particularly challenging. Here, we present DeepCLIP, the first method for context-aware modeling and predicting protein binding to RNA nucleic acids using exclusively sequence data as input. We show that DeepCLIP outperforms existing methods for modeling RNA-protein binding. Importantly, we demonstrate that DeepCLIP predictions correlate with the functional outcomes of nucleotide variants in independent wet lab experiments. Furthermore, we show how DeepCLIP binding profiles can be used in the design of therapeutically relevant antisense oligonucleotides, and to uncover possible position-dependent regulation in a tissue-specific manner. DeepCLIP is freely available as a stand-alone application and as a webtool at http://deepclip.compbio.sdu.dk.
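
As an illustration of what "exclusively sequence data as input" can look like in practice, the sketch below one-hot encodes an RNA sequence, a common input representation for sequence-based neural networks. Whether DeepCLIP uses exactly this encoding is not stated in the abstract, so treat it as an assumption; the function name `one_hot_encode` and the handling of unknown bases are likewise illustrative choices.

```python
# Column order of the encoding. Assumption: a four-letter RNA alphabet.
ALPHABET = "ACGU"


def one_hot_encode(sequence):
    """Encode a nucleotide sequence as a list of 4-element rows.

    Each row has a single 1 in the column of the observed base.
    Bases outside ACGU (for example N) become all-zero rows; this is an
    illustrative convention, not something taken from the DeepCLIP abstract.
    """
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


if __name__ == "__main__":
    reference = "ACGU"
    variant = "ACAU"  # hypothetical single-nucleotide variant at position 2
    for name, seq in (("reference", reference), ("variant", variant)):
        print(name, one_hot_encode(seq))
```

A single-nucleotide variant changes exactly one row of the encoded matrix, which is why a sequence-only model can be queried with the reference and the variant sequence and the two predictions compared.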


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xiaoyong Pan ◽  
Yi Fang ◽  
Xianfeng Li ◽  
Yang Yang ◽  
Hong-Bin Shen

Abstract Background RNA-binding proteins (RBPs) play crucial roles in various biological processes. Deep learning-based methods have proven powerful for predicting RBP binding sites on RNAs. However, training deep learning models is time-consuming and computationally intensive. Results Here we present RBPsuite, an easy-to-use deep learning-based webserver for predicting RBP binding sites on linear and circular RNAs. For linear RNAs, RBPsuite predicts RBP binding scores using our updated iDeepS. For circular RNAs (circRNAs), RBPsuite predicts RBP binding scores using our CRIP method. RBPsuite first breaks the input RNA sequence into segments of 101 nucleotides and scores the interaction between the segments and the RBPs. RBPsuite further detects verified motifs on the binding segments and gives the distribution of binding scores along the full-length sequence. Conclusions RBPsuite is an easy-to-use online webserver for predicting RBP binding sites and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/.
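
The segmentation step described in this abstract can be sketched as follows. Only the segment length of 101 nucleotides comes from the abstract; the use of non-overlapping windows and the handling of a shorter final segment are assumptions, and `split_into_segments` is a hypothetical helper rather than RBPsuite code.

```python
def split_into_segments(sequence, segment_length=101):
    """Split `sequence` into consecutive segments of `segment_length`.

    Returns a list of (start, segment) tuples with 0-based start offsets.
    Assumptions: windows do not overlap, and a final segment shorter than
    `segment_length` is kept as-is. The abstract does not say how RBPsuite
    treats the remainder.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [
        (start, sequence[start:start + segment_length])
        for start in range(0, len(sequence), segment_length)
    ]


if __name__ == "__main__":
    rna = "ACGU" * 60  # toy 240-nt sequence
    for start, segment in split_into_segments(rna):
        print(start, len(segment))
```

Keeping the start offset with each segment makes it straightforward to map per-segment scores back onto positions of the full-length sequence.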

