Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks

2017 ◽  
Author(s):  
Xiaoyong Pan ◽  
Peter Rijnbeek ◽  
Junchi Yan ◽  
Hong-Bin Shen

RNA regulation depends significantly on its binding protein partners, known as RNA-binding proteins (RBPs). Unfortunately, the binding preferences of most RBPs are still not well characterized, especially from a structural point of view. Hidden informative signals and the interdependencies between sequence and structure specificities are two challenging problems for both predicting RBP binding sites and accurately mining sequence and structure motifs. In this study, we propose a deep learning-based method, iDeepS, to simultaneously identify binding sequence and structure motifs from RNA sequences using convolutional neural networks (CNNs) and a bidirectional long short-term memory network (BLSTM). We first perform one-hot encoding of both the sequence and the predicted secondary structure, which makes them suitable for subsequent convolution operations. To reveal the hidden binding knowledge in the observations, the CNNs are applied to learn abstract motif features. Considering the close relationship between sequences and predicted structures, we use the BLSTM to capture the long-range dependencies between the binding sequence and structure motifs identified by the CNNs. Finally, the learned weighted representations are fed into a classification layer to predict RBP binding sites. We evaluated iDeepS on verified RBP binding sites derived from large-scale representative CLIP-seq datasets, and the results demonstrate that iDeepS reliably predicts RBP binding sites on RNAs and outperforms state-of-the-art methods. An important advantage is that iDeepS automatically extracts both binding sequence and structure motifs, which will improve our understanding of the mechanisms underlying the binding specificities of RBPs. iDeepS is available at https://github.com/xypan1232/iDeepS.
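
The encoding and convolution steps described above can be sketched in plain Python. This is a minimal illustration, not iDeepS's code: the dot-bracket structure alphabet `(`, `)`, `.` and the helper names `encode` and `conv1d` are assumptions, and the BLSTM and classification layers are omitted. Each position becomes a 7-channel vector (4 nucleotide channels plus 3 structure channels), and a single filter scores every window, which is how a CNN filter acts as a joint sequence-structure motif detector.

```python
# Hypothetical alphabets: RNA nucleotides plus dot-bracket structure symbols.
# iDeepS's actual structure alphabet may differ; this is an illustration only.
SEQ_ALPHABET = "ACGU"
STRUCT_ALPHABET = "()."

def one_hot(symbols, alphabet):
    """One row per symbol; symbols outside the alphabet (e.g. 'N') become all-zero rows."""
    return [[1.0 if s == a else 0.0 for a in alphabet] for s in symbols]

def encode(seq, struct):
    """Concatenate sequence and structure one-hot channels position by position (L x 7)."""
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    return [s + t for s, t in zip(one_hot(seq, SEQ_ALPHABET), one_hot(struct, STRUCT_ALPHABET))]

def conv1d(x, kernel):
    """'Valid' 1D convolution (cross-correlation, as in CNN layers) of an L x C input
    with a K x C filter; returns one motif score per window."""
    k, channels = len(kernel), len(kernel[0])
    return [
        sum(x[i + j][c] * kernel[j][c] for j in range(k) for c in range(channels))
        for i in range(len(x) - k + 1)
    ]

# A hand-written filter that rewards an unpaired G at both window positions.
g_unpaired = [0, 0, 1, 0, 0, 0, 1]
scores = conv1d(encode("AGGU", "(..)"), [g_unpaired, g_unpaired])  # [2.0, 4.0, 2.0]
```

The score peaks at the window covering the two unpaired Gs. In a trained network, many such filters are learned from data rather than written by hand, and their outputs would then be passed to the recurrent layer.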

BMC Genomics ◽  
2020 ◽  
Vol 21 (S13) ◽  
Author(s):  
Lei Deng ◽  
Youzhi Liu ◽  
Yechuan Shi ◽  
Wenhao Zhang ◽  
Chun Yang ◽  
...  

Background: RNA-binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. The identification of RBP binding sites is a crucial step in understanding the biological mechanisms of post-transcriptional gene regulation. However, determining RBP binding sites on a large scale is challenging because of the high cost of biochemical assays. Many studies have exploited machine learning methods to predict binding sites; in particular, deep learning is increasingly used in bioinformatics by virtue of its ability to learn generalized representations from DNA and protein sequences. Results: In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to effectively predict RBP binding sites. Specifically, we used a word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., distributed representations of k-mer sequences rather than traditional one-hot encoding. The distributed representations are taken as input to convolutional neural networks (CNNs) and bidirectional long short-term memory networks (BiLSTMs) to identify RBP binding sites. Our results show that DeepRKE outperforms existing counterpart methods on two large-scale benchmark datasets. Conclusions: Our extensive experimental results show that DeepRKE is an effective tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can effectively detect the latent relationships and similarities between k-mers, and thus improve predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/.
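
Before word embedding, each sequence is split into k-mer "words" that are mapped to integer ids and then to dense vectors. The sketch below assumes overlapping 3-mers with stride 1 and a 64-word vocabulary with id 0 reserved for unknown or padding tokens; these settings, the helper names, and the random embedding table (a placeholder for vectors a trained word-embedding model such as word2vec would supply) are illustrative assumptions, not DeepRKE's published configuration.

```python
import random
from itertools import product

def kmers(seq, k=3, stride=1):
    """Split a sequence into overlapping k-mer 'words' (DNA T is mapped to RNA U)."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def build_vocab(k=3, alphabet="ACGU"):
    """Map every possible k-mer to an integer id; id 0 is reserved for unknown/padding."""
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}

def to_ids(seq, vocab, k=3):
    """Convert a sequence to k-mer ids; k-mers containing non-ACGU symbols map to 0."""
    return [vocab.get(km, 0) for km in kmers(seq, k)]

def random_embedding_table(vocab_size, dim, seed=0):
    """Placeholder embeddings: row 0 (padding) is all zeros, other rows are small random
    values. A real pipeline would load vectors learned by a word-embedding model instead."""
    rng = random.Random(seed)
    return [[0.0] * dim] + [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]

vocab = build_vocab()
table = random_embedding_table(len(vocab), dim=8)
ids = to_ids("ACGUAN", vocab)       # [7, 28, 45, 0]
embedded = [table[i] for i in ids]  # one dense vector per k-mer, input to the CNN/BiLSTM
```

The same tokenization could be applied to a secondary-structure string by passing its own alphabet to `build_vocab`, giving a second embedded channel alongside the sequence.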


2018 ◽  
Author(s):  
Kaiming Zhang ◽  
Xiaoyong Pan ◽  
Yang Yang ◽  
Hong-Bin Shen

Circular RNAs (circRNAs), with their crucial roles in gene regulation and disease development, have become a rising star in the RNA world. Many previous wet-lab studies have focused on the interaction mechanisms between circRNAs and RNA-binding proteins (RBPs), as knowledge of circRNA-RBP associations is very important for understanding the functions of circRNAs. Recently, abundant CLIP-seq experimental data have made the large-scale identification and analysis of circRNA-RBP interactions possible, yet no machine learning-based computational tool has been developed for this task. We present a new deep learning-based method, CRIP (CircRNAs Interact with Proteins), for predicting RBP binding sites on circRNAs using only the RNA sequences. To fully exploit the sequence information, we propose a stacked codon-based encoding scheme and a hybrid deep learning architecture, in which a convolutional neural network (CNN) learns high-level abstract features and a recurrent neural network (RNN) learns long-range dependencies in the sequences. We construct 37 datasets comprising sequence fragments of binding sites on circRNAs, each corresponding to one RBP. The experimental results show that the new encoding scheme is superior to existing feature representation methods for RNA sequences, and that the hybrid network outperforms conventional classifiers by a large margin, with both the CNN and RNN components contributing to the performance improvement. To the best of our knowledge, CRIP is the first machine learning-based tool specialized in predicting circRNA-RBP interactions, and it is expected to play an important role in large-scale functional analysis of circRNAs.
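
One plausible reading of a codon-based encoding is to slide a 3-nucleotide window along the sequence and one-hot encode each window over the 64 possible codons. The sketch below follows that reading with stride 1; it is an assumption about the general idea, not CRIP's exact "stacked" scheme, and the name `codon_one_hot` is hypothetical. The resulting (L - 2) x 64 matrix is the kind of input a CNN layer could consume.

```python
from itertools import product

# All 64 RNA codons in a fixed order (AAA, AAC, AAG, AAU, ACA, ...).
CODONS = ["".join(p) for p in product("ACGU", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}

def codon_one_hot(seq):
    """One 64-dim one-hot row per overlapping 3-nt window (stride 1).
    Windows containing non-ACGU symbols (e.g. 'N') become all-zero rows."""
    seq = seq.upper().replace("T", "U")
    matrix = []
    for i in range(len(seq) - 2):
        row = [0] * len(CODONS)
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    return matrix

encoded = codon_one_hot("AUGGCC")  # 4 rows: AUG, UGG, GGC, GCC
```

Compared with per-nucleotide one-hot encoding, each row here already carries local trinucleotide context, at the cost of a much wider (64-channel) input.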


2021 ◽  
Author(s):  
Kevin McKernan ◽  
Anthony M. Kyriakopoulos ◽  
Peter McCullough

Codon optimization describes the process used to increase protein production through alternative but synonymous codon changes. In SARS-CoV-2 mRNA vaccines, codon optimizations can result in differential secondary conformations that inevitably affect a protein’s function, with significant consequences for the cell. Importantly, when codon optimization increases the GC content of synthetic mRNAs, there can be an inevitable enrichment of G-quartets, which potentially form G-quadruplex structures. The emerging G-quadruplexes are favorable binding sites for RNA-binding proteins such as helicases, which inevitably affect epigenetic reprogramming of the cell by altering transcription, translation and replication. In this study, we performed an RNAfold analysis to investigate alterations in the secondary structures of mRNAs in SARS-CoV-2 vaccines due to codon optimization. We show a significant increase in the GC content of mRNAs in vaccines compared to native SARS-CoV-2 RNA sequences encoding the spike protein. As GC enrichment leads to the formation of more G-quadruplex structures, these may contribute to potential pathological processes initiated by SARS-CoV-2 molecular vaccination.
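
The GC-content comparison is simple arithmetic: the fraction of G and C bases in a sequence. The sketch below computes it for two toy synonymous codon choices of the same short peptide (hypothetical sequences, not the real spike-encoding mRNAs) and counts candidate G-quadruplex motifs with a commonly used sequence heuristic: four runs of at least three Gs separated by loops of 1 to 7 nucleotides. That heuristic is a swapped-in illustration; the study itself used RNAfold for secondary-structure analysis, which this code does not reproduce, and a regex match only flags a candidate motif, not a structure that actually forms.

```python
import re

# Common sequence heuristic for candidate G-quadruplexes: four G-runs (>= 3 Gs)
# separated by loops of 1-7 nucleotides. A match is a candidate, not a confirmed structure.
G4_PATTERN = re.compile(r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}")

def gc_content(seq):
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

def count_g4_candidates(seq):
    """Number of non-overlapping candidate G-quadruplex motifs, scanning left to right."""
    return len(G4_PATTERN.findall(seq.upper()))

# Toy synonymous codon choices for the peptide M-F-V-F-L-V (hypothetical, illustrative only).
native = "AUGUUUGUUUUUCUUGUU"
optimized = "AUGUUCGUGUUCCUGGUG"
print(gc_content(native), gc_content(optimized))
```

Both toy sequences encode the same peptide, yet the second uses GC-richer synonymous codons, which is the kind of shift the GC-content comparison measures.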


2019 ◽  
Author(s):  
Michael Uhl ◽  
Van Dinh Tran ◽  
Rolf Backofen

CLIP-seq is the state-of-the-art technique for experimentally determining transcriptome-wide binding sites of RNA-binding proteins (RBPs). However, it relies on gene expression, which can be highly variable between conditions, and thus cannot provide a complete picture of the RBP binding landscape. This necessitates computational methods to predict missing binding sites. Here we present GraphProt2, a computational RBP binding site prediction method based on graph convolutional neural networks (GCNs). In contrast to current CNN methods, GraphProt2 supports variable-length input and can accurately predict nucleotide-wise binding profiles. We demonstrate its superior performance compared to GraphProt and a CNN-based method on single as well as combined CLIP-seq datasets.
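
Graph-based methods can represent an RNA as a graph whose nodes are nucleotides and whose edges are backbone links plus base pairs, which handles variable-length input naturally. The sketch below builds such a graph from a dot-bracket string and applies one graph-convolution step that averages each node's neighborhood (including itself) before a linear map and ReLU. This mean-aggregation layer is a generic GCN variant chosen for brevity, not GraphProt2's architecture, and the weights are placeholders. Because the output has one row per nucleotide, a per-node readout of this kind is one way to obtain nucleotide-wise profiles.

```python
def dot_bracket_edges(structure):
    """Undirected edges (i, j) with i < j: backbone links plus base pairs from dot-bracket."""
    edges, stack = set(), []
    for i, ch in enumerate(structure):
        if i + 1 < len(structure):
            edges.add((i, i + 1))  # backbone edge to the next nucleotide
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            edges.add((stack.pop(), i))  # base-pair edge
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return edges

def neighborhoods(n, edges):
    """Adjacency sets with self-loops, as used by GCN-style propagation."""
    adj = [{i} for i in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def gcn_layer(features, adj, weight):
    """Mean-aggregate each node's neighborhood, then apply weight (in_dim x out_dim) and ReLU."""
    in_dim, out_dim = len(weight), len(weight[0])
    out = []
    for nbrs in adj:
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(in_dim)]
        out.append([max(0.0, sum(mean[d] * weight[d][o] for d in range(in_dim)))
                    for o in range(out_dim)])
    return out

structure = "((..))"
adj = neighborhoods(len(structure), dot_bracket_edges(structure))
profile = gcn_layer([[1.0, 0.0]] * len(structure), adj, [[0.5], [0.5]])  # one score per nucleotide
```

Nothing in the layer depends on a fixed input length, so the same weights apply to a 6-nt hairpin or a 600-nt transcript.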


2021 ◽  
Author(s):  
Ionut Atanasoai ◽  
Sofia Papavasileiou ◽  
Natalie Preiss ◽  
Claudia Kutter

Over the past decade, thousands of putative human RNA-binding proteins (RBPs) have been identified, increasing the demand for specifying their RNA binding capacities. Here, we developed RNA affinity purification followed by sequencing (RAPseq), which enables large-scale in vitro profiling of RBP binding to native RNAs. First, using RAPseq, we found that vertebrate HURs recognize a conserved RNA binding motif and bind predominantly to introns in zebrafish, compared to 3'UTRs in human RNAs. Second, our dual RBP assays (co-RAPseq) uncovered cooperative RNA binding of HUR and PTBP1 within an optimal distance of 27 nucleotides. Third, we developed T7-RAPseq to discern m6A-dependent and -independent RNA binding sites of YTHDF1. Fourth, RAPseq of 26 novel non-canonical RBPs revealed specialized moonlighting interactions. Last, five pathological IGF2BP family variants exhibited different RNA binding patterns. Overall, our simple, scalable and versatile method makes it possible to fast-forward RBP-related questions.


2021 ◽  
Author(s):  
Viplove Arora ◽  
Guido Sanguinetti

RNA-binding proteins (RBPs) are key co- and post-transcriptional regulators of gene expression, playing a crucial role in many biological processes. Experimental methods such as CLIP-seq have enabled the identification of transcriptome-wide RNA-protein interactions for select proteins; however, the time- and resource-intensive nature of these technologies calls for the development of computational methods to complement their predictions. Here we leverage recent large-scale CLIP-seq experiments to construct a de novo predictor of RNA-protein interactions based on graph neural networks (GNNs). We show that the GNN method allows us not only to predict missing links in an RNA-protein network, but also to predict the entire complement of targets of previously unassayed proteins, and even to reconstruct the entire network of RNA-protein interactions in different conditions based on minimal information. Our results demonstrate the potential of machine learning methods to extract useful information on post-transcriptional regulation from large data sets.
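
Link prediction in an RNA-protein network can be illustrated with a bipartite graph: RNAs and proteins are nodes, observed CLIP-seq interactions are edges, one message-passing step mixes each node's vector with its neighbors' vectors, and a dot product followed by a sigmoid scores a candidate link. The sketch below is a generic, untrained illustration with hand-set toy features; the node names and the averaging update are assumptions, not the authors' GNN. A protein with no observed edges keeps its own feature vector and can still be scored, which mirrors the idea of predicting targets for unassayed proteins.

```python
import math
from collections import defaultdict

def message_pass(rna_emb, prot_emb, edges):
    """One untrained message-passing step on a bipartite RNA-protein graph.
    Each node's new vector is the mean of its own vector and its neighbors' vectors."""
    rna_nbrs, prot_nbrs = defaultdict(list), defaultdict(list)
    for rna, prot in edges:
        rna_nbrs[rna].append(prot)
        prot_nbrs[prot].append(rna)

    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    new_rna = {r: mean([v] + [prot_emb[p] for p in rna_nbrs[r]]) for r, v in rna_emb.items()}
    new_prot = {p: mean([v] + [rna_emb[r] for r in prot_nbrs[p]]) for p, v in prot_emb.items()}
    return new_rna, new_prot

def link_probability(rna_vec, prot_vec):
    """Score a candidate RNA-protein link as sigmoid(dot product)."""
    z = sum(a * b for a, b in zip(rna_vec, prot_vec))
    return 1.0 / (1.0 + math.exp(-z))

# Toy, hand-set 2-d features; protein "pB" has no observed interactions (hypothetical names).
rna_emb = {"r1": [1.0, 0.0], "r2": [0.0, 1.0]}
prot_emb = {"pA": [1.0, 0.0], "pB": [0.0, 1.0]}
rna_h, prot_h = message_pass(rna_emb, prot_emb, [("r1", "pA")])
print(link_probability(rna_h["r2"], prot_h["pB"]))
```

In a real model, the node features and the update would carry learned weights trained on known interactions, and several message-passing rounds would usually be stacked.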


2020 ◽  
Vol 48 (4) ◽  
pp. 1529-1543
Author(s):  
Alessio Colantoni ◽  
Jakob Rupert ◽  
Andrea Vandelli ◽  
Gian Gaetano Tartaglia ◽  
Elsa Zacco

Interactions between proteins and RNA underlie numerous cellular regulatory and functional phenomena. The investigation of the biological relevance of non-coding RNAs has led to the identification of numerous novel RNA-binding proteins (RBPs). However, defining the RNA sequences and structures that are selectively recognised by an RBP remains challenging, since these interactions can be transient and highly dynamic, and may be mediated by unstructured regions in the protein, as in the case of many non-canonical RBPs. Numerous experimental and computational methodologies have been developed to predict, identify and verify the binding between a given RBP and potential RNA partners, but navigating the vast ocean of data can be frustrating and misleading. In this mini-review, we propose a workflow for identifying the RNA binding partners of putative, newly identified RBPs. The large pool of potential binders selected by in-cell experiments can be enriched by in silico tools such as catRAPID, which can predict with high accuracy the RNA sequences most likely to interact with specific RBP regions. The RNA candidates with the highest potential can then be analysed in vitro to determine the binding strength and to precisely identify the binding sites. The results thus obtained can also validate the computational predictions, offering an all-round solution to the problem of finding the most likely RNA binding partners for a newly identified potential RBP.


2021 ◽  
Vol 15 ◽  
Author(s):  
Lichao Zhang ◽  
Zihong Huang ◽  
Liang Kong

Background: RNA-binding proteins establish post-transcriptional gene regulation by coordinating the maturation, editing, transport, stability, and translation of cellular RNAs. Immunoprecipitation experiments can identify interactions between RNA and proteins, but they are limited by the experimental environment and materials. Therefore, it is essential to construct computational models to identify the functional sites. Objective: Although some computational methods have been proposed to predict RNA binding sites, their accuracy could be further improved. Moreover, a dataset with more samples is needed to design a reliable model. Here we present a computational model based on multi-information sources to identify RNA binding sites. Method: We construct an accurate computational model named CSBPI_Site, based on extreme gradient boosting (XGBoost). The specifically designed 15-dimensional feature vector captures four types of information (chemical shift, chemical bond, chemical properties and position information). Results: A satisfactory accuracy of 0.86 and an AUC of 0.89 were obtained by leave-one-out cross-validation. Meanwhile, the accuracies differed only slightly (ranging from 0.83 to 0.85) across three classifier algorithms, which showed that the novel features are stable and suitable for multiple classifiers. These results show that the proposed method is effective and robust for identifying noncoding RNA binding sites. Conclusion: Our method based on multi-information sources effectively represents the binding site information in ncRNAs. The satisfactory prediction results for the Diels-Alder ribozyme based on CSBPI_Site indicate that our model is valuable for identifying functional sites.
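
Leave-one-out cross-validation trains on all samples but one, scores the held-out sample, and repeats for every sample; accuracy and AUC are then computed from the pooled held-out predictions. The sketch below shows that protocol with a nearest-centroid classifier standing in for XGBoost (a deliberate simplification, not CSBPI_Site's model) and computes AUC as the Mann-Whitney probability that a positive outscores a negative. The toy one-dimensional features are hypothetical; the study's 15-dimensional chemical feature vectors are not reproduced here.

```python
def auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid(rows):
    """Feature-wise mean of a non-empty list of vectors."""
    if not rows:
        raise ValueError("a class has no training samples (leave-one-out needs >= 2 per class)")
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loocv_nearest_centroid(X, y):
    """Leave-one-out CV with a nearest-centroid classifier (a stand-in for XGBoost)."""
    scores = []
    for i in range(len(X)):
        train = [(x, t) for j, (x, t) in enumerate(zip(X, y)) if j != i]
        c_pos = centroid([x for x, t in train if t == 1])
        c_neg = centroid([x for x, t in train if t == 0])
        # Positive score means the held-out sample is closer to the positive-class centroid.
        scores.append(sq_dist(X[i], c_neg) - sq_dist(X[i], c_pos))
    preds = [1 if s > 0 else 0 for s in scores]
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    return accuracy, auc(y, scores)

X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]  # toy features, not real data
y = [0, 0, 0, 1, 1, 1]
acc, roc_auc = loocv_nearest_centroid(X, y)
```

Swapping in another classifier only changes how each fold's score is produced; the leave-one-out loop and the pooled accuracy and AUC computation stay the same, which is what makes comparing several classifiers on the same features straightforward.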


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2852
Author(s):  
Parvathaneni Naga Srinivasu ◽  
Jalluri Gnana SivaSai ◽  
Muhammad Fazal Ijaz ◽  
Akash Kumar Bhoi ◽  
Wonjoon Kim ◽  
...  

Deep learning models are efficient at learning features that help capture complex patterns precisely. This study proposes a computerized process for classifying skin diseases using deep learning based on MobileNet V2 and Long Short-Term Memory (LSTM). The MobileNet V2 model proved efficient, achieving better accuracy while being able to run on lightweight computational devices. The proposed model is efficient at maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used to assess the progression of diseased growth. The performance has been compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), a Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by the Visual Geometry Group (VGG), and a convolutional neural network architecture extended with a few changes. The HAM10000 dataset is used, and the proposed method outperformed the other methods with more than 85% accuracy. Its robustness in recognizing the affected region much faster, with almost 2× fewer computations than the conventional MobileNet model, results in minimal computational effort. Furthermore, a mobile application is designed for instant and proper action. It helps patients and dermatologists identify the type of disease from an image of the affected region at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners diagnose skin conditions efficiently and effectively, thereby reducing further complications and morbidity.
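
A grey-level co-occurrence matrix counts how often a pixel with grey level i sits next to a pixel with grey level j at a fixed offset; texture statistics such as contrast are then derived from it. The sketch below computes a GLCM for a horizontal offset of one pixel and its contrast on a tiny integer image. The offset, the raw (unnormalized) counts, and the choice of contrast as the feature are assumptions for illustration, since the abstract does not specify them; for real images, scikit-image provides `graycomatrix` and `graycoprops`.

```python
def glcm(image, dx=1, dy=0, levels=None):
    """Count co-occurring grey-level pairs (image[r][c], image[r + dy][c + dx])."""
    if levels is None:
        levels = max(max(row) for row in image) + 1
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs that fall off the image
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix

def contrast(matrix):
    """Sum of squared grey-level differences weighted by pair frequency (normalized counts)."""
    total = sum(map(sum, matrix))
    n = len(matrix)
    return sum(matrix[i][j] * (i - j) ** 2 for i in range(n) for j in range(n)) / total

image = [[0, 0, 1],
         [0, 1, 1]]
m = glcm(image)      # [[1, 2], [0, 1]]
print(contrast(m))   # 0.5
```

Higher contrast indicates more abrupt intensity changes between neighboring pixels, so tracking such statistics over a lesion region is one way texture can be quantified across images.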

