Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs

2021 ◽  
Vol 22 (16) ◽  
pp. 8719
Author(s):  
Muhammad Nabeel Asim ◽  
Muhammad Ali Ibrahim ◽  
Muhammad Imran Malik ◽  
Andreas Dengel ◽  
Sheraz Ahmed

Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) that regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have greatly advanced the exploration of ncRNAs, revealing their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complex diseases such as cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments. Sub-cellular localization information helps explain the diverse functions of ncRNAs. To date, several computational methodologies have been proposed to precisely identify the class as well as the sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs and reviews computational approaches proposed in the last 10 years to distinguish coding RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine the sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We expect that our analysis will help Artificial Intelligence researchers find, in one place, state-of-the-art performance figures, guidance on model selection for various tasks, the dominantly used sequence descriptors and neural architectures, and an interpretation of inter-species and intra-species performance deviation.

2019 ◽  
Author(s):  
Antonio P. Camargo ◽  
Vsevolod Sourkov ◽  
Marcelo F. Carazzolle

Abstract. Motivation: The advent of high-throughput sequencing technologies made it possible to obtain large volumes of genetic information quickly and inexpensively. Thus, many efforts are devoted to unveiling the biological roles of genomic elements, one of the main tasks being the identification of protein-coding and long non-coding RNAs. Results: We describe RNAsamba, a tool to predict the coding potential of RNA molecules from sequence information using a deep-learning model that processes both the whole sequence and the ORF to look for patterns that distinguish coding and non-coding RNAs. We evaluated the model in the classification of coding and non-coding transcripts of humans and five other model organisms and show that RNAsamba mostly outperforms other state-of-the-art methods. We also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, evidencing that its model is not dependent on the presence of complete coding regions. RNAsamba is a fast and easy tool that can provide valuable contributions to genome annotation pipelines. Availability and implementation: The source code of RNAsamba is freely available at: https://github.com/apcamargo/RNAsamba.
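As a rough illustration of what "the whole sequence and the ORF" can mean as model inputs, the sketch below extracts the longest open reading frame from a transcript. It is not RNAsamba's implementation: the function name `longest_orf` is hypothetical, and the sketch assumes forward-strand scanning only, an ATG start codon, and the three standard stop codons.

```python
# Hypothetical helper, not part of RNAsamba: find the longest forward-strand ORF.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG-to-stop ORF (stop codon included), or "" if none."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = ""
    for frame in range(3):  # scan the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best
```

A classifier could then be given both the full transcript and the output of `longest_orf(transcript)`. Note that this sketch ignores ORFs that lack a stop codon, whereas the abstract reports that RNAsamba also picks up coding signals in partial-length ORFs.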


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Garima Bhatia ◽  
Santosh K. Upadhyay ◽  
Anuradha Upadhyay ◽  
Kashmir Singh

Abstract. Background: Long non-coding RNAs (lncRNAs) are regulatory transcripts of length > 200 nt. Owing to the rapidly progressing RNA-sequencing technologies, lncRNAs are emerging as considerable nodes in the plant antifungal defense networks. Therefore, we investigated their role in Vitis vinifera (grapevine) in response to obligate biotrophic fungal phytopathogens, Erysiphe necator (powdery mildew, PM) and Plasmopara viticola (downy mildew, DM), which impose a huge agro-economic burden on grape-growers worldwide. Results: Using a computational approach based on RNA-seq data, 71 PM- and 83 DM-responsive V. vinifera lncRNAs were identified and comprehensively examined for their putative functional roles in plant defense response. V. vinifera protein coding sequences (CDS) were also profiled based on expression levels, and 1037 PM-responsive and 670 DM-responsive CDS were identified. Next, co-expression analysis-based functional annotation revealed their association with gene ontology (GO) terms for ‘response to stress’, ‘response to biotic stimulus’, ‘immune system process’, etc. Further investigation based on analysis of domains, enzyme classification, pathways enrichment, transcription factors (TFs), interactions with microRNAs (miRNAs), and real-time quantitative PCR of lncRNAs and co-expressing CDS pairs suggested their involvement in modulation of basal and specific defense responses such as Ca2+-dependent signaling, cell wall reinforcement, reactive oxygen species metabolism, accumulation of pathogenesis-related proteins, phytohormonal signal transduction, and secondary metabolism. Conclusions: Overall, the identified lncRNAs provide insights into the underlying intricacy of grapevine transcriptional reprogramming/post-transcriptional regulation to delay or seize the living cell-dependent pathogen growth. Therefore, in addition to defense-responsive genes such as TFs, the identified lncRNAs can be further examined and leveraged as candidates for biotechnological improvement/breeding to enhance fungal stress resistance in this susceptible fruit crop of economic and nutritional importance.


2021 ◽  
Vol 11 ◽  
Author(s):  
Soudeh Ghafouri-Fard ◽  
Tayyebeh Khoshbakht ◽  
Mohammad Taheri ◽  
Elena Jamali

Circular RNAs (circRNAs) are a group of long non-coding RNAs with an enclosed structure generated by back-splicing events. Numerous members of this group have been shown to affect carcinogenesis. Circular RNA itchy E3 ubiquitin protein ligase (circITCH) is a circRNA created by back-splicing events in the ITCH gene, a protein-coding gene in the 20q11.22 region. ITCH acts as a catalyst for ubiquitination through both proteolytic and non-proteolytic routes. CircITCH is involved in the pathoetiology of cancers through regulation of the linear isoform as well as by serving as a sponge for several microRNAs, namely miR-17, miR-224, miR-214, miR-93-5p, miR-22, miR-7, miR-106a, miR-10a, miR-145, miR-421, miR-224-5p, miR-197 and miR-199a-5p. CircITCH is also involved in the modulation of the Wnt/β-catenin and PTEN/PI3K/AKT pathways. Apart from a single study in osteosarcoma, circITCH has been found to exert a tumor suppressor role in diverse cancers. In the present manuscript, we provide a comprehensive review of investigations that reported the function of circITCH in carcinogenesis.


Cancers ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 351 ◽  
Author(s):  
Kirti S. Prabhu ◽  
Afsheen Raza ◽  
Thasni Karedath ◽  
Syed Shadab Raza ◽  
Hamna Fathima ◽  
...  

Breast cancer is regarded as a heterogeneous and complicated disease that remains a prime focus of public health concern. Next-generation sequencing technologies provided a new perspective on non-coding RNAs, which were initially considered to be transcriptional noise or a product of erroneous transcription. Even though the biological and molecular functions of non-coding RNAs remain enigmatic, researchers have established the pivotal role of these RNAs in governing a plethora of biological phenomena, including cancer-associated cellular processes such as proliferation, invasion, migration, apoptosis, and stemness. In addition, the transmission of microRNAs and long non-coding RNAs has been identified as a means of communication with breast cancer cells, either locally or systemically. The present review provides in-depth information with the aim of uncovering the fundamental potential of non-coding RNAs, by describing the biogenesis and functional roles of microRNAs and long non-coding RNAs in breast cancer and breast cancer stem cells, as either oncogenic drivers or tumor suppressors. Furthermore, non-coding RNAs and their potential role as diagnostic and therapeutic moieties have also been summarized.


2021 ◽  
Vol 21 ◽  
Author(s):  
Han Yu ◽  
Zi-Ang Shen ◽  
Yuan-Ke Zhou ◽  
Pu-Feng Du

Long non-coding RNAs (lncRNAs) are a type of RNA with little or no protein-coding ability. Their length is more than 200 nucleotides. A large number of studies have indicated that lncRNAs play a significant role in various biological processes, including chromatin organization, epigenetic programming, transcriptional regulation, post-transcriptional processing, and circadian mechanisms at the cellular level. Since lncRNAs perform many of their functions through interactions with proteins, identifying lncRNA-protein interactions is crucial to understanding the molecular functions of lncRNAs. However, because experimental methods are costly and time-consuming, a variety of computational methods have emerged. Recently, many effective and novel machine learning methods have been developed. In general, these methods fall into two categories: semi-supervised learning methods and supervised learning methods. The latter category can be further classified into the deep learning-based method, the ensemble learning-based method, and the hybrid method. In this paper, we focus on supervised learning methods. We summarize the state-of-the-art methods for predicting lncRNA-protein interactions. Furthermore, the performance and the characteristics of different methods are compared in this work. Considering the limits of the existing models, we analyze the problems and discuss future research potential.


2018 ◽  
Author(s):  
Lixin Cheng ◽  
Kwong-Sak Leung

Abstract. Moonlighting proteins are a class of proteins having multiple distinct functions, which play essential roles in a variety of cellular and enzymatic functioning systems. Although there have long been calls for computational algorithms for the identification of moonlighting proteins, research on approaches to identify moonlighting long non-coding RNAs (lncRNAs) has never been undertaken. Here, we introduce a methodology, MoonFinder, for the identification of moonlighting lncRNAs. MoonFinder is a statistical algorithm identifying moonlighting lncRNAs without a priori knowledge through the integration of protein interactome, RNA-protein interactions, and functional annotation of proteins. We identify 155 moonlighting lncRNA candidates and uncover that they are a distinct class of lncRNAs characterized by specific sequence and cellular localization features. The non-coding genes that are transcribed into moonlighting lncRNAs tend to have shorter but more exons, and the moonlighting lncRNAs tend to reside in the cytoplasmic compartment rather than the nuclear compartment. Moreover, moonlighting lncRNAs and moonlighting proteins are rather mutually exclusive in terms of both their direct interactions and interacting partners. Our results also shed light on how the moonlighting candidates and their interacting proteins are implicated in the formation and development of cancers and other diseases.


2021 ◽  
Author(s):  
Mohan V Kasukurthi ◽  
Dominika Houserova ◽  
Yulong Huang ◽  
Addison A. Barchie ◽  
Justin T. Roberts ◽  
...  

Abstract. The widespread utilization of high-throughput sequencing technologies has unequivocally demonstrated that eukaryotic transcriptomes consist primarily (>98%) of non-coding RNA (ncRNA) transcripts significantly more diverse than their protein-coding counterparts. ncRNAs are typically divided into two categories based on their length. (1) ncRNAs less than 200 nucleotides (nt) long are referred to as small non-coding RNAs (sncRNAs) and include microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), transfer RNAs (tRNAs), etc., and the majority of these are thought to function primarily in controlling gene expression. That said, the full repertoire of sncRNAs remains fairly poorly defined, as evidenced by two entirely new classes of sncRNAs only recently being reported, i.e., snoRNA-derived RNAs (sdRNAs) and tRNA-derived fragments (tRFs). (2) ncRNAs longer than 200 nt are known as long ncRNAs (lncRNAs). lncRNAs represent the second largest transcriptional output of the cell (behind only ribosomal RNAs), and although functional roles for several lncRNAs have been reported, most lncRNAs remain largely uncharacterized due to a lack of predictive tools aimed at guiding functional characterizations. Importantly, whereas the cost of high-throughput transcriptome sequencing is now feasible for most active research programs, the tools necessary for interpreting these sequencing data typically require significant computational expertise and resources, markedly hindering widespread utilization of these datasets. In light of this, we have developed a powerful new ncRNA transcriptomics suite, SALTS, which is highly accurate, markedly efficient, and extremely user-friendly. SALTS stands for SURFR (sncRNA) And LAGOOn (lncRNA) Transcriptomics Suite and offers platforms for comprehensive sncRNA and lncRNA profiling and discovery, ncRNA functional prediction, and the identification of significant differential expressions among datasets.
Notably, SALTS is accessed through an intuitive Web-based interface, can be used to analyze either user-generated, standard next-generation sequencing (NGS) output file uploads (e.g., FASTQ) or existing NCBI Sequence Read Archive (SRA) data, and requires absolutely no dataset pre-processing or knowledge of library adapters/oligonucleotides. SALTS constitutes the first publicly available, Web-based, comprehensive ncRNA transcriptomic NGS analysis platform designed specifically for users with no computational background, providing a much needed, powerful new resource capable of enabling more widespread ncRNA transcriptomic analyses. The SALTS WebServer is freely available online at http://salts.soc.southalabama.edu.
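The length-based split described in this abstract can be sketched in a few lines. This is only an illustration of the 200 nt convention, not how SALTS works: the function names are hypothetical, the input is assumed to be already known to be non-coding, and placing a transcript of exactly 200 nt in the lncRNA group is an assumption, since the abstract only says "less than" and "longer than" 200 nt.

```python
# Hypothetical illustration of the 200 nt length convention; not SALTS code.
from collections import defaultdict


def classify_by_length(seq: str, threshold: int = 200) -> str:
    """Label an ncRNA as 'sncRNA' (< threshold nt) or 'lncRNA' (>= threshold nt)."""
    # Assumption: exactly `threshold` nt is treated as lncRNA.
    return "sncRNA" if len(seq) < threshold else "lncRNA"


def group_by_length(transcripts: dict) -> dict:
    """Group transcript IDs by length class; `transcripts` maps ID -> sequence."""
    groups = defaultdict(list)
    for name, seq in transcripts.items():
        groups[classify_by_length(seq)].append(name)
    return dict(groups)
```

For example, `group_by_length({"a": "A" * 22, "b": "A" * 1500})` places `"a"` under `sncRNA` and `"b"` under `lncRNA`. Length alone says nothing about the finer classes named in the abstract (miRNA, piRNA, snoRNA, and so on), which need sequence or annotation evidence.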


Author(s):  
Zheguang Lin ◽  
Yibing Liu ◽  
Xiaomei Chen ◽  
Cong Han ◽  
Wei Wang ◽  
...  

Abstract. Long non-coding RNAs (lncRNAs) are emerging as critical regulators with various biological functions in living organisms. However, to date, no systematic characterization of lncRNAs has been carried out in the ectoparasitic mite Varroa destructor, the most severe biotic threat to honey bees worldwide. Here, we performed an initial genome-wide identification of lncRNAs in V. destructor via high-throughput sequencing technology and report, for the first time, the transcriptomic landscape of lncRNAs in this devastating parasite. By means of a lncRNA identification pipeline, 6,645 novel lncRNA transcripts, encoded by 3,897 gene loci, were identified, including 2,066 sense lncRNAs, 2,772 lincRNAs, and 1,807 lncNATs. Compared with protein-coding mRNAs, V. destructor lncRNAs are shorter in both full length and ORF length, contain fewer exons, and are expressed at lower levels. GO term and KEGG pathway enrichment analyses of the lncRNA target genes indicated that these predicted lncRNAs are likely to play key roles in cellular processes, genetic information processing and environmental responses. To our knowledge, this is the first catalog of lncRNA profiles in a Parasitiformes species, providing a valuable resource for genetic and genomic studies. Understanding the characteristics and features of lncRNAs in V. destructor would promote sustainable pest control.


2020 ◽  
Vol 48 (4) ◽  
pp. 1545-1556 ◽  
Author(s):  
Qianpeng Li ◽  
Zhao Li ◽  
Changrui Feng ◽  
Shuai Jiang ◽  
Zhang Zhang ◽  
...  

LncRNAs (long non-coding RNAs) are pervasively transcribed in the human genome and are extensively involved in a variety of essential biological processes and human diseases. The comprehensive annotation of human lncRNAs is of great significance in navigating the functional landscape of the human genome and deepening the understanding of the multi-featured RNA world. However, the unique characteristics of lncRNAs, as well as their enormous quantity, have complicated their annotation. Advances in high-throughput sequencing technologies give rise to a large volume of omics data generated at an unprecedented rate and scale, opening up possibilities for the identification, characterization and functional annotation of lncRNAs. Here, we review recent important discoveries of human lncRNAs through analysis of various omics data and summarize specialized lncRNA database resources. Moreover, we highlight multi-omics integrative analysis as a powerful strategy to efficiently discover and characterize functional lncRNAs and elucidate their potential molecular mechanisms.


2019 ◽  
Vol 16 (3) ◽  
Author(s):  
Peijing Zhang ◽  
Wenyi Wu ◽  
Qi Chen ◽  
Ming Chen

Abstract. Eukaryotic genomes are pervasively transcribed. Besides protein-coding RNAs, there are different types of non-coding RNAs that modulate complex molecular and cellular processes. RNA sequencing technologies and bioinformatics methods have greatly promoted the study of ncRNAs, revealing their essential roles in diverse aspects of biological function. As key players in gene regulatory networks, ncRNAs work with other biomolecules, including coding and non-coding RNAs, DNAs and proteins. In this review, we discuss the distinct types of ncRNAs, including housekeeping ncRNAs and regulatory ncRNAs, their versatile functions and interactions, transcription, translation, and modification. Moreover, we summarize the integrated networks of ncRNA interactions, providing a comprehensive landscape of ncRNAs' regulatory roles.

