PredLnc-GFStack: A Global Sequence Feature Based on a Stacked Ensemble Learning Method for Predicting lncRNAs from Transcripts

Genes ◽  
2019 ◽  
Vol 10 (9) ◽  
pp. 672 ◽  
Author(s):  
Shuai Liu ◽  
Xiaohan Zhao ◽  
Guangyan Zhang ◽  
Weiyang Li ◽  
Feng Liu ◽  
...  

Long non-coding RNAs (lncRNAs) are a class of RNAs longer than 200 base pairs (bp) that do not encode proteins; nevertheless, lncRNAs have many vital biological functions. The development of high-throughput sequencing technology has led to the discovery of a large number of novel transcripts, so computational methods for lncRNA prediction are in great demand. In this paper, we consider global sequence features and propose a stacked ensemble learning-based method to predict lncRNAs from transcripts, abbreviated as PredLnc-GFStack. We extract the critical features from the candidate feature list using a genetic algorithm (GA) and then employ the stacked ensemble learning method to construct the PredLnc-GFStack model. Computational experimental results show that PredLnc-GFStack outperforms several state-of-the-art methods for lncRNA prediction. Furthermore, PredLnc-GFStack demonstrates an outstanding ability for cross-species ncRNA prediction.
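The GA-based feature selection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ga_select`, the population size, the mutation rate, and above all the toy fitness function are assumptions made for illustration. In a real pipeline the fitness would more plausibly be a cross-validated score of a classifier trained on the selected features.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search for a good feature mask (1 = keep the feature) with a simple GA."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per position.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Hypothetical fitness: pretend features 0, 3 and 5 are the informative ones
# and charge a small penalty for every feature kept.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

mask = ga_select(8, toy_fitness)
print(mask, toy_fitness(mask))
```

Because the best individual is copied into every new generation, the fitness of the returned mask never falls below the best fitness seen in the initial population.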

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Stefano Pirrò ◽  
Ivana Matic ◽  
Arianna Guidi ◽  
Letizia Zanella ◽  
Angelo Gismondi ◽  
...  

MicroRNAs, a class of small, non-coding RNAs, play important roles in plant growth, development and stress response by negatively regulating gene expression. The Moringa oleifera Lam. plant has many medical and nutritional uses; however, little attention has been dedicated to its potential for the bioproduction of active compounds. In this study, 431 conserved and 392 novel microRNA families were identified, and 9 novel small RNA libraries were constructed from leaf and cold-stress-treated callus, using high-throughput sequencing technology. Based on the M. oleifera genome, the microRNA repertoire of the seed was re-evaluated. qRT-PCR analysis confirmed the expression pattern of 11 conserved microRNAs in all groups. MicroRNA159 was found to be the most abundant conserved microRNA in leaf and callus, while microRNA393 was most abundantly expressed in the seed. The majority of predicted microRNA target genes were transcription factors involved in plant reproduction, growth/development and abiotic/biotic stress response. In conclusion, this is the first comprehensive analysis of microRNAs in M. oleifera leaf and callus, which represents an important addition to the existing M. oleifera seed microRNA database and allows for possible exploitation of plant microRNAs induced by abiotic stress as a tool for bio-enrichment with pharmacologically important phytochemicals.


Genes ◽  
2019 ◽  
Vol 10 (4) ◽  
pp. 273 ◽  
Author(s):  
Xiu-Qin Liu ◽  
Bing-Xiu Li ◽  
Guan-Rong Zeng ◽  
Qiao-Yue Liu ◽  
Dong-Mei Ai

With the rapid development of high-throughput sequencing technology, a large number of transcript sequences have been discovered, and identifying long non-coding RNAs (lncRNAs) among these transcripts is a challenging task. The identification and inclusion of lncRNAs not only helps us understand life activities more clearly, but can also help humans further explore and study disease at the molecular level. At present, lncRNAs are detected mainly in two ways: computationally and experimentally. Due to the limitations of bio-sequencing technology and unavoidable errors in sequencing processes, the detection performance of these methods is not very satisfactory. In this paper, we constructed a deep-learning model to effectively distinguish lncRNAs from mRNAs. We used k-mer embedding vectors obtained by training the GloVe algorithm as input features and set up the deep learning framework to include a bidirectional long short-term memory (BLSTM) layer and a convolutional neural network (CNN) layer with three additional hidden layers. In testing, our model obtained best values of 97.9%, 96.4% and 99.0% for F1 score, accuracy and auROC, respectively, showing better classification performance than the traditional PLEK, CNCI and CPC methods for identifying lncRNAs. We hope that our model will provide effective help in distinguishing mature mRNAs from lncRNAs, and become a potential tool to help humans understand and detect the diseases associated with lncRNAs.
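Before k-mer embeddings can be trained or looked up, each transcript has to be split into k-mers and mapped to integer indices. The sketch below shows one plausible way to do that; the abstract does not state the value of k, the stride, or how unseen k-mers are handled, so `k=3`, the overlapping stride of 1, the U-to-T normalization and the reserved index 0 are all assumptions. The GloVe training and the BLSTM/CNN layers themselves are not shown.

```python
from collections import Counter

def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper().replace("U", "T")  # assumption: treat RNA and DNA letters alike
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_vocab(sequences, k=3):
    """Map every k-mer seen in the corpus to an integer index, starting at 1."""
    counts = Counter(km for s in sequences for km in kmers(s, k))
    # Index 0 is reserved for padding and unseen k-mers (an assumption).
    return {km: i + 1 for i, km in enumerate(sorted(counts))}

def encode(seq, vocab, k=3):
    """Turn a sequence into the list of indices an embedding layer would consume."""
    return [vocab.get(km, 0) for km in kmers(seq, k)]

tokens = kmers("ATGCA")              # ['ATG', 'TGC', 'GCA']
vocab = build_vocab(["ATGCA"])       # {'ATG': 1, 'GCA': 2, 'TGC': 3}
known = encode("ATGCA", vocab)       # [1, 3, 2]
with_unseen = encode("ATGCC", vocab) # [1, 3, 0], since 'GCC' is not in the vocabulary
print(tokens, vocab, known, with_unseen)
```

The resulting index lists are what would be fed into an embedding lookup, with each index replaced by its trained k-mer vector.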


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hongwei Yan ◽  
Qi Liu ◽  
Jieming Jiang ◽  
Xufang Shen ◽  
Lei Zhang ◽  
...  

Although sex determination and differentiation are key developmental processes in animals, the involvement of non-coding RNA in the regulation of these processes is still not clarified. The tiger pufferfish (Takifugu rubripes) is one of the most economically important marine cultured species in Asia, but analyses of miRNA and long non-coding RNA (lncRNA) at early sex differentiation stages have not yet been conducted. In our study, high-throughput sequencing technology was used to sequence transcriptome libraries from undifferentiated gonads of T. rubripes. In total, 231 (107 conserved, and 124 novel) miRNAs were obtained, while 2774 (523 conserved, and 2251 novel) lncRNAs were identified. Of these, several miRNAs and lncRNAs were predicted to be regulators of the expression of sex-related genes (including fru-miR-15b/foxl2, novel-167, novel-318, and novel-538/dmrt1, novel-548/amh, lnc_000338, lnc_000690, lnc_000370, XLOC_021951, and XR_965485.1/gsdf). Analysis of differentially expressed miRNAs and lncRNAs showed that three mature miRNAs were up-regulated and five mature miRNAs were down-regulated in male gonads compared to female gonads, while 79 lncRNAs were up-regulated and 51 were down-regulated. These findings could highlight a group of interesting miRNAs and lncRNAs for future studies and may reveal new insights into the function of miRNAs and lncRNAs in sex determination and differentiation.


2021 ◽  
Vol 692 (4) ◽  
pp. 042059
Author(s):  
Yujun Zhang ◽  
Puchang Wang ◽  
Zhongfu Long ◽  
Leilei Ding ◽  
Wen Zhang ◽  
...  

2020 ◽  
Vol 36 (12) ◽  
pp. 3669-3679 ◽  
Author(s):  
Can Firtina ◽  
Jeremie S Kim ◽  
Mohammed Alser ◽  
Damla Senol Cali ◽  
A Ercument Cicek ◽  
...  

Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information: Supplementary data are available at Bioinformatics online.
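The Viterbi decoding step named in this abstract can be illustrated on a much smaller model. The sketch below is a generic log-space Viterbi decoder for an ordinary two-state HMM, not Apollo's profile HMM (it has no match, insertion or deletion states) and not its code. The state names `AT` and `GC` and all probabilities are invented for the example, and the Forward–Backward training step is omitted.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for obs, computed in log space."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            score[s] = (prev[best_prev] + math.log(trans_p[best_prev][s])
                        + math.log(emit_p[s][symbol]))
            pointers[s] = best_prev
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy model (all numbers are illustrative assumptions): an AT-rich state and
# a GC-rich state that tend to persist from one base to the next.
states = ["AT", "GC"]
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

path = viterbi("AATGGC", states, start_p, trans_p, emit_p)
print(path)  # ['AT', 'AT', 'AT', 'GC', 'GC', 'GC']
```

Working in log space avoids numerical underflow on long sequences, which matters once the observation is a read or an assembly rather than six bases.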

