PredLnc-GFStack: A Global Sequence Feature Based on a Stacked Ensemble Learning Method for Predicting lncRNAs from Transcripts

Shuai Liu; Xiaohan Zhao; Guangyan Zhang; Weiyang Li; Feng Liu; Shichao Liu; Wen Zhang

doi:10.3390/genes10090672

Genes, 2019, Vol 10 (9), pp. 672. Cited By ~ 2

Keyword(s): Ensemble Learning; High Throughput Sequencing; Learning Method; Sequencing Technology; Base Pairs; Feature List; Feature Based; Novel Transcripts; Non-Coding RNAs; Candidate Feature

Long non-coding RNAs (lncRNAs) are a class of RNAs longer than 200 base pairs (bp) that do not encode proteins; nevertheless, lncRNAs have many vital biological functions. The development of high-throughput sequencing technology has led to the discovery of a large number of novel transcripts, so computational methods for lncRNA prediction are in great demand. In this paper, we consider global sequence features and propose a stacked ensemble learning-based method for predicting lncRNAs from transcripts, abbreviated as PredLnc-GFStack. We extract the critical features from the candidate feature list using a genetic algorithm (GA) and then employ stacked ensemble learning to construct the PredLnc-GFStack model. Computational experiments show that PredLnc-GFStack outperforms several state-of-the-art methods for lncRNA prediction. Furthermore, PredLnc-GFStack shows an outstanding ability in cross-species ncRNA prediction.
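The abstract describes a two-stage pipeline: a genetic algorithm (GA) selects a feature subset, and a stacked ensemble is then trained on it. The sketch below illustrates only the GA stage in plain Python, under stated assumptions: each chromosome is a 0/1 mask over candidate features, parents are picked by two-way tournament, children are produced by one-point crossover and bit-flip mutation, and the best mask is kept unchanged each generation (elitism). `toy_fitness` is a hypothetical stand-in for what would, in practice, be a cross-validated classifier score. None of these choices or names come from the PredLnc-GFStack paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # 1 = keep the feature, 0 = drop it


def toy_fitness(mask: Sequence[int]) -> float:
    """Hypothetical stand-in for a classifier score: features 0 and 2 help, others cost 0.5 each."""
    informative = {0, 2}
    gain = sum(bit for i, bit in enumerate(mask) if i in informative)
    cost = sum(bit for i, bit in enumerate(mask) if i not in informative)
    return gain - 0.5 * cost


def ga_select_features(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 20,
    generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float, List[float]]:
    """Evolve 0/1 feature masks; return (best mask, its fitness, best fitness per generation)."""
    if n_features < 2 or pop_size < 2:
        raise ValueError("need at least 2 features and 2 individuals")
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history: List[float] = []

    for _ in range(generations):
        # Score and rank the current population, fittest first.
        scored = sorted(((fitness(m), m) for m in pop), key=lambda t: t[0], reverse=True)
        history.append(scored[0][0])

        def tournament() -> Mask:
            # Two-way tournament: return the fitter of two random individuals.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        next_pop = [scored[0][1][:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=fitness)
    return best, fitness(best), history
```

Calling `ga_select_features(6, toy_fitness)` returns a 6-element mask, its score and the per-generation best scores. Because of elitism, the best score in `history` never decreases; plugging in a real fitness function only requires a callable that maps a mask to a number.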

Identification of microRNAs and relative target genes in Moringa oleifera leaf and callus

Scientific Reports, 2019, Vol 9 (1). doi:10.1038/s41598-019-51100-4

Author(s): Stefano Pirrò; Ivana Matic; Arianna Guidi; Letizia Zanella; Angelo Gismondi; ...

Keyword(s): Stress Response; Moringa Oleifera; High Throughput Sequencing; Target Genes; PCR Analysis; MicroRNA Target; Sequencing Technology; Important Addition; Non-Coding RNAs; Growth Development

MicroRNAs, a class of small non-coding RNAs, play important roles in plant growth, development and stress response by negatively regulating gene expression. The Moringa oleifera Lam. plant has many medicinal and nutritional uses; however, little attention has been paid to its potential for the bioproduction of active compounds. In this study, nine novel small RNA libraries were constructed from leaf and cold-stress-treated callus using high-throughput sequencing technology, and 431 conserved and 392 novel microRNA families were identified. Based on the M. oleifera genome, the microRNA repertoire of the seed was re-evaluated. qRT-PCR analysis confirmed the expression pattern of 11 conserved microRNAs in all groups. MicroRNA159 was the most abundant conserved microRNA in leaf and callus, whereas microRNA393 was the most abundantly expressed in the seed. Most of the predicted microRNA target genes were transcription factors involved in plant reproduction, growth/development and abiotic/biotic stress response. In conclusion, this first comprehensive analysis of microRNAs in M. oleifera leaf and callus is an important addition to the existing M. oleifera seed microRNA database and opens the possibility of exploiting plant microRNAs induced by abiotic stress as a tool for bio-enrichment with pharmacologically important phytochemicals.

Prediction of Long Non-Coding RNAs Based on Deep Learning

Genes, 2019, Vol 10 (4), pp. 273. doi:10.3390/genes10040273. Cited By ~ 6

Author(s): Xiu-Qin Liu; Bing-Xiu Li; Guan-Rong Zeng; Qiao-Yue Liu; Dong-Mei Ai

Keyword(s): Deep Learning; High Throughput Sequencing; Short Term Memory; Rapid Development; Classification Performance; Sequencing Technology; Learning Framework; Non-Coding RNAs; Set Up; Deep Learning Model

With the rapid development of high-throughput sequencing technology, a large number of transcript sequences have been discovered, and identifying long non-coding RNAs (lncRNAs) among them is a challenging task. Identifying lncRNAs not only helps us understand biological processes more clearly but also helps researchers study disease at the molecular level. At present, lncRNAs are detected mainly by two kinds of approach: computational and experimental. Owing to the limitations of sequencing technology and unavoidable errors in the sequencing process, the detection performance of these methods is not very satisfactory. In this paper, we constructed a deep learning model to distinguish lncRNAs from mRNAs. We used k-mer embedding vectors obtained by training the GloVe algorithm as input features, and built a deep learning framework comprising a bidirectional long short-term memory (BLSTM) layer and a convolutional neural network (CNN) layer with three additional hidden layers. In testing, our model achieved an F1 score of 97.9%, an accuracy of 96.4% and an auROC of 99.0%, outperforming the traditional PLEK, CNCI and CPC methods for identifying lncRNAs. We hope that our model will help distinguish mature mRNAs from lncRNAs and become a useful tool for understanding and detecting diseases associated with lncRNAs.
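As a rough illustration of the input side of this model, the sketch below splits a transcript into k-mer "words" and accumulates the symmetric, distance-weighted co-occurrence counts that GloVe is trained on (in GloVe, a pair of words d positions apart contributes 1/d to the count). The values k = 3, stride 1 and window 2, and the conversion of U to T, are assumptions for illustration rather than the paper's settings; the GloVe fit itself and the BLSTM/CNN layers are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a nucleotide sequence into k-mer "words" (overlapping when stride < k)."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = seq.upper().replace("U", "T")  # assumption: treat RNA and DNA alphabets alike
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def cooccurrence(tokens: List[str], window: int = 2) -> Dict[Tuple[str, str], float]:
    """Symmetric co-occurrence counts; a pair d tokens apart adds 1/d (GloVe-style weighting)."""
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    for i, center in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            # Count the pair in both directions so the matrix is symmetric.
            counts[(center, tokens[j])] += 1.0 / d
            counts[(tokens[j], center)] += 1.0 / d
    return dict(counts)


if __name__ == "__main__":
    toks = kmer_tokens("AUGGCU")
    print(toks)  # ['ATG', 'TGG', 'GGC', 'GCT']
    print(cooccurrence(toks)[("ATG", "GGC")])  # 0.5
```

A k-mer vocabulary of this kind stays small (at most 4^k distinct words), which keeps the co-occurrence matrix and the embedding table compact.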

Clinical Application of High-throughput Sequencing Technology for the Diagnosis of Patients With Severe Infection

Case Medical Research, 2020. doi:10.31525/ct1-nct04217252

Keyword(s): Clinical Application; High Throughput; High Throughput Sequencing; Severe Infection; Sequencing Technology

Protein Remote Homology Detection by Combining Pseudo Dimer Composition with an Ensemble Learning Method

Current Proteomics, 2016, Vol 13 (2), pp. 86-91. doi:10.2174/157016461302160514002939. Cited By ~ 7

Author(s): Bin Liu; Junjie Chen; Shanyi Wang

Keyword(s): Ensemble Learning; Learning Method; Homology Detection; Remote Homology; Remote Homology Detection

A Deep Learning based Ensemble Learning Method for Epileptic Seizure Prediction

Computers in Biology and Medicine, 2021, pp. 104710. doi:10.1016/j.compbiomed.2021.104710

Author(s): Syed Muhammad Usman; Shehzad Khalid; Sadaf Bashir

Keyword(s): Deep Learning; Ensemble Learning; Epileptic Seizure; Learning Method; Seizure Prediction; Epileptic Seizure Prediction

Identification of sex differentiation-related microRNA and long non-coding RNA in Takifugu rubripes gonads

Scientific Reports, 2021, Vol 11 (1). doi:10.1038/s41598-021-83891-w

Author(s): Hongwei Yan; Qi Liu; Jieming Jiang; Xufang Shen; Lei Zhang; ...

Keyword(s): Sex Determination; High Throughput Sequencing; Sex Differentiation; Mature miRNAs; Takifugu Rubripes; Sequencing Technology; Non-Coding RNA; Differentially Expressed miRNAs; Tiger Pufferfish; Long Non-Coding RNA

Although sex determination and differentiation are key developmental processes in animals, the involvement of non-coding RNAs in regulating these processes remains unclear. The tiger pufferfish (Takifugu rubripes) is one of the most economically important marine cultured species in Asia, but analyses of miRNAs and long non-coding RNAs (lncRNAs) at early stages of sex differentiation have not yet been conducted. In our study, high-throughput sequencing technology was used to sequence transcriptome libraries from undifferentiated gonads of T. rubripes. In total, 231 miRNAs (107 conserved and 124 novel) and 2774 lncRNAs (523 conserved and 2251 novel) were identified. Of these, several miRNAs and lncRNAs were predicted to regulate the expression of sex-related genes (including fru-miR-15b/foxl2, novel-167, novel-318, and novel-538/dmrt1, novel-548/amh, lnc_000338, lnc_000690, lnc_000370, XLOC_021951, and XR_965485.1/gsdf). Analysis of differentially expressed miRNAs and lncRNAs showed that, in male gonads compared with female gonads, three mature miRNAs were up-regulated and five were down-regulated, while 79 lncRNAs were up-regulated and 51 were down-regulated. These findings point to a group of candidate miRNAs and lncRNAs for future studies and may provide new insights into the function of miRNAs and lncRNAs in sex determination and differentiation.

Research Progress of Soil Microorganism Application Based on High-Throughput Sequencing Technology

IOP Conference Series: Earth and Environmental Science, 2021, Vol 692 (4), pp. 042059. doi:10.1088/1755-1315/692/4/042059

Author(s): Yujun Zhang; Puchang Wang; Zhongfu Long; Leilei Ding; Wen Zhang; ...

Keyword(s): High Throughput; High Throughput Sequencing; Soil Microorganism; Research Progress; Sequencing Technology

Comparative profiling of the resistance of different genotypes of mannose-binding lectin to Mycoplasma pneumoniae infection in Chinese Merino sheep based on high-throughput sequencing technology

Veterinary Immunology and Immunopathology, 2021, pp. 110183. doi:10.1016/j.vetimm.2021.110183

Author(s): Mengting Zhu; Ying Nan; Mengting Zhai; Mingyuan Wang; Yanyan Shao; ...

Keyword(s): Mycoplasma Pneumoniae; High Throughput; High Throughput Sequencing; Mannose Binding Lectin; Sequencing Technology; Merino Sheep; Mannose Binding; Binding Lectin; Mycoplasma Pneumoniae Infection

Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm

Bioinformatics, 2020, Vol 36 (12), pp. 3669-3679. doi:10.1093/bioinformatics/btaa179. Cited By ~ 3

Author(s): Can Firtina; Jeremie S Kim; Mohammed Alser; Damla Senol Cali; A Ercument Cicek; ...

Keyword(s): Genome Analysis; Supplementary Information; Third Generation; Sequencing Technology; Base Pairs; Sequencing Technologies; Third Generation Sequencing; Long Reads; Generation Sequencing; Large Genomes

Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively.

Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts.

Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo.

Supplementary information: Supplementary data are available at Bioinformatics online.
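Apollo's last step decodes the trained model with the Viterbi algorithm. The sketch below shows Viterbi decoding on a small, generic discrete HMM in log space; it is not a profile HMM, omits the Forward–Backward training step, and is not Apollo's implementation. The toy parameters (`STATES`, `START`, `TRANS`, `EMIT`, `OBS`) are the standard two-state textbook example, used only because its answer is easy to check by hand.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy model (textbook example), not related to sequencing data.
STATES = ("Healthy", "Fever")
START = {"Healthy": 0.6, "Fever": 0.4}
TRANS = {"Healthy": {"Healthy": 0.7, "Fever": 0.3}, "Fever": {"Healthy": 0.4, "Fever": 0.6}}
EMIT = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
OBS = ("normal", "cold", "dizzy")


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely state path and its log-probability."""
    if not obs:
        raise ValueError("need at least one observation")
    # score[t][s]: best log-probability of any path that ends in state s at step t.
    score = [{s: _log(start.get(s, 0.0)) + _log(emit[s].get(obs[0], 0.0)) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor r for state s at step t.
            prev, best = max(
                ((r, score[t - 1][r] + _log(trans[r].get(s, 0.0))) for r in states),
                key=lambda x: x[1],
            )
            score[t][s] = best + _log(emit[s].get(obs[t], 0.0))
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][last]


if __name__ == "__main__":
    print(viterbi(OBS, STATES, START, TRANS, EMIT))
```

For the toy model, decoding `OBS` yields the path Healthy, Healthy, Fever with probability 0.01512. Working with log-probabilities avoids the numerical underflow that multiplying many small probabilities would cause on long inputs such as sequencing reads.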

Identification of the new gene Zrsr1 to associate with the pluripotency state in induced pluripotent stem cells (iPSCs) using high throughput sequencing technology

Genomics Data, 2014, Vol 2, pp. 73-77. doi:10.1016/j.gdata.2014.04.008. Cited By ~ 2

Author(s): Shuai Gao; Gang Chang; Jianhui Tian; Shaorong Gao; Tao Cai

Keyword(s): Stem Cells; Induced Pluripotent Stem Cells; High Throughput; Pluripotent Stem Cells; High Throughput Sequencing; Sequencing Technology; New Gene; Induced Pluripotent
