A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model

Mickael Orgeur; Marvin Martens; Stefan T. Börno; Bernd Timmermann; Delphine Duprez; Sigmar Stricker

doi:10.1242/bio.028498

Biology Open ◽

10.1242/bio.028498 ◽

2017 ◽

Vol 7 (1) ◽

pp. bio028498 ◽

Cited By ~ 3

Author(s):

Mickael Orgeur ◽

Marvin Martens ◽

Stefan T. Börno ◽

Bernd Timmermann ◽

Delphine Duprez ◽

...

Keyword(s):

RNA-seq ◽

Transcript Discovery ◽

Chicken Model

A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model

10.1101/156406 ◽

2017 ◽

Cited By ~ 2

Author(s):

Mickael Orgeur ◽

Marvin Martens ◽

Stefan T. Börno ◽

Bernd Timmermann ◽

Delphine Duprez ◽

...

Keyword(s):

Genome Sequence ◽

De Novo ◽

Gene Annotation ◽

Transcriptome Assembly ◽

Draft Genome ◽

Transcript Abundance ◽

Accurate Estimation ◽

RNA-seq ◽

A Genome ◽

Transcript Discovery

Abstract: The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads and the gene annotation that defines gene features must also be taken into account. Partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies.

Asymmetric Transcript Discovery by RNA-seq in C. elegans Blastomeres Identifies neg-1, a Gene Important for Anterior Morphogenesis

PLoS Genetics ◽

10.1371/journal.pgen.1005117 ◽

2015 ◽

Vol 11 (4) ◽

pp. e1005117 ◽

Cited By ~ 11

Author(s):

Erin Osborne Nishimura ◽

Jay C. Zhang ◽

Adam D. Werts ◽

Bob Goldstein ◽

Jason D. Lieb

Keyword(s):

RNA-seq ◽

C. elegans ◽

Transcript Discovery

Heuristic pairwise alignment of de Bruijn graphs to facilitate simultaneous transcript discovery in related organisms from RNA-Seq data

2014 IEEE 4th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) ◽

10.1109/iccabs.2014.6863942 ◽

2014 ◽

Author(s):

Shuhua Fu ◽

Aaron M. Tarone ◽

Sing-Hoi Sze

Keyword(s):

Pairwise Alignment ◽

RNA-seq ◽

De Bruijn Graphs ◽

De Bruijn ◽

Transcript Discovery

Heuristic pairwise alignment of de Bruijn graphs to facilitate simultaneous transcript discovery in related organisms from RNA-Seq data

BMC Genomics ◽

10.1186/1471-2164-16-s11-s5 ◽

2015 ◽

Vol 16 (Suppl 11) ◽

pp. S5 ◽

Cited By ~ 3

Author(s):

Shuhua Fu ◽

Aaron M. Tarone ◽

Sing-Hoi Sze

Keyword(s):

Pairwise Alignment ◽

RNA-seq ◽

De Bruijn Graphs ◽

De Bruijn ◽

Transcript Discovery

PRAM: a novel pooling approach for discovering intergenic transcripts from large-scale RNA sequencing experiments

10.1101/636282 ◽

2019 ◽

Author(s):

Peng Liu ◽

Alexandra A. Soukup ◽

Emery H. Bresnick ◽

Colin N. Dewey ◽

Sündüz Keleş

Keyword(s):

Large Scale ◽

RNA-seq ◽

Neighboring Gene ◽

Novel Transcript ◽

Differential Expression Pattern ◽

New Biology ◽

Joint Examination ◽

Splice Junctions ◽

Transcript Discovery

Abstract: Publicly available RNA-seq data is routinely used for retrospective analysis to elucidate new biology. Novel transcript discovery enabled by joint examination of large collections of RNA-seq datasets has emerged as one such analysis. Current methods for transcript discovery rely on a ‘2-Step’ approach where the first step encompasses building transcripts from individual datasets, followed by the second step that merges predicted transcripts across datasets. To increase the power of transcript discovery from large collections of RNA-seq datasets, we developed a novel ‘1-Step’ approach named Pooling RNA-seq and Assembling Models (PRAM) that builds transcript models from pooled RNA-seq datasets. We demonstrate in a computational benchmark that ‘1-Step’ outperforms ‘2-Step’ approaches in predicting overall transcript structures and individual splice junctions, while performing competitively in detecting exonic nucleotides. Applying PRAM to 30 human ENCODE RNA-seq datasets identified unannotated transcripts with epigenetic and RAMPAGE signatures similar to those of recently annotated transcripts. In a case study, we discovered and experimentally validated new transcripts through the application of PRAM to mouse hematopoietic RNA-seq datasets. Notably, we uncovered new transcripts that share a differential expression pattern with a neighboring gene Pik3cg implicated in human hematopoietic phenotypes, and we provided evidence for the conservation of this relationship in human. PRAM is implemented as an R/Bioconductor package and is available at https://bioconductor.org/packages/pram.
