A comprehensive online database for exploring ~20,000 public Arabidopsis RNA-Seq libraries

Mapping Intimacies ◽

10.1101/844522 ◽

2019 ◽

Author(s):

Hong Zhang ◽

Fei Zhang ◽

Li Feng ◽

Jinbu Jia ◽

Jixian Zhai

Keyword(s):

Transcriptional Regulation ◽

Transcriptome Profiling ◽

Research Community ◽

Online Database ◽

Rna Seq ◽

Huge Amount ◽

Genome Wide ◽

Wide Scale ◽

Computational Resources ◽

User Friendly

Abstract Application of Next Generation Sequencing (NGS) technology in transcriptome profiling has greatly improved our understanding of transcriptional regulation at genome-wide scale in the last decade, and tens of thousands of RNA-sequencing (RNA-seq) libraries have been produced by the research community. However, accessing such a huge amount of RNA-seq data poses a big challenge for groups that lack dedicated bioinformatic personnel or expensive computational resources. Here, we introduce the Arabidopsis RNA-seq database (ARS), a free, web-accessible, and user-friendly resource for quickly exploring the expression level of any gene in 20,000+ publicly available Arabidopsis RNA-seq libraries.


Comparative Transcriptome Profiling of High and Low oil yielding Santalum album L.

10.1101/2021.05.12.443750 ◽

2021 ◽

Author(s):

Tanzeem Fatima ◽

Rangachari Krishnan ◽

Ashutosh Srivastava ◽

Vageeshbabu S. Hanur ◽

M. Srinivasa Rao

Keyword(s):

Transcriptome Profiling ◽

Santalum Album ◽

Rna Seq ◽

Improvement Program ◽

East Indian ◽

Kegg Pathways ◽

Oil Biosynthesis ◽

Genome Wide ◽

A Genome ◽

Santalum Album L

East Indian sandalwood (Santalum album L.) is highly valued for its heartwood and its oil. No comparative study has yet been made of high and low oil yielding, genetically identical sandalwood trees grown under similar climatic conditions. We therefore performed a genome-wide transcriptome analysis to identify the genes involved in high oil biosynthesis in S. album. In this study, 15-year-old S. album genotypes (SaSHc and SaSLc) were analyzed to understand the contribution of genetic background to high oil biosynthesis in S. album. A total of 28,959,187 and 25,598,869 raw paired-end (PE) reads were generated by Illumina sequencing. In the respective accessions, 2.12 million and 1.811 million coding sequences (CDS) were obtained. Based on GO terms, 21,262 and 18,113 CDS were assigned to 26 functional groups in three GO categories: biological process (4,168; 3,641), cellular component (5,758; 4,971) and molecular function (5,108; 4,441). In total, 41,900 and 36,571 genes were functionally annotated, and KEGG analysis of the DEGs identified 213 metabolic pathways, 14 of which were involved in secondary metabolite biosynthesis in S. album. Among 237 cytochrome families, nine groups of cytochromes participated in high oil biosynthesis. A total of 16,665 differentially expressed genes were detected in both accessions (SaHc and SaSLc). The results showed that 784 genes were upregulated and 339 genes were downregulated in SaHc, whereas 635 were upregulated and 299 downregulated in SaSLc. RNA-Seq results were further validated by quantitative RT-PCR. The maximum number of BLAST hits was against Vitis vinifera. From this study we have identified an additional number of cytochrome families in SaHc. The availability of RNA-Seq data for high oil yielding sandalwood accessions will have broad implications for the conservation and selection of superior elite samples/populations for further genetic improvement programs.


Generation and Transcriptome Profiling of Slr1-d7 and Slr1-d8 Mutant Lines with a New Semi-Dominant Dwarf Allele of SLR1 Using the CRISPR/Cas9 System in Rice

International Journal of Molecular Sciences ◽

10.3390/ijms21155492 ◽

2020 ◽

Vol 21 (15) ◽

pp. 5492 ◽

Cited By ~ 1

Author(s):

Yu Jin Jung ◽

Jong Hee Kim ◽

Hyo Ju Lee ◽

Dong Hyun Kim ◽

Jihyeon Yu ◽

...

Keyword(s):

Gene Expression Analysis ◽

Transcriptome Profiling ◽

Rna Seq ◽

Loss Of Function ◽

Amino Acid Motif ◽

Dwarf Phenotype ◽

Genome Wide ◽

A Genome ◽

Mutant Lines ◽

Genome Wide Gene Expression

The rice SLR1 gene encodes the DELLA protein (a protein with the DELLA amino acid motif), and its loss-of-function mutation produces a dwarf plant by inhibiting plant growth. We generated slr1-d mutants with a semi-dominant dwarf phenotype by targeting mutations to the DELLA/TVHYNP domain using CRISPR/Cas9 genome editing in rice. Sixteen genetically edited lines were obtained from 31 transgenic plants. Deep sequencing results showed that the mutants had six different mutation types at the target site in the TVHYNP domain of the SLR1 gene. Homozygous edited plants lacking the transfer DNA (T-DNA) were selected by segregation in the T1 generation. The slr1-d7 and slr1-d8 plants showed a gibberellin (GA)-insensitive dwarf phenotype with shrunken leaves and shortened internodes. A genome-wide gene expression analysis by RNA-seq indicated that the expression levels of two GA-related genes, GA20OX2 (Gibberellin oxidase) and GA3OX2, were increased in the edited mutant plants, suggesting that GA20OX2 acts as a converter of GA12 signaling. The phenotype of these mutant plants appears to result from altered GA responses, at least partially through a defect in the phytohormone signaling process that prevents cell elongation. The new mutants, namely the slr1-d7 and slr1-d8 lines, are valuable semi-dominant dwarf alleles with potential application value for molecular breeding using the CRISPR/Cas9 system in rice.


Characterization of the radiation desiccation response regulon of the radioresistant bacterium Deinococcus radiodurans by integrative genomic analyses.

10.1101/2021.07.07.451423 ◽

2021 ◽

Author(s):

Nicolas Eugenie ◽

Yvan Zivanovic ◽

Gaelle Lelandais ◽

Genevieve Coste ◽

Claire Bouthier de la Tour ◽

...

Keyword(s):

Deinococcus Radiodurans ◽

Normal Growth ◽

Growth Conditions ◽

Rna Seq ◽

Genome Wide ◽

A Genome ◽

Genomic Analyses ◽

Number Of Genes ◽

Wide Scale

Numerous genes are overexpressed in the radioresistant bacterium Deinococcus radiodurans after exposure to radiation or prolonged desiccation. The DdrO and IrrE proteins play a major role in regulating the expression of approximately twenty of these genes, as predicted so far. The transcriptional repressor DdrO blocks the expression of these genes under normal growth conditions. After exposure to genotoxic agents, the IrrE metalloprotease cleaves DdrO and relieves gene repression. Bioinformatic analyses showed that this mechanism seems to be conserved in several species of Deinococcus, but many questions remain, such as the number of genes regulated by DdrO. Here, by RNA-seq and ChIP-seq assays performed at a genome-wide scale coupled with bioinformatic analyses, we show that the DdrO regulon in D. radiodurans includes many more genes than those previously described. These results thus pave the way to a better understanding of the radioresistance mechanisms encoded by this bacterium.


Cis-regulatory elements and human evolution

10.1101/005652 ◽

2014 ◽

Author(s):

Adam Siepel ◽

Leonardo Arbiza

Keyword(s):

Transcriptional Regulation ◽

Human Evolution ◽

Large Scale ◽

Regulatory Elements ◽

Data Sets ◽

Regulatory Evolution ◽

Genome Wide ◽

A Genome ◽

Wide Scale ◽

Human Polymorphism

Modification of gene regulation has long been considered an important force in human evolution, particularly through changes to cis-regulatory elements (CREs) that function in transcriptional regulation. For decades, however, the study of cis-regulatory evolution was severely limited by the available data. New data sets describing the locations of CREs and genetic variation within and between species have now made it possible to study CRE evolution much more directly on a genome-wide scale. Here, we review recent research on the evolution of CREs in humans based on large-scale genomic data sets. We consider inferences based on primate divergence, human polymorphism, and combinations of divergence and polymorphism. We then consider "new frontiers" in this field stemming from recent research on transcriptional regulation.


Computational methods for annotation of plant regulatory non-coding RNAs using RNA-seq

Briefings in Bioinformatics ◽

10.1093/bib/bbaa322 ◽

2020 ◽

Author(s):

A T Vivek ◽

Shailesh Kumar

Keyword(s):

Computational Methods ◽

High Throughput ◽

Rna Seq ◽

Physiological Mechanisms ◽

Software Packages ◽

Genome Wide ◽

A Genome ◽

Wide Scale ◽

Plant Transcriptome ◽

Non Coding Rnas

Abstract The plant transcriptome encompasses numerous endogenous, regulatory non-coding RNAs (ncRNAs) that play a major biological role in regulating key physiological mechanisms. While studies have shown that ncRNAs are extremely diverse and ubiquitous, the functions of the vast majority of ncRNAs are still unknown. With an ever-increasing number of ncRNAs under study, it is essential to identify, categorize and annotate these ncRNAs on a genome-wide scale. The use of high-throughput RNA sequencing (RNA-seq) technologies provides a broader picture of the non-coding component of the transcriptome, enabling the comprehensive identification and annotation of all major ncRNAs across samples. However, the detection of known and emerging classes of ncRNAs from RNA-seq data demands complex computational methods owing to their unique as well as similar characteristics. Here, we discuss the major plant endogenous, regulatory ncRNAs in an RNA sample, followed by the computational strategies applied to discover each class of ncRNAs using RNA-seq. We also provide a collection of relevant software packages and databases to present a comprehensive bioinformatics toolbox for plant ncRNA researchers. We assume that the discussions in this review will provide a rationale for the discovery of all major categories of plant ncRNAs.


RNAdetector: a free user-friendly stand-alone and cloud-based system for RNA-Seq data analysis

BMC Bioinformatics ◽

10.1186/s12859-021-04211-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Alessandro La Ferlita ◽

Salvatore Alaimo ◽

Sebastiano Di Bella ◽

Emanuele Martorana ◽

Georgios I. Laliotis ◽

...

Keyword(s):

Data Analysis ◽

Transcriptome Profiling ◽

Biological Species ◽

Third Party ◽

Rna Seq ◽

Rna Molecules ◽

Non Coding Rna ◽

Cross Platform ◽

Cloud Environments ◽

User Friendly

Abstract Background RNA-Seq is a well-established technology extensively used for transcriptome profiling, allowing the analysis of coding and non-coding RNA molecules. However, this technology produces a vast amount of data requiring more sophisticated computational approaches for their analysis than other traditional technologies such as Real-Time PCR or microarrays, strongly discouraging non-expert users. For this reason, dozens of pipelines have been deployed for the analysis of RNA-Seq data. Although interesting, these present several limitations and their usage requires a technical background, which may be uncommon in small research laboratories. Therefore, the application of these technologies in such contexts is still limited and causes a clear bottleneck in knowledge advancement. Results Motivated by these considerations, we have developed RNAdetector, a new free cross-platform and user-friendly RNA-Seq data analysis software that can be used locally or in cloud environments through an easy-to-use Graphical User Interface, allowing the analysis of coding and non-coding RNAs from RNA-Seq datasets of any sequenced biological species. Conclusions RNAdetector is a new software that fills an essential gap between the needs of biomedical and research labs to process RNA-Seq data and their common lack of technical background in performing such analysis, which usually relies on outsourcing such steps to third party bioinformatics facilities or using expensive commercial software.


Kmerator Suite: design of specific k-mer signatures and automatic metadata discovery in large RNA-Seq datasets.

10.1101/2021.05.20.444982 ◽

2021 ◽

Author(s):

Sebastien Riquier ◽

Chloe Bessiere ◽

Benoit Guibert ◽

Anne-Laure Bouge ◽

Anthony Boureux ◽

...

Keyword(s):

Gene Expression ◽

Large Datasets ◽

Rna Seq ◽

Transcript Quantification ◽

Human Genes ◽

Novel Transcripts ◽

New Biomarkers ◽

Computational Resources ◽

User Friendly ◽

Health Applications

The huge body of publicly available RNA-seq libraries is a treasure of functional information, making it possible to quantify the expression of known or novel transcripts in tissues. However, transcript quantification commonly relies on alignment methods requiring a lot of computational resources and processing time, which does not scale easily to large datasets. K-mer decomposition constitutes a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately in a less resource-consuming way. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets and quickly visualize the characteristics of large datasets. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling the measurement of gene expression with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of k-mers specific to predictor genes to infer metadata, including library protocol, sample features or contaminations, from RNA-seq datasets. KmerExploR results are visualised through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions or long non-coding RNAs for human health applications.
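
The idea of a gene-specific k-mer signature can be illustrated with a minimal sketch. This is not the Kmerator implementation: the function names (`kmers`, `specific_kmers`, `count_signature`) and the toy sequences are hypothetical, and the sketch ignores reverse complements, sequencing errors and genome-wide specificity checks that a real tool would have to handle.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_kmers(target, background, k):
    """k-mers of target that occur in none of the background sequences."""
    unwanted = set()
    for other in background:
        unwanted |= kmers(other, k)
    return kmers(target, k) - unwanted


def count_signature(reads, signature, k):
    """Count how often each signature k-mer occurs across the reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in signature:
                counts[kmer] += 1
    return counts


# Toy data (hypothetical): one target transcript, one background transcript.
signature = specific_kmers("ACGTACGGA", ["ACGTAC"], k=4)
counts = count_signature(["TTACGGAT", "ACGTAC"], signature, k=4)
total_hits = sum(counts.values())
```

In this toy case only the first read contributes hits, because the second read consists entirely of k-mers shared with the background sequence; summing the signature counts gives a crude, alignment-free proxy for expression of the target.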


BingleSeq: A user-friendly R package for Bulk and Single-cell RNA-Seq Data Analysis

10.1101/2020.06.16.148239 ◽

2020 ◽

Author(s):

Daniel Dimitrov ◽

Quan Gu

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Gene Annotation ◽

Differential Expression Analysis ◽

Transcriptome Profiling ◽

R Package ◽

Rna Seq ◽

The Individual ◽

User Friendly

Abstract RNA sequencing is a high-throughput sequencing technique considered an indispensable research tool and used in a broad range of transcriptome analysis studies. The most common application of RNA sequencing is differential expression analysis, which is used to determine genetic loci with distinct expression across different conditions. On the other hand, an emerging field called single-cell RNA sequencing is used for transcriptome profiling at the individual cell level. The standard protocols for both these types of analyses include the processing of sequencing libraries and result in the generation of count matrices. An obstacle to these analyses and the acquisition of meaningful results is that both require programming expertise. BingleSeq was developed as an intuitive application that provides a user-friendly solution for the analysis of count matrices produced by both Bulk and Single-cell RNA-Seq experiments. This was achieved by building an interactive dashboard-like user interface and incorporating three state-of-the-art software packages for each type of the aforementioned analyses, alongside additional features such as key visualisation techniques, functional gene annotation analysis and rank-based consensus for differential gene analysis results, among others. As a result, BingleSeq puts the best and most widely used packages and tools for RNA-Seq analyses at the fingertips of biologists with no programming experience.


Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies

ISRN Bioinformatics ◽

10.1155/2013/481545 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 9

Author(s):

Shanrong Zhao ◽

Kurt Prenger ◽

Lance Smith

Keyword(s):

Data Analysis ◽

Large Scale ◽

Scale Up ◽

Local Environment ◽

Transcriptome Profiling ◽

Cost Effective ◽

Rna Seq ◽

Practical Challenge ◽

Amazon Web Services ◽

Computational Resources

RNA-Seq is becoming a promising replacement for microarrays in transcriptome profiling and differential gene expression studies. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by applying it to the analysis of 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost-effective, and open-source-based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of the box to process Illumina RNA-Seq datasets.
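
The figures quoted in the abstract lend themselves to a back-of-the-envelope estimate. The sketch below is illustrative only: `estimate_batch` is a hypothetical helper, not part of Stormbow, and it assumes that every sample costs the reported average of $3.50, takes the upper bound of 8 hours, and that samples are processed in equal-sized parallel waves.

```python
def estimate_batch(n_samples, cost_per_sample, hours_per_sample, workers):
    """Rough total cost and wall-clock time for a batch of samples.

    Assumes a flat per-sample cost and that samples run in parallel
    waves of at most `workers` samples each (hypothetical model).
    """
    total_cost = n_samples * cost_per_sample
    waves = -(-n_samples // workers)  # ceiling division
    wall_hours = waves * hours_per_sample
    return total_cost, wall_hours


# All 178 samples at once versus only 10 concurrent workers.
fully_parallel = estimate_batch(178, 3.50, 8, workers=178)
ten_workers = estimate_batch(178, 3.50, 8, workers=10)
```

Under these assumptions the total cost is the same in both scenarios, and only the wall-clock time changes with the degree of parallelism, which is the practical appeal of on-demand cloud resources.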


A benchmarking of pipelines for detecting ncRNAs from RNA-Seq data

Briefings in Bioinformatics ◽

10.1093/bib/bbz110 ◽

2019 ◽

Vol 21 (6) ◽

pp. 1987-1998 ◽

Cited By ~ 1

Author(s):

Sebastiano Di Bella ◽

Alessandro La Ferlita ◽

Giovanni Carapezza ◽

Salvatore Alaimo ◽

Antonella Isacchi ◽

...

Keyword(s):

Transcriptome Profiling ◽

Read Count ◽

Ease Of Use ◽

Rna Seq ◽

Huge Amount ◽

Key Points ◽

High Throughput Technology ◽

Different Populations ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Next-Generation Sequencing (NGS) is a high-throughput technology widely applied to genome sequencing and transcriptome profiling. RNA-Seq uses NGS to reveal RNA identities and quantities in a given sample. However, it produces a huge amount of raw data that need to be preprocessed with fast and effective computational methods. RNA-Seq can look at different populations of RNAs, including ncRNAs. Indeed, in the last few years, several pipelines have been developed for ncRNA analysis from RNA-Seq experiments. In this paper, we analyze eight recent pipelines (iSmaRT, iSRAP, miARma-Seq, Oasis 2, SPORTS1.0, sRNAnalyzer, sRNApipe, sRNA workbench) which allow the analysis not only of single specific classes of ncRNAs but also of more than one ncRNA class. Our systematic performance evaluation aims at guiding users to select the appropriate pipeline for processing each ncRNA class, focusing on three key points: (i) accuracy in ncRNA identification, (ii) accuracy in read count estimation and (iii) deployment and ease of use.
