MetaMap, an interactive webtool for the exploration of metatranscriptomic reads in human disease-related RNA-seq data

Mapping Intimacies ◽

10.1101/425439 ◽

2018 ◽

Author(s):

LM Simon ◽

G Tsitsiridis ◽

P Angerer ◽

FJ Theis

Keyword(s):

Human Disease ◽

Expression Patterns ◽

Heterogeneous Data ◽

Rna Seq ◽

Web Tool ◽

Differential Abundance ◽

Downstream Analysis ◽

User Friendly ◽

Differential Abundance Analysis ◽

The Impact

Abstract Motivation: The MetaMap resource contains metatranscriptomic expression data from screening >17,000 RNA-seq samples from >400 archived human disease-related studies for viral and microbial reads, so-called “metafeatures”. However, navigating this set of large and heterogeneous data is challenging, especially for researchers without bioinformatic expertise. Therefore, a user-friendly interface is needed that allows users to visualize and statistically analyse the data. Results: We developed an interactive frontend to facilitate the exploration of the MetaMap resource. The webtool allows users to query the resource by searching study abstracts for keywords or browsing expression patterns for specific metafeatures. Moreover, users can manually define sample groupings or use the existing annotation for downstream analysis. The web tool provides a large variety of analyses and visualizations including dimension reduction, differential abundance analysis and Krona visualizations. The MetaMap webtool represents a valuable resource for hypothesis generation regarding the impact of the microbiome in human disease. Availability: The presented web tool can be accessed at https://github.com/theislab/MetaMap

Visual Display of 5p-arm and 3p-arm miRNA Expression with a Mobile Application

BioMed Research International ◽

10.1155/2017/6037168 ◽

2017 ◽

Vol 2017 ◽

pp. 1-7 ◽

Cited By ~ 3

Author(s):

Chao-Yu Pan ◽

Wei-Ting Kuo ◽

Chien-Yuan Chiu ◽

Wen-chang Lin

Keyword(s):

Mirna Expression ◽

Expression Patterns ◽

Mirna Gene ◽

Visual Display ◽

Mature Mirnas ◽

Mobile App ◽

The Cancer Genome Atlas ◽

Rna Seq ◽

Cancer Genome Atlas ◽

User Friendly

MicroRNAs (miRNAs) play important roles in human cancers. In previous studies, we demonstrated that both the 5p-arm and the 3p-arm of mature miRNAs can be expressed from the same precursor, and we further interrogated 5p-arm and 3p-arm miRNA expression with a comprehensive arm feature annotation list. To help biologists visualize differential 5p-arm and 3p-arm miRNA expression patterns, we used a user-friendly mobile App to display The Cancer Genome Atlas (TCGA) miRNA-Seq expression information. We collected over 4,500 miRNA-Seq datasets from 15 TCGA cancer types and processed them with the 5p-arm and 3p-arm annotation analysis pipeline. To be displayed with the RNA-Seq Viewer App, the annotated 5p-arm and 3p-arm miRNA expression information and the miRNA gene loci information were converted into SQLite tables. In this application, for any given miRNA gene, the 5p-arm miRNA is illustrated on the top of the chromosome ideogram and the 3p-arm miRNA is illustrated on the bottom of the chromosome ideogram. Users can then easily interrogate the differentially expressed 5p-arm/3p-arm miRNAs with their mobile devices. This study demonstrates the feasibility and utility of the RNA-Seq Viewer App beyond mRNA-Seq data visualization.
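
The conversion of per-arm expression values into SQLite tables mentioned in this abstract could look roughly like the following minimal sketch. The table name, column names, gene name and values are hypothetical and are not taken from the RNA-Seq Viewer App or from TCGA; only Python's standard `sqlite3` module is used.

```python
import sqlite3


def build_arm_table(rows):
    """Store per-arm miRNA expression in an in-memory SQLite table.

    rows: iterable of (mirna_gene, arm, cancer_type, expression) tuples.
    The schema is an assumption made purely for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE arm_expression ("
        " mirna_gene TEXT NOT NULL,"
        " arm TEXT NOT NULL CHECK (arm IN ('5p', '3p')),"
        " cancer_type TEXT NOT NULL,"
        " expression REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO arm_expression VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def arm_pair(conn, mirna_gene, cancer_type):
    """Return a dict mapping arm ('5p' / '3p') to expression for one gene."""
    cur = conn.execute(
        "SELECT arm, expression FROM arm_expression"
        " WHERE mirna_gene = ? AND cancer_type = ?",
        (mirna_gene, cancer_type),
    )
    return dict(cur.fetchall())


# Made-up example values, not real measurements.
example_rows = [
    ("hypothetical-mir-1", "5p", "BRCA", 120.5),
    ("hypothetical-mir-1", "3p", "BRCA", 8.0),
]
conn = build_arm_table(example_rows)
pair = arm_pair(conn, "hypothetical-mir-1", "BRCA")
```

A viewer could use a lookup like `arm_pair` to fetch both arms of one gene at once and draw them on opposite sides of the ideogram.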

Time Series RNA-seq in Pigeonpea Revealed the Core Genes in Metabolic Pathways under Aluminum Stress

Genes ◽

10.3390/genes11040380 ◽

2020 ◽

Vol 11 (4) ◽

pp. 380 ◽

Cited By ~ 1

Author(s):

Zhaoxu Gao ◽

Biying Dong ◽

Hongyan Cao ◽

Hang He ◽

Qing Yang ◽

...

Keyword(s):

Time Series ◽

Metabolic Pathways ◽

Expression Patterns ◽

Plant Stress ◽

Regulation Mechanism ◽

Rna Seq ◽

Aluminum Stress ◽

Biological Regulation ◽

Al Stress ◽

The Impact

Pigeonpea is an important economic crop worldwide and is mainly distributed in tropical and subtropical regions. To further expand the scope of planting, one of the problems that must be solved is the impact of soil acidity on plants in these areas. Based on our previous work, we constructed a time series RNA sequencing (RNA-seq) analysis under aluminum (Al) stress in pigeonpea. Through a comparison analysis, 11,425 genes were found to be differentially expressed among all the time points. Clustering these genes by their expression patterns generated 12 clusters. Many important functional pathways were identified by gene ontology (GO) analysis, such as biological regulation, localization, response to stimulus, metabolic process, and detoxification. Further analysis showed that metabolic pathways played an important role in the response to Al stress. Thirteen out of the 23 selected genes related to flavonoids and phenols were downregulated in response to Al stress. In addition, we verified these key genes of flavonoid- and phenol-related metabolism pathways by qRT-PCR. Collectively, our findings not only revealed the regulation mechanism of pigeonpea under Al stress but also provided methodological support for further exploration of plant stress regulation mechanisms.

MAJIQ-SPEL: web-tool to interrogate classical and complex splicing variations from RNA-Seq data

Bioinformatics ◽

10.1093/bioinformatics/btx565 ◽

2017 ◽

Vol 34 (2) ◽

pp. 300-302 ◽

Cited By ~ 2

Author(s):

Christopher J Green ◽

Matthew R Gazzara ◽

Yoseph Barash

Keyword(s):

Experimental Validation ◽

Ucsc Genome Browser ◽

Supplementary Information ◽

Supplementary Data ◽

Rna Seq ◽

Web Tool ◽

Rt Pcr ◽

Design Algorithm ◽

Gene Isoforms ◽

Downstream Analysis

Abstract Summary: Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret and experimentally validate. To address these challenges, we developed MAJIQ-SPEL, a web-tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of gene isoforms associated with those. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex, non-binary, splicing variations. Using a matching primer design algorithm, it also suggests possible primers to users for experimental validation by RT-PCR and displays those, along with the matching protein domains affected by the LSV, on UCSC Genome Browser for further downstream analysis. Availability and implementation: Program and code will be available at http://majiq.biociphers.org/majiq-spel. Supplementary information: Supplementary data are available at Bioinformatics online.

MAJIQ-SPEL: Web-Tool to interrogate classical and complex splicing variations from RNA-Seq data

10.1101/136077 ◽

2017 ◽

Author(s):

Christopher J. Green ◽

Matthew R. Gazzara ◽

Yoseph Barash

Keyword(s):

Alternative Splicing ◽

Rna Sequencing ◽

Experimental Validation ◽

Ucsc Genome Browser ◽

Rna Seq ◽

Web Tool ◽

Rt Pcr ◽

Design Algorithm ◽

Gene Isoforms ◽

Downstream Analysis

Abstract Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret, and experimentally validate. To address these challenges, we developed MAJIQ-SPEL, a web-tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of gene isoforms associated with those. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex (non-binary) splicing variations. Using a matching primer design algorithm, it also suggests possible primers to users for experimental validation by RT-PCR and displays those, along with the matching protein domains affected by the LSV, on UCSC Genome Browser for further downstream analysis. Availability: Program and code will be available at http://majiq.biociphers.org/majiq-spel

MetaMap: An atlas of metatranscriptomic reads in human disease-related RNA-seq data

10.1101/269092 ◽

2018 ◽

Cited By ~ 1

Author(s):

LM Simon ◽

S Karg ◽

AJ Westermann ◽

M Engel ◽

AHA Elbehery ◽

...

Keyword(s):

High Performance Computing ◽

Human Disease ◽

High Performance ◽

Large Scale ◽

Expression Patterns ◽

Rna Seq ◽

Wide Range ◽

Eukaryotic Gene ◽

Public Repositories ◽

Performance Computing

Abstract Background: With the advent of the age of big data in bioinformatics, large volumes of data and high performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. Ever more studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts, but its generic nature also enables the detection of microbial and viral transcripts. Findings: We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from 6 independent controlled infection experiments of cell line models and by comparison with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from >17,000 samples from >400 studies relevant to human disease using state-of-the-art high performance computing systems. The resulting data of this large-scale re-analysis are made available in the presented MetaMap resource. Conclusions: Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation towards the role of the microbiome in human disease.
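
As a rough illustration of what "re-inspecting the non-human-mapping read fraction" can mean in practice, the sketch below collects reads that an aligner left unmapped, assuming the alignments are available as SAM text. It relies only on the SAM convention that FLAG bit 0x4 marks an unmapped segment. The function name and the toy records are invented for this example, and this is not the authors' pipeline.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_unmapped(sam_lines):
    """Return (unmapped_read_names, unmapped_fraction) for SAM text lines.

    Header lines (starting with '@') and blank lines are skipped.
    """
    total = 0
    unmapped = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        # Column 1 is the read name (QNAME), column 2 is the bitwise FLAG.
        if int(fields[1]) & UNMAPPED_FLAG:
            unmapped.append(fields[0])
    fraction = len(unmapped) / total if total else 0.0
    return unmapped, fraction


# Toy records with made-up read names and sequences.
toy_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    "read3\t16\tchr2\t200\t60\t4M\t*\t0\t0\tGGCC\tIIII",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\tCATG\tIIII",
]
names, fraction = split_unmapped(toy_sam)
```

In a screening workflow of this kind, the reads returned here would be the candidates passed on to a microbial or viral classifier; that downstream step is outside the scope of this sketch.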

An open-hardware platform for optogenetics and photobiology

10.1101/055053 ◽

2016 ◽

Author(s):

Karl P. Gerhardt ◽

Evan J. Olson ◽

Sebastian M. Castillo-Hair ◽

Lucas A. Hartsough ◽

Brian P. Landry ◽

...

Keyword(s):

Mammalian Cells ◽

Red Light ◽

Web Tool ◽

Entry Barrier ◽

Open Hardware ◽

Intensity Control ◽

Light Signals ◽

User Friendly ◽

The Impact ◽

Plate Apparatus

Abstract In optogenetics, researchers use light and genetically encoded photoreceptors to control biological processes with unmatched precision. However, outside of neuroscience, the impact of optogenetics has been limited by a lack of user-friendly, flexible, accessible hardware. Here, we engineer the Light Plate Apparatus (LPA), a device that can deliver two independent 310 to 1550 nm light signals to each well of a 24-well plate with intensity control over three orders of magnitude and millisecond resolution. Signals are programmed using an intuitive web tool named Iris. All components can be purchased for under $400 and the device can be assembled and calibrated by a non-expert in one day. We use the LPA to precisely control gene expression from blue, green, and red light responsive optogenetic tools in bacteria, yeast, and mammalian cells and simplify the entrainment of cyanobacterial circadian rhythm. The LPA dramatically reduces the entry barrier to optogenetics and photobiology experiments.

Adjustment of spurious correlations in co-expression measurements from RNA-Sequencing data

10.1101/2021.03.25.436972 ◽

2021 ◽

Author(s):

Ping-Han Hsieh ◽

Camila Miranda Lopes-Ramos ◽

Geir Kjetil Sandve ◽

Kimberly Glass ◽

Marieke Lydia Kuijjer

Keyword(s):

False Positive ◽

Large Scale ◽

Expression Patterns ◽

Gaussian Mixture ◽

Heterogeneous Data ◽

Rna Seq ◽

Sequencing Data ◽

Technical Variability ◽

Gene Filtering ◽

Coordinated Expression

Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns across a group of samples, which may indicate that these genes are controlled by the same transcriptional regulatory program or involved in common biological processes. Gene co-expression is generally estimated from RNA-Seq data, which are typically normalized to remove technical variability. Here, we find and demonstrate that certain normalization methods, in particular quantile-based methods, can introduce false-positive associations between genes, and that this can consequently hamper downstream co-expression network analysis. Quantile-based normalization can, however, be extremely powerful. In particular, when preprocessing large-scale heterogeneous data, quantile-based normalization can be applied to remove technical variability while maintaining global differences in expression for samples with different biological attributes. We therefore developed CAIMAN, a method to correct for false-positive associations that may arise from normalization of RNA-Seq data. CAIMAN utilizes a Gaussian mixture model to fit the distribution of gene expression and to adaptively select the threshold to define lowly expressed genes, which are prone to form false-positive associations. Thereafter, CAIMAN corrects the normalized expression for these genes by removing the variability across samples that might lead to false-positive associations. Moreover, CAIMAN avoids arbitrary gene filtering and retains associations to genes that are only expressed in small subgroups of samples, highlighting its potential future impact on network modeling and other association-based approaches in large-scale heterogeneous data.
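
For readers unfamiliar with the normalization step discussed in this abstract, the following minimal sketch shows the common textbook form of quantile normalization in plain Python. It is not CAIMAN and not necessarily the variant the authors evaluated. The function name and toy matrix are made up, and ties are broken by row order as a simplification.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Each value is replaced by the mean, across samples, of the values that
    share its within-sample rank. Ties are broken by row order.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # Reference distribution: mean of the r-th smallest value over samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[r] for col in sorted_cols) / n_samples for r in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Gene indices ordered from lowest to highest value in sample j.
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Toy matrix: 4 genes (rows) x 3 samples (columns), made-up values.
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
result = quantile_normalize(toy)
```

After this step every sample has exactly the same set of values, only assigned to different genes, which is why the procedure removes global technical differences between samples.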

PLncDB V2.0: a comprehensive encyclopedia of plant long noncoding RNAs

Nucleic Acids Research ◽

10.1093/nar/gkaa910 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1489-D1495 ◽

Cited By ~ 1

Author(s):

Jingjing Jin ◽

Peng Lu ◽

Yalong Xu ◽

Zefeng Li ◽

Shizhou Yu ◽

...

Keyword(s):

Noncoding Rna ◽

Regulatory Networks ◽

Expression Patterns ◽

Noncoding Rnas ◽

Long Noncoding Rnas ◽

Data Driven ◽

Rna Seq ◽

Protein Coding ◽

User Friendly ◽

Coding Potential

Abstract Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides with little or no protein coding potential. The expanding list of lncRNAs and accumulating evidence of their functions in plants have necessitated the creation of a comprehensive database for lncRNA research. However, currently available plant lncRNA databases have some deficiencies, including the lack of lncRNA data from some model plants, uneven annotation standards, a lack of visualization for expression patterns, and the absence of epigenetic information. To overcome these problems, we upgraded our Plant Long noncoding RNA Database (PLncDB, http://plncdb.tobaccodb.org/), which was based on a uniform annotation pipeline. PLncDB V2.0 currently contains 1 246 372 lncRNAs for 80 plant species based on 13 834 RNA-Seq datasets, integrating lncRNA information from four other resources, including EVLncRNAs and RNAcentral. Expression patterns and epigenetic signals can be visualized using multiple tools (JBrowse, eFP Browser and EPexplorer). Targets and regulatory networks for lncRNAs are also provided for function exploration. In addition, PLncDB V2.0 is hierarchical and user-friendly and has five built-in search engines. We believe PLncDB V2.0 is useful for the plant lncRNA community and data mining studies and provides a comprehensive resource for data-driven lncRNA research in plants.

SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses

BMC Bioinformatics ◽

10.1186/s12859-021-04142-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Nicholas J. Eagles ◽

Emily E. Burke ◽

Jacob Leonard ◽

Brianna K. Barry ◽

Joshua M. Stolz ◽

...

Keyword(s):

Expression Analysis ◽

Computational Domain ◽

Rna Seq ◽

Entry Barrier ◽

One Step ◽

Differential Gene ◽

Main Input ◽

Set Up ◽

Downstream Analysis ◽

User Friendly

Abstract Background: RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, there are a large number of individual steps a researcher must perform before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, only performing one step (such as alignment of reads to a reference genome) of a larger workflow. The demand for a more comprehensive and reproducible workflow has led to the production of a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses. Results: In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, docker integration) and different configuration files are provided (http://research.libd.org/SPEAQeasy/). Conclusions: SPEAQeasy is user-friendly and lowers the computational-domain entry barrier for biologists and clinicians to RNA-seq data processing as the main input file is a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment.

RNA-Seq Time Series of Vitis vinifera Bud Development Reveals Correlation of Expression Patterns with the Local Temperature Profile

10.1101/2020.10.18.344176 ◽

2020 ◽

Author(s):

Boas Pucker ◽

Anna Schwandner ◽

Sarah Becker ◽

Ludger Hausmann ◽

Prisca Viehöver ◽

...

Keyword(s):

Time Series ◽

Vitis Vinifera ◽

Expression Profiles ◽

Expression Patterns ◽

Gene Families ◽

Rna Seq ◽

Data Set ◽

Bud Development ◽

Tf Gene ◽

The Impact

Abstract Plants display sophisticated mechanisms to tolerate challenging environmental conditions and need to manage their ontogenesis in parallel. Here, we set out to generate an RNA-Seq time series dataset throughout grapevine (Vitis vinifera) early bud development. The expression of the developmental regulator VviAP1 served as an indicator for progress of development. We investigated the impact of changing temperatures on gene expression levels during the time series and detected a correlation between increased temperatures and a high expression level of genes encoding heat-shock proteins. The data set also allowed the exemplary investigation of expression patterns of genes from three transcription factor (TF) gene families, namely MADS-box, WRKY, and R2R3-MYB genes. Inspection of the expression profiles from all three TF gene families indicated that a switch in the developmental program takes place in July which coincides with increased expression of the bud dormancy marker gene VviDRM1.
