MetaMap, an interactive webtool for the exploration of metatranscriptomic reads in human disease-related RNA-seq data

2018
Author(s): LM Simon, G Tsitsiridis, P Angerer, FJ Theis

Abstract
Motivation: The MetaMap resource contains metatranscriptomic expression data from screening >17,000 RNA-seq samples from >400 archived human disease-related studies for viral and microbial reads, so-called “metafeatures”. However, navigating this large, heterogeneous data set is challenging, especially for researchers without bioinformatics expertise. Therefore, a user-friendly interface is needed that allows users to visualize and statistically analyse the data.
Results: We developed an interactive frontend to facilitate the exploration of the MetaMap resource. The web tool allows users to query the resource by searching study abstracts for keywords or by browsing expression patterns for specific metafeatures. Moreover, users can manually define sample groupings or use the existing annotation for downstream analysis. The web tool provides a wide variety of analyses and visualizations, including dimension reduction, differential abundance analysis and Krona visualizations. The MetaMap web tool represents a valuable resource for hypothesis generation regarding the impact of the microbiome in human disease.
Availability: The presented web tool can be accessed at https://github.com/theislab/MetaMap

2017, Vol 2017, pp. 1-7
Author(s): Chao-Yu Pan, Wei-Ting Kuo, Chien-Yuan Chiu, Wen-chang Lin

MicroRNAs (miRNAs) play important roles in human cancers. In previous studies, we demonstrated that both the 5p-arm and the 3p-arm of mature miRNAs can be expressed from the same precursor, and we further interrogated 5p-arm and 3p-arm miRNA expression with a comprehensive arm feature annotation list. To help biologists visualize differential 5p-arm and 3p-arm miRNA expression patterns, we used a user-friendly mobile App to display The Cancer Genome Atlas (TCGA) miRNA-Seq expression information. We collected over 4,500 miRNA-Seq datasets from 15 TCGA cancer types and processed them with the 5p-arm and 3p-arm annotation analysis pipeline. To be displayed with the RNA-Seq Viewer App, the annotated 5p-arm and 3p-arm miRNA expression information and the miRNA gene loci information were converted into SQLite tables. In this application, for any given miRNA gene, the 5p-arm miRNA is illustrated on the top of the chromosome ideogram and the 3p-arm miRNA on the bottom. Users can then easily interrogate miRNAs with differential 5p-arm/3p-arm expression on their mobile devices. This study demonstrates the feasibility and utility of the RNA-Seq Viewer App beyond mRNA-Seq data visualization.
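The conversion of arm-level expression and gene loci information into SQLite tables can be pictured with a minimal sketch. The abstract does not give the schema, so every table name, column name, and value below is a hypothetical placeholder (the rows are made-up, not TCGA measurements), and the RNA-Seq Viewer App may organize its tables quite differently.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mirna_locus (
    mirna_id   TEXT PRIMARY KEY,
    chromosome TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL
);
CREATE TABLE arm_expression (
    mirna_id    TEXT NOT NULL REFERENCES mirna_locus(mirna_id),
    cancer_type TEXT NOT NULL,
    arm         TEXT NOT NULL CHECK (arm IN ('5p', '3p')),
    expression  REAL NOT NULL
);
""")

# Placeholder rows (made-up values, not real measurements).
conn.execute(
    "INSERT INTO mirna_locus VALUES (?, ?, ?, ?)",
    ("mir-example", "chr1", 1000, 1080),
)
conn.executemany(
    "INSERT INTO arm_expression VALUES (?, ?, ?, ?)",
    [
        ("mir-example", "TYPE_A", "5p", 120.0),
        ("mir-example", "TYPE_A", "3p", 15.0),
    ],
)


def arm_pair(conn, mirna_id, cancer_type):
    """Return a dict mapping arm ('5p'/'3p') to expression for one miRNA gene."""
    rows = conn.execute(
        "SELECT arm, expression FROM arm_expression "
        "WHERE mirna_id = ? AND cancer_type = ?",
        (mirna_id, cancer_type),
    ).fetchall()
    return dict(rows)


pair = arm_pair(conn, "mir-example", "TYPE_A")
```

A viewer could draw the `5p` value above the ideogram and the `3p` value below it for each locus; keeping one row per arm makes that lookup a single query.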


Genes, 2020, Vol 11 (4), pp. 380
Author(s): Zhaoxu Gao, Biying Dong, Hongyan Cao, Hang He, Qing Yang, ...

Pigeonpea is an economically important crop grown mainly in tropical and subtropical regions. Expanding its cultivation requires addressing the impact of soil acidity on plants in these areas. Building on our previous work, we performed a time-series RNA sequencing (RNA-seq) analysis of pigeonpea under aluminum (Al) stress. A comparative analysis found 11,425 genes to be differentially expressed across the time points. Clustering these genes by their expression patterns generated 12 clusters. Gene ontology (GO) analysis identified many important functional pathways, such as biological regulation, localization, response to stimulus, metabolic process, and detoxification. Further analysis showed that metabolic pathways played an important role in the response to Al stress. Thirteen of the 23 selected genes related to flavonoids and phenols were downregulated in response to Al stress. In addition, we verified these key genes of flavonoid- and phenol-related metabolism pathways by qRT-PCR. Collectively, our findings not only revealed the regulation mechanism of pigeonpea under Al stress but also provided methodological support for further exploration of plant stress regulation mechanisms.


2017, Vol 34 (2), pp. 300-302
Author(s): Christopher J Green, Matthew R Gazzara, Yoseph Barash

Abstract
Summary: Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret and experimentally validate. To address these challenges, we developed MAJIQ-SPEL, a web tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of the gene isoforms associated with them. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex (non-binary) splicing variations. Using a matching primer design algorithm, it also suggests possible primers for experimental validation by RT-PCR and displays them, along with the matching protein domains affected by the LSV, on the UCSC Genome Browser for further downstream analysis.
Availability and implementation: Program and code will be available at http://majiq.biociphers.org/majiq-spel.
Supplementary information: Supplementary data are available at Bioinformatics online.


2017
Author(s): Christopher J. Green, Matthew R. Gazzara, Yoseph Barash

Abstract
Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret, and experimentally validate. To address these challenges, we developed MAJIQ-SPEL, a web tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of the gene isoforms associated with them. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex (non-binary) splicing variations. Using a matching primer design algorithm, it also suggests possible primers for experimental validation by RT-PCR and displays them, along with the matching protein domains affected by the LSV, on the UCSC Genome Browser for further downstream analysis.
Availability: Program and code will be available at http://majiq.biociphers.org/majiq-spel


2018
Author(s): LM Simon, S Karg, AJ Westermann, M Engel, AHA Elbehery, ...

Abstract
Background: With the advent of the age of big data in bioinformatics, large volumes of data and high performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. Ever more studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts, but its generic nature also enables the detection of microbial and viral transcripts.
Findings: We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from 6 independent controlled infection experiments of cell line models and by comparison with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from >17,000 samples from >400 studies relevant to human disease using state-of-the-art high performance computing systems. The resulting data of this large-scale re-analysis are made available in the presented MetaMap resource.
Conclusions: Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation towards the role of the microbiome in human disease.


2016
Author(s): Karl P. Gerhardt, Evan J. Olson, Sebastian M. Castillo-Hair, Lucas A. Hartsough, Brian P. Landry, ...

Abstract
In optogenetics, researchers use light and genetically encoded photoreceptors to control biological processes with unmatched precision. However, outside of neuroscience, the impact of optogenetics has been limited by a lack of user-friendly, flexible, accessible hardware. Here, we engineer the Light Plate Apparatus (LPA), a device that can deliver two independent 310 to 1550 nm light signals to each well of a 24-well plate with intensity control over three orders of magnitude and millisecond resolution. Signals are programmed using an intuitive web tool named Iris. All components can be purchased for under $400 and the device can be assembled and calibrated by a non-expert in one day. We use the LPA to precisely control gene expression from blue, green, and red light responsive optogenetic tools in bacteria, yeast, and mammalian cells and simplify the entrainment of cyanobacterial circadian rhythm. The LPA dramatically reduces the entry barrier to optogenetics and photobiology experiments.


2021
Author(s): Ping-Han Hsieh, Camila Miranda Lopes-Ramos, Geir Kjetil Sandve, Kimberly Glass, Marieke Lydia Kuijjer

Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns across a group of samples, which may indicate that these genes are controlled by the same transcriptional regulatory program, or involved in common biological processes. Gene co-expression is generally estimated from RNA-Seq data, which are generally normalized to remove technical variability. Here, we find and demonstrate that certain normalization methods, in particular quantile-based methods, can introduce false-positive associations between genes, and that this can consequently hamper downstream co-expression network analysis. Quantile-based normalization can, however, be extremely powerful. In particular when preprocessing large-scale heterogeneous data, quantile-based normalization can be applied to remove technical variability while maintaining global differences in expression for samples with different biological attributes. We therefore developed CAIMAN, a method to correct for false-positive associations that may arise from normalization of RNA-Seq data. CAIMAN utilizes a Gaussian mixture model to fit the distribution of gene expression and to adaptively select the threshold to define lowly expressed genes, which are prone to form false-positive associations. Thereafter, CAIMAN corrects the normalized expression for these genes by removing the variability across samples that might lead to false-positive associations. Moreover, CAIMAN avoids arbitrary gene filtering and retains associations to genes that only express in small subgroups of samples, highlighting its potential future impact on network modeling and other association-based approaches in large-scale heterogeneous data.
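For readers unfamiliar with the quantile-based normalization discussed above, here is a minimal pure-Python sketch of the generic technique. It is not CAIMAN and not the authors' code; the function name and the toy matrix are assumptions for illustration. Each sample (column) is forced onto a shared reference distribution, and tied values receive the mean of the reference values over the ranks they span.

```python
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Every sample (column) is mapped onto the same reference distribution:
    the mean of the sorted columns at each rank. Tied values within a
    column share the mean of the reference values over their ranks.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Reference distribution: mean across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(sc[i] for sc in sorted_cols) for i in range(n_genes)]

    normalized_cols = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        start = 0
        while start < n_genes:
            # Find the run of tied values that begins at rank `start`.
            end = start
            while end + 1 < n_genes and col[order[end + 1]] == col[order[start]]:
                end += 1
            tied_value = mean(reference[start:end + 1])
            for rank in range(start, end + 1):
                out[order[rank]] = tied_value
            start = end + 1
        normalized_cols.append(out)

    # Transpose back to genes x samples.
    return [[normalized_cols[j][i] for j in range(n_samples)]
            for i in range(n_genes)]


# Toy matrix: 4 genes (rows) x 3 samples (columns); values are illustrative.
counts = [
    [5, 4, 3],
    [2, 1, 4],
    [3, 4, 6],
    [4, 2, 8],
]
normalized = quantile_normalize(counts)
```

Because every column ends up with the same set of values, a gene's normalized expression depends only on its rank within each sample. That rank dependence is one intuition for why lowly expressed genes, whose ranks are driven largely by ties and noise, could pick up spurious associations of the kind the abstract describes.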


2020, Vol 49 (D1), pp. D1489-D1495
Author(s): Jingjing Jin, Peng Lu, Yalong Xu, Zefeng Li, Shizhou Yu, ...

Abstract
Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides with little or no protein coding potential. The expanding list of lncRNAs and accumulating evidence of their functions in plants have necessitated the creation of a comprehensive database for lncRNA research. However, currently available plant lncRNA databases have some deficiencies, including the lack of lncRNA data from some model plants, uneven annotation standards, a lack of visualization for expression patterns, and the absence of epigenetic information. To overcome these problems, we upgraded our Plant Long noncoding RNA Database (PLncDB, http://plncdb.tobaccodb.org/), which was based on a uniform annotation pipeline. PLncDB V2.0 currently contains 1 246 372 lncRNAs for 80 plant species based on 13 834 RNA-Seq datasets, integrating lncRNA information from four other resources, including EVLncRNAs and RNAcentral. Expression patterns and epigenetic signals can be visualized using multiple tools (JBrowse, eFP Browser and EPexplorer). Targets and regulatory networks for lncRNAs are also provided for function exploration. In addition, PLncDB V2.0 is hierarchical and user-friendly and has five built-in search engines. We believe PLncDB V2.0 is useful for the plant lncRNA community and data mining studies and provides a comprehensive resource for data-driven lncRNA research in plants.


2021, Vol 22 (1)
Author(s): Nicholas J. Eagles, Emily E. Burke, Jacob Leonard, Brianna K. Barry, Joshua M. Stolz, ...

Abstract
Background: RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, a researcher must perform a large number of individual steps before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, performing only one step of a larger workflow, such as alignment of reads to a reference genome. The demand for a more comprehensive and reproducible workflow has led to the production of a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses.
Results: In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, docker integration) and different configuration files are provided (http://research.libd.org/SPEAQeasy/).
Conclusions: SPEAQeasy is user-friendly and lowers the computational-domain entry barrier to RNA-seq data processing for biologists and clinicians, as the main input file is a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment.


2020
Author(s): Boas Pucker, Anna Schwandner, Sarah Becker, Ludger Hausmann, Prisca Viehöver, ...

Abstract
Plants display sophisticated mechanisms to tolerate challenging environmental conditions and need to manage their ontogenesis in parallel. Here, we set out to generate an RNA-Seq time series dataset throughout grapevine (Vitis vinifera) early bud development. The expression of the developmental regulator VviAP1 served as an indicator for progress of development. We investigated the impact of changing temperatures on gene expression levels during the time series and detected a correlation between increased temperatures and a high expression level of genes encoding heat-shock proteins. The data set also allowed the exemplary investigation of expression patterns of genes from three transcription factor (TF) gene families, namely MADS-box, WRKY, and R2R3-MYB genes. Inspection of the expression profiles from all three TF gene families indicated that a switch in the developmental program takes place in July, which coincides with increased expression of the bud dormancy marker gene VviDRM1.

