A Bayesian Framework for Detecting Gene Expression Outliers in Individual Samples

10.1101/662338 ◽

2019 ◽

Author(s):

John Vivian ◽

Jordan Eizenga ◽

Holly C. Beale ◽

Olena Morozova-Vaske ◽

Benedict Paten

Keyword(s):

Gene Expression ◽

Expression Patterns ◽

Expression Data ◽

Rna Seq ◽

Single Patient ◽

Composite Tissue ◽

Patient Sample ◽

Statistical Framework ◽

Therapeutic Leads ◽

Upregulated Genes

OBJECTIVE Many antineoplastics are designed to target upregulated genes, but quantifying upregulation in a single patient sample requires an appropriate set of samples for comparison. In cancer, the most natural comparison set is unaffected samples from the matching tissue, but there are often too few available unaffected samples to overcome high inter-sample variance. Moreover, some cancer samples have misidentified tissues of origin or even composite-tissue phenotypes. Even if an appropriate comparison set can be identified, most differential expression tools are not designed to accommodate comparisons to a single patient sample. MATERIALS AND METHODS We propose a Bayesian statistical framework for gene expression outlier detection in single samples. Our method uses all available data to produce a consensus background distribution for each gene of interest without requiring the researcher to manually select a comparison set. The consensus distribution can then be used to quantify over- and under-expression. RESULTS We demonstrate this method on both simulated and real gene expression data. We show that it can robustly quantify overexpression, even when the set of comparison samples lacks ideally matched tissue samples. Further, our results show that the method can identify appropriate comparison sets from samples of mixed lineage and rediscover numerous known gene-cancer expression patterns. CONCLUSIONS This exploratory method is suitable for identifying expression outliers from comparative RNA-seq analysis for individual samples, and Treehouse, a pediatric precision medicine group that leverages RNA-seq to identify potential therapeutic leads for patients, plans to explore this method for processing its pediatric cohort.

Bayesian Framework for Detecting Gene Expression Outliers in Individual Samples

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00095 ◽

2020 ◽

pp. 160-170

Author(s):

John Vivian ◽

Jordan M. Eizenga ◽

Holly C. Beale ◽

Olena M. Vaske ◽

Benedict Paten

Keyword(s):

Gene Expression ◽

Expression Patterns ◽

Rna Seq ◽

Single Patient ◽

Tissue Samples ◽

Composite Tissue ◽

Patient Sample ◽

Statistical Framework ◽

Therapeutic Leads ◽

Upregulated Genes

PURPOSE Many antineoplastics are designed to target upregulated genes, but quantifying upregulation in a single patient sample requires an appropriate set of samples for comparison. In cancer, the most natural comparison set is unaffected samples from the matching tissue, but there are often too few available unaffected samples to overcome high intersample variance. Moreover, some cancer samples have misidentified tissues of origin or even composite-tissue phenotypes. Even if an appropriate comparison set can be identified, most differential expression tools are not designed to accommodate comparisons to a single patient sample. METHODS We propose a Bayesian statistical framework for gene expression outlier detection in single samples. Our method uses all available data to produce a consensus background distribution for each gene of interest without requiring the researcher to manually select a comparison set. The consensus distribution can then be used to quantify over- and underexpression. RESULTS We demonstrate this method on both simulated and real gene expression data. We show that it can robustly quantify overexpression, even when the set of comparison samples lacks ideally matched tissue samples. Furthermore, our results show that the method can identify appropriate comparison sets from samples of mixed lineage and rediscover numerous known gene-cancer expression patterns. CONCLUSION This exploratory method is suitable for identifying expression outliers from comparative RNA sequencing (RNA-seq) analysis for individual samples, and Treehouse, a pediatric precision medicine group that leverages RNA-seq to identify potential therapeutic leads for patients, plans to explore this method for processing its pediatric cohort.
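
The idea of scoring one sample against a consensus background can be illustrated with a short sketch. This is not the authors' model: it assumes each background group is summarised by a normal distribution for a single gene, and it takes the group weights as given (a full Bayesian treatment would infer them from the data). The function name, group names and all values below are hypothetical.

```python
import random
import statistics


def consensus_tail_probability(sample_value, backgrounds, weights,
                               n_draws=20000, seed=0):
    """Estimate P(background draw >= sample_value) under a weighted mixture.

    backgrounds: dict mapping group name -> expression values for one gene
    weights:     dict mapping group name -> non-negative weight
                 (hand-supplied here; an assumption of this sketch)
    """
    rng = random.Random(seed)
    names = list(backgrounds)
    total = sum(weights[name] for name in names)
    probs = [weights[name] / total for name in names]
    # Summarise each group by its mean and standard deviation
    # (normality is an assumption of this sketch).
    params = {
        name: (statistics.mean(backgrounds[name]),
               statistics.stdev(backgrounds[name]))
        for name in names
    }
    exceed = 0
    for _ in range(n_draws):
        # Pick a background group by weight, then draw an expression value.
        group = rng.choices(names, weights=probs, k=1)[0]
        mu, sigma = params[group]
        if rng.gauss(mu, sigma) >= sample_value:
            exceed += 1
    return exceed / n_draws


# Hypothetical log-scale expression values for one gene in two groups.
backgrounds = {
    "tissue_a": [2.1, 2.4, 1.9, 2.2, 2.0],
    "tissue_b": [5.0, 5.3, 4.8, 5.1, 4.9],
}
weights = {"tissue_a": 0.9, "tissue_b": 0.1}

p_high = consensus_tail_probability(6.5, backgrounds, weights)
p_typical = consensus_tail_probability(2.0, backgrounds, weights)
print(p_high, p_typical)
```

A small tail probability suggests the sample value lies above nearly all of the weighted background, which is one way to read "overexpression" in a single sample.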

Combined Metabolome and Transcriptome Profiling Reveal Optimal Harvest Strategy Model Based on Different Production Purposes in Olive

Foods ◽

10.3390/foods10020360 ◽

2021 ◽

Vol 10 (2) ◽

pp. 360

Author(s):

Guodong Rao ◽

Jianguo Zhang ◽

Xiaoxia Liu ◽

Xue Li ◽

Chenhe Wang

Keyword(s):

Gene Expression ◽

Olive Oil ◽

Expression Patterns ◽

Transcriptome Profiling ◽

Minor Components ◽

Harvest Time ◽

Rna Seq ◽

Optimal Harvest ◽

Harvest Strategy ◽

Different Color

Olive oil is favored as a high-quality edible oil because it contains balanced fatty acids (FAs) and high levels of minor components. The contents of FAs and minor components vary among olive fruits of different color at harvest time, which makes it difficult to determine the optimal harvest strategy for olive oil production. Here, we combined metabolome, PacBio Iso-seq, and Illumina RNA-seq transcriptome data to investigate the association between metabolites and gene expression in olive fruits at harvest time. A total of 34 FAs, 12 minor components, and 181 other metabolites (including organic acids, polyols, amino acids, and sugars) were identified in this study. Moreover, we proposed optimal olive harvesting strategy models based on different production purposes. In addition, we used the combined PacBio Iso-seq and Illumina RNA-seq gene expression data to identify genes related to the biosynthetic pathways of hydroxytyrosol and oleuropein. These data lay the foundation for future investigations of olive fruit metabolism and gene expression patterns, and provide a method for deriving olive harvesting strategies for different production purposes.

Comparing RNA-Seq and microarray gene expression data in two zones of the Arabidopsis root apex relevant to spaceflight

Applications in Plant Sciences ◽

10.1002/aps3.1197 ◽

2018 ◽

Vol 6 (11) ◽

pp. e01197 ◽

Cited By ~ 3

Author(s):

Aparna Krishnamurthy ◽

Robert J. Ferl ◽

Anna-Lisa Paul

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Root Apex ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Rna Seq ◽

Microarray Gene Expression ◽

Arabidopsis Root ◽

Microarray Gene

Mapping global shifts in Saccharomyces cerevisiae gene expression across asynchronous time trajectories with diffusion maps

10.1101/2021.02.11.430862 ◽

2021 ◽

Author(s):

Taylor Reiter ◽

Rachel Montpetit ◽

Ron Runnebaum ◽

C. Titus Brown ◽

Ben Montpetit

Keyword(s):

Gene Expression ◽

Saccharomyces Cerevisiae ◽

Expression Patterns ◽

Pinot Noir ◽

Wine Fermentation ◽

Gene Expression Patterns ◽

Expression Data ◽

Site Specific ◽

Time Series Gene Expression ◽

Specific Factors

Grapes grown in a particular geographic region often produce wines with consistent characteristics, suggesting there are site-specific factors driving recurrent fermentation outcomes. However, the relationships between site-specific factors, microbial metabolism, and wine fermentation outcomes are not well understood. Here, we used differences in Saccharomyces cerevisiae gene expression as a biosensor for differences among Pinot noir fermentations from 15 vineyard sites. We profiled time series gene expression patterns of primary fermentations, but fermentations proceeded at different rates, making analyses of these data with conventional differential expression tools difficult. This led us to develop a novel approach that combines diffusion mapping with continuous differential expression analysis. Using this method, we identified vineyard-specific deviations in gene expression, including changes in gene expression correlated with the activity of the non-Saccharomyces yeast Hanseniaspora uvarum, as well as with initial nitrogen concentrations in grape musts. These results highlight novel relationships between site-specific variables and Saccharomyces cerevisiae gene expression that are linked to repeated wine fermentation outcomes. In addition, we demonstrate that our analysis approach can extract biologically relevant gene expression patterns in other contexts (e.g., hypoxic response of Saccharomyces cerevisiae), indicating that this approach offers a general method for investigating asynchronous time series gene expression data. IMPORTANCE While it is generally accepted that foods, in particular wine, possess sensory characteristics associated with or derived from their place of origin, we lack knowledge of the biotic and abiotic factors central to this phenomenon. We have used Saccharomyces cerevisiae gene expression as a biosensor to capture differences in fermentations of Pinot noir grapes from 15 vineyards across two vintages. We find that gene expression by non-Saccharomyces yeasts and initial nitrogen content in the grape must correlate with differences in gene expression among fermentations from these vintages. These findings highlight important relationships between site-specific variables and gene expression that can be used to understand, or possibly modify, wine fermentation outcomes. Our work also provides a novel analysis method for investigating asynchronous gene expression data sets that is able to reveal both global shifts and subtle differences in gene expression due to varied cell-environment interactions.

SOMDE: A scalable method for identifying spatially variable genes with self-organizing map

10.1101/2020.12.10.419549 ◽

2020 ◽

Author(s):

Minsheng Hao ◽

Kui Hua ◽

Xuegong Zhang

Keyword(s):

Gene Expression ◽

Large Scale ◽

Expression Patterns ◽

Self Organizing Map ◽

Expression Data ◽

Spatial Expression ◽

Variable Expression ◽

Sequencing Technologies ◽

Physical Context ◽

Variable Genes

Recent developments in spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue micro-environments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several computational methods have been developed for this task, but their high computational complexity limits their scalability to the latest and future large-scale spatial expression data. We present SOMDE, an efficient method for identifying SVgenes in large-scale spatial expression data. SOMDE uses a self-organizing map (SOM) to cluster neighboring cells into nodes, and then uses a Gaussian process to fit the node-level spatial gene expression to identify SVgenes. Experiments show that SOMDE is about 5-50 times faster than existing methods with comparable results. The adjustable resolution of SOMDE makes it the only method that can give results in ~5 minutes on large datasets of more than 20,000 sequencing sites. SOMDE is available as a Python package on PyPI at https://pypi.org/project/somde.
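
The first step described above, condensing many neighboring spots into a smaller set of SOM nodes, can be sketched in plain Python. This is an illustrative toy, not the somde package's API: the function names, the lattice initialisation, the decay schedules and the choice to average expression per node are all assumptions of the sketch, and the Gaussian process step is omitted.

```python
import math
import random


def best_node(nodes, x, y):
    """Return the grid index of the node closest to the point (x, y)."""
    return min(nodes, key=lambda k: (nodes[k][0] - x) ** 2 + (nodes[k][1] - y) ** 2)


def train_som(points, grid_w, grid_h, n_iter=2000, lr0=0.5, seed=0):
    """Fit a small 2-D self-organizing map to 2-D spot coordinates."""
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Initialise node positions on a regular lattice spanning the data.
    nodes = {}
    for i in range(grid_w):
        for j in range(grid_h):
            fx = (i + 0.5) / grid_w
            fy = (j + 0.5) / grid_h
            nodes[(i, j)] = [min(xs) + fx * (max(xs) - min(xs)),
                             min(ys) + fy * (max(ys) - min(ys))]
    sigma0 = max(grid_w, grid_h) / 2
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                    # learning rate decays to zero
        sigma = max(sigma0 * (1 - frac), 0.5)    # neighbourhood width shrinks
        x, y = rng.choice(points)
        bmu = best_node(nodes, x, y)             # best-matching unit
        for (i, j), w in nodes.items():
            d2 = (i - bmu[0]) ** 2 + (j - bmu[1]) ** 2
            h = math.exp(-d2 / (2 * sigma ** 2))
            w[0] += lr * h * (x - w[0])
            w[1] += lr * h * (y - w[1])
    return nodes


def aggregate_by_node(points, expression, nodes):
    """Average one gene's expression over the spots assigned to each node."""
    sums, counts = {}, {}
    for (x, y), value in zip(points, expression):
        k = best_node(nodes, x, y)
        sums[k] = sums.get(k, 0.0) + value
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


# Hypothetical data: 400 spots on a 20 x 20 grid, gene "on" in the left half.
points = [(float(x), float(y)) for x in range(20) for y in range(20)]
expression = [1.0 if x < 10 else 0.0 for x, _ in points]

nodes = train_som(points, grid_w=4, grid_h=4)
node_means = aggregate_by_node(points, expression, nodes)
print(len(points), "spots condensed to", len(node_means), "nodes")
```

Any downstream spatial model then sees at most 16 node-level values instead of 400 spot-level ones, which is where the speed-up in this kind of approach would come from; the grid size plays the role of the adjustable resolution.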

Building Gene Networks by Analyzing Gene Expression Profiles

Advanced Methodologies and Technologies in Medicine and Healthcare - Advances in Medical Diagnosis, Treatment, and Care ◽

10.4018/978-1-5225-7489-7.ch003 ◽

2019 ◽

pp. 27-44

Author(s):

Crescenzio Gallo

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Dna Microarrays ◽

Expression Profiles ◽

Expression Patterns ◽

Gene Expression Profiles ◽

Expression Data ◽

Gene Expressions ◽

Over Time

The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expressions.

IRIS-EDA: An integrated RNA-Seq interpretation system for gene expression data analysis

PLoS Computational Biology ◽

10.1371/journal.pcbi.1006792 ◽

2019 ◽

Vol 15 (2) ◽

pp. e1006792 ◽

Cited By ~ 11

Author(s):

Brandon Monier ◽

Adam McDermaid ◽

Cankun Wang ◽

Jing Zhao ◽

Allison Miller ◽

...

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Expression Data ◽

Rna Seq ◽

Gene Expression Data Analysis ◽

Interpretation System

Visual Comparison of Multiple Gene Expression Datasets in a Genomic Context

Journal of Integrative Bioinformatics ◽

10.1515/jib-2008-97 ◽

2008 ◽

Vol 5 (2) ◽

Author(s):

Krzysztof Borowski ◽

Jung Soh ◽

Christoph W. Sensen

Keyword(s):

Gene Expression ◽

Microarray Data ◽

Gene Function ◽

Graphical Representation ◽

Expression Profiles ◽

Expression Patterns ◽

Expression Data ◽

Genomic Context ◽

Multiple Gene ◽

Microarray Datasets

The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights into expression patterns by allowing them to analyze the expression values in relation to the gene locations, as well as to compare expression profiles of related genomes or of different experiments for the same genome.

Abstract P3-04-10: Comparison between RNA-Seq and Affymetrix gene expression data

10.1158/0008-5472.sabcs12-p3-04-10 ◽

2012 ◽

Cited By ~ 1

Author(s):

D Fumagalli ◽

B Haibe-Kains ◽

S Michiels ◽

DN Brown ◽

D Gacquer ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Data ◽

Rna Seq ◽

Affymetrix Gene Expression

Transcriptome Analysis Reveals the Molecular Mechanisms Underlying Adenosine Biosynthesis in Anamorph Strain of Caterpillar Fungus

BioMed Research International ◽

10.1155/2019/1864168 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12

Author(s):

Shan Lin ◽

Zhicheng Zou ◽

Cuibing Zhou ◽

Hancheng Zhang ◽

Zhiming Cai

Keyword(s):

Gene Expression ◽

Transcriptome Analysis ◽

Molecular Mechanisms ◽

Purine Metabolism ◽

Expression Data ◽

Rna Seq ◽

Metabolism Pathway ◽

Caterpillar Fungus ◽

Regulated Gene Expression ◽

Late Stages

Caterpillar fungus is a well-known fungal Chinese medicine. To reveal molecular changes during early and late stages of adenosine biosynthesis, transcriptome analysis was performed with the anamorph strain of caterpillar fungus. A total of 2,764 differentially expressed genes (DEGs) were identified (p ≤ 0.05, |log2 Ratio| ≥ 1), of which 1,737 were up-regulated and 1,027 were down-regulated. Gene expression profiling at 4–10 d revealed a distinct shift in expression of the purine metabolism pathway. Differential expression of 17 selected DEGs involved in purine metabolism (map00230) was validated by qPCR, and the expression trends were consistent with the RNA-Seq results. Subsequently, the predicted adenosine biosynthesis pathway, combined with qPCR and RNA-Seq gene expression data, indicated that the increased adenosine accumulation results from down-regulation of the ndk, ADK, and APRT genes combined with up-regulation of the AK gene. This study will be valuable for understanding the molecular mechanisms of adenosine biosynthesis in caterpillar fungus.
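
The DEG criteria quoted above (p ≤ 0.05 and |log2 Ratio| ≥ 1) amount to a simple two-condition filter. The sketch below applies them to a handful of made-up records; the function name, gene labels and values are hypothetical and are not taken from the study.

```python
def classify_degs(records, p_max=0.05, min_abs_log2=1.0):
    """Split genes into up- and down-regulated lists.

    records: iterable of (gene, log2_ratio, p_value) tuples.
    A gene counts as differentially expressed only if it passes both
    the p-value cutoff and the absolute log2 ratio cutoff.
    """
    up, down = [], []
    for gene, log2_ratio, p_value in records:
        if p_value <= p_max and abs(log2_ratio) >= min_abs_log2:
            if log2_ratio > 0:
                up.append(gene)
            else:
                down.append(gene)
    return up, down


# Hypothetical records: (gene, log2 ratio, p-value).
records = [
    ("geneA", 2.3, 0.001),   # passes both cutoffs, up-regulated
    ("geneB", -1.5, 0.02),   # passes both cutoffs, down-regulated
    ("geneC", 0.4, 0.001),   # significant but fold change too small
    ("geneD", 3.0, 0.20),    # large fold change but not significant
]

up, down = classify_degs(records)
print(up, down)
```

Both conditions must hold, so a gene with a large fold change but a weak p-value (or the reverse) is excluded from the DEG list.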
