A Bayesian Framework for Detecting Gene Expression Outliers in Individual Samples

2019 ◽  
Author(s):  
John Vivian ◽  
Jordan Eizenga ◽  
Holly C. Beale ◽  
Olena Morozova-Vaske ◽  
Benedict Paten

OBJECTIVE Many antineoplastics are designed to target upregulated genes, but quantifying upregulation in a single patient sample requires an appropriate set of samples for comparison. In cancer, the most natural comparison set is unaffected samples from the matching tissue, but there are often too few available unaffected samples to overcome high inter-sample variance. Moreover, some cancer samples have misidentified tissues of origin or even composite-tissue phenotypes. Even if an appropriate comparison set can be identified, most differential expression tools are not designed to accommodate comparisons to a single patient sample. MATERIALS AND METHODS We propose a Bayesian statistical framework for gene expression outlier detection in single samples. Our method uses all available data to produce a consensus background distribution for each gene of interest without requiring the researcher to manually select a comparison set. The consensus distribution can then be used to quantify over- and under-expression. RESULTS We demonstrate this method on both simulated and real gene expression data. We show that it can robustly quantify overexpression, even when the set of comparison samples lacks ideally matched tissue samples. Further, our results show that the method can identify appropriate comparison sets from samples of mixed lineage and rediscover numerous known gene-cancer expression patterns. CONCLUSIONS This exploratory method is suitable for identifying expression outliers from comparative RNA-seq analysis for individual samples, and Treehouse, a pediatric precision medicine group that leverages RNA-seq to identify potential therapeutic leads for patients, plans to explore this method for processing its pediatric cohort.

2020 ◽  
pp. 160-170
Author(s):  
John Vivian ◽  
Jordan M. Eizenga ◽  
Holly C. Beale ◽  
Olena M. Vaske ◽  
Benedict Paten

PURPOSE Many antineoplastics are designed to target upregulated genes, but quantifying upregulation in a single patient sample requires an appropriate set of samples for comparison. In cancer, the most natural comparison set is unaffected samples from the matching tissue, but there are often too few available unaffected samples to overcome high intersample variance. Moreover, some cancer samples have misidentified tissues of origin or even composite-tissue phenotypes. Even if an appropriate comparison set can be identified, most differential expression tools are not designed to accommodate comparisons to a single patient sample. METHODS We propose a Bayesian statistical framework for gene expression outlier detection in single samples. Our method uses all available data to produce a consensus background distribution for each gene of interest without requiring the researcher to manually select a comparison set. The consensus distribution can then be used to quantify over- and underexpression. RESULTS We demonstrate this method on both simulated and real gene expression data. We show that it can robustly quantify overexpression, even when the set of comparison samples lacks ideally matched tissue samples. Furthermore, our results show that the method can identify appropriate comparison sets from samples of mixed lineage and rediscover numerous known gene-cancer expression patterns. CONCLUSION This exploratory method is suitable for identifying expression outliers from comparative RNA sequencing (RNA-seq) analysis for individual samples, and Treehouse, a pediatric precision medicine group that leverages RNA-seq to identify potential therapeutic leads for patients, plans to explore this method for processing its pediatric cohort.
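The idea of scoring one sample against a consensus background can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: the group names, expression values, and weights below are hypothetical, the weights are assumed to be given rather than inferred by a Bayesian model, and each background group is approximated by a fitted normal distribution. The function returns the upper-tail probability of the observed value under the weighted mixture, so a small value suggests overexpression.

```python
from statistics import NormalDist, mean, stdev


def mixture_upper_tail(value, backgrounds, weights):
    """Return P(X >= value) under a weighted mixture of per-group normal fits."""
    total = sum(weights.values())
    tail = 0.0
    for name, samples in backgrounds.items():
        # Fit a normal distribution to this background group (an assumption).
        fit = NormalDist(mean(samples), stdev(samples))
        tail += (weights[name] / total) * (1.0 - fit.cdf(value))
    return tail


# Hypothetical log2 expression values for one gene in two background groups.
backgrounds = {
    "tissue_a": [2.1, 2.4, 1.9, 2.2, 2.0],
    "tissue_b": [5.0, 5.3, 4.8, 5.1, 4.9],
}
# Hypothetical similarity weights; assumed here, not learned from data.
weights = {"tissue_a": 0.8, "tissue_b": 0.2}

p_high = mixture_upper_tail(6.5, backgrounds, weights)     # far above both groups
p_typical = mixture_upper_tail(2.1, backgrounds, weights)  # typical of tissue_a
```

In this toy setting the value 6.5 lies well above both groups and receives a near-zero tail probability, whereas 2.1 is unremarkable. A full Bayesian treatment would also propagate uncertainty in the weights and in the group parameters.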


Foods ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 360
Author(s):  
Guodong Rao ◽  
Jianguo Zhang ◽  
Xiaoxia Liu ◽  
Xue Li ◽  
Chenhe Wang

Olive oil is favored as a high-quality edible oil because it contains balanced fatty acids (FAs) and high levels of minor components. The contents of FAs and minor components vary among olive fruits of different colors at harvest time, which makes it difficult to determine the optimal harvest strategy for olive oil production. Here, we combined metabolome, Pacbio Iso-seq, and Illumina RNA-seq transcriptome data to investigate the association between metabolites and gene expression in olive fruits at harvest time. A total of 34 FAs, 12 minor components, and 181 other metabolites (including organic acids, polyols, amino acids, and sugars) were identified in this study. Moreover, we proposed optimal olive harvesting strategy models based on different production purposes. In addition, we used the combined Pacbio Iso-seq and Illumina RNA-seq gene expression data to identify genes related to the biosynthetic pathways of hydroxytyrosol and oleuropein. These data lay the foundation for future investigations of olive fruit metabolism and gene expression patterns, and provide a method for deriving olive harvesting strategies for different production purposes.


2021 ◽  
Author(s):  
Taylor Reiter ◽  
Rachel Montpetit ◽  
Ron Runnebaum ◽  
C. Titus Brown ◽  
Ben Montpetit

ABSTRACT Grapes grown in a particular geographic region often produce wines with consistent characteristics, suggesting there are site-specific factors driving recurrent fermentation outcomes. However, the relationship between site-specific factors, microbial metabolism, and wine fermentation outcomes is not well understood. Here, we used differences in Saccharomyces cerevisiae gene expression as a biosensor for differences among Pinot noir fermentations from 15 vineyard sites. We profiled time series gene expression patterns of primary fermentations, but fermentations proceeded at different rates, making analyses of these data with conventional differential expression tools difficult. This led us to develop a novel approach that combines diffusion mapping with continuous differential expression analysis. Using this method, we identified vineyard-specific deviations in gene expression, including changes in gene expression correlated with the activity of the non-Saccharomyces yeast Hanseniaspora uvarum, as well as with initial nitrogen concentrations in grape musts. These results highlight novel relationships between site-specific variables and Saccharomyces cerevisiae gene expression that are linked to repeated wine fermentation outcomes. In addition, we demonstrate that our analysis approach can extract biologically relevant gene expression patterns in other contexts (e.g., hypoxic response of Saccharomyces cerevisiae), indicating that this approach offers a general method for investigating asynchronous time series gene expression data.
IMPORTANCE While it is generally accepted that foods, in particular wine, possess sensory characteristics associated with or derived from their place of origin, we lack knowledge of the biotic and abiotic factors central to this phenomenon. We have used Saccharomyces cerevisiae gene expression as a biosensor to capture differences in fermentations of Pinot noir grapes from 15 vineyards across two vintages. We find that gene expression by non-Saccharomyces yeasts and initial nitrogen content in the grape must correlate with differences in gene expression among fermentations from these vintages. These findings highlight important relationships between site-specific variables and gene expression that can be used to understand, or possibly modify, wine fermentation outcomes. Our work also provides a novel analysis method for investigating asynchronous gene expression data sets that is able to reveal both global shifts and subtle differences in gene expression due to varied cell-environment interactions.


2020 ◽  
Author(s):  
Minsheng Hao ◽  
Kui Hua ◽  
Xuegong Zhang

ABSTRACT Recent developments in spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue micro-environments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several computational methods have been developed for this task, but their high computational complexity limits their scalability to the latest and future large-scale spatial expression data. We present SOMDE, an efficient method for identifying SVgenes in large-scale spatial expression data. SOMDE uses a self-organizing map (SOM) to cluster neighboring cells into nodes, and then uses a Gaussian process to fit the node-level spatial gene expression to identify SVgenes. Experiments show that SOMDE is about 5-50 times faster than existing methods with comparable results. The adjustable resolution of SOMDE makes it the only method that can give results in ~5 minutes on large datasets of more than 20,000 sequencing sites. SOMDE is available as a Python package on PyPI at https://pypi.org/project/somde.
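The step of condensing neighboring sites into nodes before model fitting can be illustrated with a rough sketch. Note the substitution: the code below uses fixed square grid binning as a simple stand-in for a self-organizing map, and it does not use or reproduce the SOMDE package. The function name, coordinates, and `node_size` parameter are all hypothetical. The `node_size` argument plays the role of an adjustable resolution: larger nodes mean fewer points for any downstream spatial model to fit.

```python
from collections import defaultdict


def aggregate_to_nodes(sites, node_size):
    """Group (x, y, expression) sites into square grid nodes and average them.

    Grid binning is a stand-in for SOM clustering; it is not the SOMDE algorithm.
    """
    buckets = defaultdict(list)
    for x, y, expr in sites:
        # Sites falling in the same grid cell are assigned to the same node.
        key = (int(x // node_size), int(y // node_size))
        buckets[key].append(expr)
    # Node-level expression is the mean over the sites assigned to the node.
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


# Hypothetical sites: (x, y, expression of one gene).
sites = [(0.5, 0.5, 1.0), (1.5, 0.2, 3.0), (2.5, 2.5, 10.0), (3.9, 3.1, 12.0)]
nodes = aggregate_to_nodes(sites, node_size=2.0)
```

Here four sites collapse into two nodes. A real SOM would instead learn node positions that adapt to the density and layout of the sites.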


Author(s):  
Crescenzio Gallo

The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expression.


2019 ◽  
Vol 15 (2) ◽  
pp. e1006792 ◽  
Author(s):  
Brandon Monier ◽  
Adam McDermaid ◽  
Cankun Wang ◽  
Jing Zhao ◽  
Allison Miller ◽  
...  

2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Krzysztof Borowski ◽  
Jung Soh ◽  
Christoph W. Sensen

SUMMARY The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial for finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights into expression patterns by allowing them to analyze the expression values in relation to the gene locations, as well as to compare expression profiles of related genomes or of different experiments for the same genome.


Author(s):  
D Fumagalli ◽  
B Haibe-Kains ◽  
S Michiels ◽  
DN Brown ◽  
D Gacquer ◽  
...  

2019 ◽  
Vol 2019 ◽  
pp. 1-12
Author(s):  
Shan Lin ◽  
Zhicheng Zou ◽  
Cuibing Zhou ◽  
Hancheng Zhang ◽  
Zhiming Cai

Caterpillar fungus is a well-known fungal Chinese medicine. To reveal molecular changes during the early and late stages of adenosine biosynthesis, transcriptome analysis was performed with the anamorph strain of caterpillar fungus. A total of 2,764 differentially expressed genes (DEGs) were identified (p ≤ 0.05, |log2 Ratio| ≥ 1), of which 1,737 were up-regulated and 1,027 were down-regulated. Gene expression profiling on 4–10 d revealed a distinct shift in expression of the purine metabolism pathway. Differential expression of 17 selected DEGs involved in purine metabolism (map00230) was validated by qPCR, and the expression trends were consistent with the RNA-Seq results. Subsequently, the predicted adenosine biosynthesis pathway, combined with qPCR and RNA-Seq gene expression data, indicated that the increased adenosine accumulation is a result of down-regulation of the ndk, ADK, and APRT genes combined with up-regulation of the AK gene. This study will be valuable for understanding the molecular mechanisms of adenosine biosynthesis in caterpillar fungus.
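The DEG criteria quoted above (p ≤ 0.05 and |log2 Ratio| ≥ 1) amount to a simple filter, sketched here in Python. The gene names and values are hypothetical placeholders rather than data from the study, and the function name is an assumption for illustration.

```python
def classify_degs(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into up- and down-regulated DEGs using p and log2 ratio cutoffs."""
    up, down = [], []
    for gene, (p_value, log2_ratio) in results.items():
        # A gene must pass both the significance and the effect-size threshold.
        if p_value <= p_cutoff and abs(log2_ratio) >= lfc_cutoff:
            (up if log2_ratio > 0 else down).append(gene)
    return up, down


# Hypothetical results: gene -> (p-value, log2 ratio).
results = {
    "gene_a": (0.001, 2.3),   # significant, up-regulated
    "gene_b": (0.04, -1.0),   # significant, down-regulated (on the boundary)
    "gene_c": (0.20, 3.0),    # large change but not significant
    "gene_d": (0.01, 0.5),    # significant but change too small
}
up, down = classify_degs(results)
```

Only `gene_a` and `gene_b` pass both thresholds; the other two each fail one criterion.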

