GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature

2015, Vol 2015, pp. 1-8
Author(s): Ning Ye, Hengfu Yin, Jingjing Liu, Xiaogang Dai, Tongming Yin

The huge amount of gene expression data generated by microarray and next-generation sequencing technologies presents challenges for extracting its biological meaning. When searching for coexpressed genes, the data mining process is largely affected by the choice of algorithm. It is therefore highly desirable to offer multiple algorithm options in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit that is written in MATLAB and supports a variety of gene expression data file formats. The toolkit provides four models (the mean, regression, delegate, and ensemble models) to identify coexpressed genes, and it enables users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes. The utility of the toolkit is then demonstrated by analyzing two real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that lets users choose among multiple algorithms for analyzing gene expression signatures.
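The abstract names GESearch's four models but does not define them, and GESearch itself is written in MATLAB. The Python sketch below is only a rough illustration of what a "mean model" coexpression search might look like. It assumes that the model averages the profiles of user-chosen seed genes (for example, imported knowledge-based genes) and then ranks every gene by its Pearson correlation with that mean profile. The function names and the 0.9 threshold are hypothetical and are not GESearch's interface.

```python
from math import sqrt
from statistics import fmean

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0

def mean_model_search(expr, seeds, threshold=0.9):
    """Hypothetical 'mean model': correlate every gene with the mean seed profile.

    expr: dict mapping gene name -> list of expression values (one per sample)
    seeds: names of the seed genes whose average profile defines the pattern
    Returns (gene, r) pairs with r >= threshold, sorted by decreasing r.
    """
    n = len(next(iter(expr.values())))
    mean_profile = [fmean(expr[g][i] for g in seeds) for i in range(n)]
    scored = [(g, pearson(p, mean_profile)) for g, p in expr.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])
```

Swapping in a different scoring rule, such as a regression fit, would change only the scoring step. The seed selection and the thresholding would stay the same.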

2017, Vol 20 (2)
Author(s): Jorge Parraga-Alava, Mario Inostroza-Ponta

Clustering algorithms are a common method for data analysis in many scientific fields. They have become popular among biologists because they make it easy to discover similar cellular functions in gene expression data. Most approaches treat gene clustering as an optimization problem in which an ad hoc cluster quality index, defined in terms of either gene expression data or biological information, is optimized. However, these approaches may not be sufficient, since they cannot guarantee clusters that have both similar expression patterns and biological coherence. In this paper, we propose a bi-objective clustering algorithm to discover clusters of genes with high levels of co-expression and biological coherence. Our approach uses a multi-objective evolutionary algorithm (MOEA) that optimizes two indices: one based on gene expression levels and one based on biological functional classes. The algorithm is tested on three real-life gene expression datasets. Results show that the proposed model yields gene clusters with higher levels of co-expression and biological coherence than traditional approaches.
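The abstract does not specify the two indices or the evolutionary operators. However, the core of any bi-objective method is that candidate clusterings are compared by Pareto dominance rather than by a single score. The sketch below assumes that each candidate has already been scored on two objectives, both to be maximized, here given the hypothetical names `coexpression` and `coherence`. It then extracts the non-dominated front that an MOEA such as NSGA-II would retain. It is not the authors' algorithm.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    name: str
    coexpression: float  # hypothetical expression-based index (higher is better)
    coherence: float     # hypothetical biology-based index (higher is better)

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.coexpression >= b.coexpression and a.coherence >= b.coherence
    better = a.coexpression > b.coexpression or a.coherence > b.coherence
    return no_worse and better

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates, in input order."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]
```

Because the front keeps every trade-off solution, the choice between "more co-expressed" and "more biologically coherent" clusterings is left to the analyst rather than fixed by a weighting.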


2020
Author(s): Minsheng Hao, Kui Hua, Xuegong Zhang

Recent developments in spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue microenvironments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several computational methods have been developed for this task, but their high computational complexity limits their scalability to the latest and future large-scale spatial expression data.
We present SOMDE, an efficient method for identifying SVgenes in large-scale spatial expression data. SOMDE uses a self-organizing map (SOM) to cluster neighboring cells into nodes, and then uses a Gaussian process to fit the node-level spatial gene expression and identify SVgenes. Experiments show that SOMDE is about 5-50 times faster than existing methods, with comparable results. The adjustable resolution of SOMDE makes it the only method that can return results in about 5 minutes on large datasets of more than 20,000 sequencing sites. SOMDE is available as a Python package on PyPI at https://pypi.org/project/somde.
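The sketch below illustrates only the first SOMDE stage, in which neighboring spots are condensed into SOM nodes and per-node expression is formed. It is written from scratch and does not reproduce the `somde` package's API. The grid size, the learning-rate and neighborhood schedules, and the choice to sum rather than average counts per node are all assumptions. The Gaussian process fitting step is omitted.

```python
import math
import random

def best_node(nodes, p):
    """Index of the SOM node closest (squared Euclidean distance) to point p."""
    return min(range(len(nodes)),
               key=lambda k: (nodes[k][0] - p[0]) ** 2 + (nodes[k][1] - p[1]) ** 2)

def train_som(points, rows=3, cols=3, epochs=50, seed=0):
    """Fit a rows x cols SOM whose node weights live in the 2-D spot-coordinate space."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(points)) for _ in range(rows * cols)]  # start at random spots
    grid = [(r, c) for r in range(rows) for c in range(cols)]       # node positions on the map
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1 - frac)              # linearly decaying learning rate (assumed schedule)
        sigma = sigma0 * (1 - frac) + 0.5  # shrinking neighborhood radius (assumed schedule)
        for p in rng.sample(points, len(points)):  # visit spots in random order
            br, bc = grid[best_node(nodes, p)]
            for k, (r, c) in enumerate(grid):
                # Pull nodes toward p, weighted by map distance to the best-matching node.
                h = lr * math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                nodes[k][0] += h * (p[0] - nodes[k][0])
                nodes[k][1] += h * (p[1] - nodes[k][1])
    return nodes

def aggregate(points, counts, nodes):
    """Sum one gene's per-spot counts into node-level totals."""
    totals = [0.0] * len(nodes)
    for p, x in zip(points, counts):
        totals[best_node(nodes, p)] += x
    return totals
```

The speed-up idea is visible here: any downstream model fits `rows * cols` node values instead of one value per spot. The number of nodes therefore acts as the adjustable resolution described in the abstract.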


Author(s): Crescenzio Gallo

The possible applications of modeling and simulation in bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expression.
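The abstract does not say which methods the chapter uses for topic (1). One common way to select differentially expressed genes is to rank them by a two-sample test statistic. The sketch below assumes Welch's t-statistic and, for brevity, applies no multiple-testing correction. Gene names and values in the usage check are illustrative only.

```python
import math
from statistics import fmean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    diff = fmean(a) - fmean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:  # constant values in both groups: report 0 or a signed infinity
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def rank_genes(expr_a, expr_b, top=None):
    """Rank genes by |t|; expr_a / expr_b map gene -> values under each condition."""
    scores = {g: welch_t(expr_a[g], expr_b[g]) for g in expr_a}
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top]  # top=None keeps every gene
```

In practice, the top-ranked genes would then feed the grouping step (2). Moderated statistics or false-discovery-rate control are the usual refinements.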


2009, Vol 07 (04), pp. 645-661
Author(s): Xin Chen

There is increasing interest in clustering time-course gene expression data to investigate a wide range of biological processes. However, developing a clustering algorithm well suited to time-course gene expression data is still challenging. Because timing is an important factor in defining true clusters, a clustering algorithm should exploit expression correlations between time points in order to achieve high clustering accuracy. Moreover, inter-cluster gene relationships are often desired to facilitate the computational inference of biological pathways and regulatory networks. In this paper, a new clustering algorithm called CurveSOM is developed to offer both of these features. It first represents each gene by a cubic smoothing spline fitted to its time-course expression profile, and then groups genes into clusters by applying self-organizing map-based clustering to the resulting splines. CurveSOM has been tested on three well-studied yeast cell cycle datasets and compared with four popular programs: Cluster 3.0, GENECLUSTER, MCLUST, and SSClust. The results show that CurveSOM is a very promising tool for the exploratory analysis of time-course expression data, as it not only groups genes into clusters with high accuracy but also finds true time-shifted correlations of expression patterns across clusters.
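The sketch below illustrates CurveSOM's first step: replacing each time-course profile with a continuous curve that is resampled on a common dense grid, which a SOM could then cluster. To stay dependency-free, it substitutes an interpolating natural cubic spline (the zero-smoothing limit) for the cubic smoothing spline described in the abstract. A real smoothing fit would add a roughness penalty, for example via SciPy's `make_smoothing_spline`. The helper names and the 20-point grid are assumptions.

```python
from bisect import bisect_right

def natural_cubic_spline(x, y):
    """Return a function evaluating the natural cubic spline through points (x, y)."""
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural boundary: m[0] = m[n] = 0
    if n >= 2:
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1]) for i in range(1, n)]
        for k in range(1, n - 1):  # Thomas algorithm: forward elimination
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)      # back substitution
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def s(t):
        i = min(max(bisect_right(x, t) - 1, 0), n - 1)  # interval containing t
        hi, a, b = h[i], x[i + 1] - t, t - x[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6 * hi)
                + (y[i] / hi - m[i] * hi / 6) * a
                + (y[i + 1] / hi - m[i + 1] * hi / 6) * b)
    return s

def resample(times, profile, points=20):
    """Evaluate a gene's spline on an evenly spaced grid spanning the time course."""
    s = natural_cubic_spline(times, profile)
    lo, hi = times[0], times[-1]
    return [s(lo + (hi - lo) * k / (points - 1)) for k in range(points)]
```

Because every gene is resampled on the same grid, profiles measured at irregular time points become fixed-length vectors. Shifting one curve along the grid relative to another is also a simple way to probe time-shifted correlations.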


2015, Vol 25 (6), pp. 1000-1009
Author(s): Reem Abdallah, Hye Sook Chon, Nadim Bou Zgheib, Douglas C. Marchion, Robert M. Wenham, ...

Objectives: Cytoreductive surgery is the cornerstone of ovarian cancer (OVCA) treatment. Detractors of initial maximal surgical effort argue that aggressive tumor biology, not surgical effort, will dictate survival. We investigated the role of biology in achieving optimal cytoreduction in serous OVCA using microarray gene expression analysis.
Methods: For the initial model, we used a gene expression signature from a microarray expression analysis of 124 women with serous OVCA, defining optimal cytoreduction as removal of all disease greater than 1 cm (64 women had optimal and 60 suboptimal cytoreduction). We then applied this model to 2 independent data sets: the Australian Ovarian Cancer Study (AOCS; 190 samples) and The Cancer Genome Atlas (TCGA; 468 samples). In a second analysis, we defined optimal cytoreduction as removal of all disease to microscopic residual, using AOCS data to create the gene signature and validating the results in the TCGA data set.
Results: Of the 12,718 genes included in the initial analysis, 58 predicted the outcome of cytoreductive surgery with 69% accuracy (P = 0.005). The performance of this classifier, measured by the area under the receiver operating characteristic curve, was 73%. When applied to TCGA and AOCS, accuracy was 56% (P = 0.16) and 62% (P = 0.01), respectively, with performance of 57% and 65%, respectively. In the second analysis, 220 genes predicted the outcome of cytoreductive surgery in the AOCS set with 74% accuracy and a performance of 73%. When these results were validated in the TCGA set, accuracy was 57% (P = 0.31) and performance was 62%.
Conclusion: Gene expression data, used as a proxy for tumor biology, predict the ability to perform optimal cytoreductive surgery neither accurately nor consistently. Other factors, including surgical effort, may also explain part of the outcome. Additional studies integrating more biological and clinical data may improve the prediction model.
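The abstract reports two different numbers for each classifier. Accuracy is the fraction of correct calls at a fixed cut-off, while "performance" is the area under the ROC curve (AUC). The sketch below computes both from a classifier's scores. It uses the Mann-Whitney formulation of AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting as one half. The scores, the 1 = optimal labels, and the 0.5 threshold are invented for illustration and are not the study's data.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of cases whose thresholded score agrees with the 0/1 label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The two measures can diverge. For scores `[0.9, 0.7, 0.4, 0.6, 0.2, 0.3]` with labels `[1, 1, 1, 0, 0, 0]`, accuracy at 0.5 is 4/6, while the AUC is 8/9. This is because AUC ignores the threshold and rewards only the ranking.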


2015, Vol 140 (6), pp. 536-542
Author(s): Hawazin Faruki, Gregory M. Mayhew, Cheng Fan, Matthew D. Wilkerson, Scott Parker, ...

Context: A histologic classification of lung cancer subtypes is essential in guiding therapeutic management.
Objective: To complement morphology-based classification of lung tumors, a previously developed lung subtyping panel (LSP) of 57 genes was tested using multiple public fresh-frozen gene-expression data sets and a prospectively collected set of formalin-fixed, paraffin-embedded lung tumor samples.
Design: The LSP gene-expression signature was evaluated in multiple lung cancer gene-expression data sets totaling 2177 patients collected from 4 platforms: Illumina RNAseq (San Diego, California), Agilent (Santa Clara, California) and Affymetrix (Santa Clara) microarrays, and quantitative reverse transcription–polymerase chain reaction. Gene centroids were calculated for each of 3 genomically defined subtypes: adenocarcinoma, squamous cell carcinoma, and neuroendocrine, the last of which encompassed both small cell carcinoma and carcinoid. Classification by LSP into the 3 subtypes was evaluated in both fresh-frozen and formalin-fixed, paraffin-embedded tumor samples, and agreement with the original morphology-based diagnosis was determined.
Results: The LSP-based classifications demonstrated overall agreement with the original clinical diagnosis ranging from 78% (251 of 322) to 91% (492 of 538 and 869 of 951) in the fresh-frozen public data sets and 84% (65 of 77) in the formalin-fixed, paraffin-embedded data set. The LSP performance was independent of tissue-preservation method and gene-expression platform. Secondary, blinded pathology review of formalin-fixed, paraffin-embedded samples demonstrated concordance of 82% (63 of 77) with the original morphology diagnosis.
Conclusions: The LSP gene-expression signature is a reproducible and objective method for classifying lung tumors and demonstrates good concordance with morphology-based classification across multiple data sets. The LSP panel can supplement morphologic assessment of lung cancers, particularly when classification by standard methods is challenging.
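The abstract says that gene centroids were computed for each of the 3 subtypes, but it does not say how a new sample is assigned to a subtype. A common rule for centroid-based expression classifiers is to assign each sample to the centroid it is most highly correlated with. The sketch below assumes that rule (Pearson correlation) and uses invented three-gene profiles rather than the 57-gene panel. It should not be read as the LSP's actual implementation. The `agreement` helper computes the percent-agreement figures quoted above; for example, 251 of 322 is about 78%.

```python
from math import sqrt
from statistics import fmean

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0

def centroids(samples, labels):
    """Per-subtype mean expression vector (one value per panel gene)."""
    out = {}
    for cls in set(labels):
        rows = [s for s, lab in zip(samples, labels) if lab == cls]
        out[cls] = [fmean(col) for col in zip(*rows)]
    return out

def classify(sample, cents):
    """Assign the subtype whose centroid correlates best with the sample (assumed rule)."""
    return max(cents, key=lambda cls: pearson(sample, cents[cls]))

def agreement(predicted, reference):
    """Fraction of samples where the predicted subtype matches the reference diagnosis."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)
```

A correlation-based rule compares the shape of a sample's expression profile rather than its absolute level. This is one plausible reason a centroid classifier could behave similarly across platforms and preservation methods, although the abstract does not state the mechanism.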


2017
Author(s): Ionas Erb, Thomas Quinn, David Lovell, Cedric Notredame

Gene expression data, such as those generated by next-generation sequencing technologies (RNA-seq), are inherently relative: the total number of sequenced reads has no biological meaning. This issue is most often addressed with various normalization techniques, all of which face the same problem: once information about the total mRNA content of the originating cells is lost, it cannot be recovered by technical means alone. Additional knowledge, in the form of an unchanged reference, is necessary; however, this reference can usually only be estimated. Here we propose a novel method in which sample normalization is unnecessary, yet important insights can still be obtained. Instead of trying to recover absolute abundances, our method is based entirely on ratios, so normalization factors cancel out. Although the differential expression of individual genes cannot be recovered this way, the ratios themselves can be differentially expressed (even when their constituents are not). Most current analyses are blind to these cases, whereas our approach reveals them directly. Specifically, we show how the differential expression of gene ratios can be formalized by decomposing log-ratio variance (LRV) and deriving intuitive statistics from it. Although small LRVs have been used before to detect proportional genes in gene expression data, we focus here on the change in proportionality factors between groups of samples (e.g., tissue-specific proportionality). For this, we propose a statistic that is equivalent to the squared t-statistic of one-way ANOVA, but for gene ratios. In doing so, we show how precision weights can be incorporated to account for the peculiarities of count data, and how a moderated statistic can be derived in the same way as the one that follows from a hierarchical model for individual genes. We also discuss approaches to dealing with zero counts, deriving a form of our statistic that can incorporate them.
In providing a detailed analysis of the connections between the differential expression of genes and the differential proportionality of gene pairs, we facilitate a clear interpretation of these new concepts. The proposed framework is applied to a GTEx data set consisting of 98 samples from the cerebellum and cortex, with selected examples shown. A computationally efficient implementation of the approach in R has been released as an addendum to the propr package.
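The abstract makes two technical claims. First, log-ratios are unaffected by per-sample normalization factors. Second, their variance can be split into between-group and within-group parts. The sketch below illustrates both claims under stated assumptions. It takes θ to be the within-group sum of squares of a gene pair's log-ratio divided by its total sum of squares. It then applies the standard one-way ANOVA identity F = (n - 2)(1 - θ)/θ, which for two groups equals the squared pooled-variance t-statistic. The paper's exact statistic, its precision weights, its moderation, and its zero handling are not reproduced, and the function names are hypothetical, not the propr API.

```python
from math import log
from statistics import fmean

def log_ratio(x_i, x_j):
    """Per-sample log-ratio of two genes' counts (zero counts must be handled first)."""
    return [log(a / b) for a, b in zip(x_i, x_j)]

def ss(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def diff_prop(x_i, x_j, groups):
    """Return (theta, F) for the log-ratio of genes i and j across exactly two groups.

    theta = within-group SS / total SS of the log-ratio;
    F = (n - 2) * (1 - theta) / theta, the two-group ANOVA F (= squared t).
    """
    lr = log_ratio(x_i, x_j)
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    ssw = sum(ss([v for v, g in zip(lr, groups) if g == lab]) for lab in labels)
    theta = ssw / ss(lr)
    return theta, (len(lr) - 2) * (1 - theta) / theta
```

Multiplying both genes in sample k by the same factor c_k leaves log(x_ik / x_jk) unchanged, so θ and F are identical with or without normalization. A small θ means that the pair's proportionality factor differs between the groups, even if neither gene does on its own.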

