Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data

Mapping Intimacies

10.1101/221309

2017

Cited By ~ 1

Author(s):

Basel Abu-Jamous

Steven Kelly

Keyword(s):

Gene Expression

Gene Expression Data

Large Scale

Enrichment Analysis

Gene Clusters

Error Rates

Model Organisms

Expression Data

Clustering Methods

Automated Method

Identification of co-expressed gene clusters can provide evidence for genetic or physical interactions between genes. Thus, co-expression clustering is a routine step in large-scale analyses of gene expression data. We show that commonly used clustering methods produce results that substantially disagree with each other and do not match the biological expectations of co-expressed gene clusters. Furthermore, these clusters can contain up to 50% unreliably assigned genes. Consequently, downstream analyses of these clusters (e.g. functional term enrichment analysis) suffer from high error rates. We present clust, an automated method that solves these problems by extracting clusters that match the biological expectations of co-expressed genes. Using 100 datasets from five model organisms, we demonstrate that clusters generated by clust are better than those produced by other methods, both numerically and for use in functional analysis. Finally, we show that clust can simultaneously cluster multiple datasets, enabling users to leverage the large quantity of public expression data for novel comparative analysis.

Validation of Hierarchical Gene Clusters Using Repeated Measurements

Jurnal Teknologi

10.11113/jt.v61.1616

2013

Vol 61 (1)

Author(s):

Lim Fong Tee

Mohd Saberi Mohamad

Safaai Deris

Ahmad ‘Athif Mohd Faudzi

Muhammad Shafie Abd Latiff

...

Keyword(s):

Gene Expression

Gene Expression Data

Gene Clusters

Microarray Gene Expression Data

Repeated Measurement

Repeated Measurements

Expression Data

Clustering Methods

Microarray Gene Expression

The Stability

Hierarchical clustering is an unsupervised technique and a common approach to studying protein and gene expression data. In clustering, genes are grouped into distinct clusters according to their expression patterns, and genes in the same cluster are assumed to be potentially functionally related or influenced by a common upstream factor. Although clustering has rapidly become one of the standard computational approaches in microarray gene expression data analysis, the uncertainty of its results remains a concern. Experiments are generally repeated to counter biological and technical variability. In this study, the authors propose using repeated measurements to evaluate the stability of gene clusters. This paper aims to show that the stability of gene clusters, assessed with repeated measurements, can be used for further analysis.
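
The abstract does not specify how stability is measured. One simple way to compare hierarchical clusterings of two replicate measurements is to cluster each replicate separately and score their agreement with the Rand index; the sketch below illustrates that idea only. The average linkage, the 1 - Pearson correlation distance, the fixed cluster count k, the function names and the toy profiles are all assumptions, not the paper's procedure.

```python
# Minimal sketch (assumptions, not the paper's method): average-linkage
# clustering with 1 - Pearson r distance, a fixed k, and the Rand index.
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, k):
    """Merge the closest clusters (average linkage) until k remain; return labels."""
    n = len(profiles)
    dist = {(i, j): 1 - pearson(profiles[i], profiles[j])
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean distance over all gene pairs spanning the two clusters.
            d = sum(dist[min(i, j), max(i, j)]
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a still points at the same cluster
        clusters[a] = merged
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels1, labels2):
    """Fraction of gene pairs that both clusterings treat the same way."""
    pairs = list(combinations(range(len(labels1)), 2))
    agree = sum((labels1[i] == labels1[j]) == (labels2[i] == labels2[j])
                for i, j in pairs)
    return agree / len(pairs)


# Toy data: two replicate measurements of six genes at four time points.
rep1 = [[1, 2, 3, 4], [2, 3, 4, 5.5], [1, 2.5, 3, 4.2],
        [4, 3, 2, 1], [5, 4, 3, 1.5], [4.2, 3.1, 2, 0.9]]
rep2 = [[1.1, 2.0, 2.9, 4.1], [2.1, 3.2, 3.9, 5.4], [0.9, 2.4, 3.1, 4.0],
        [3.9, 3.1, 1.9, 1.2], [5.1, 3.8, 3.2, 1.4], [4.0, 3.0, 2.1, 1.0]]
print(rand_index(average_linkage(rep1, 2), average_linkage(rep2, 2)))  # 1.0
```

With these well-separated toy profiles, the two replicates give identical partitions, so the Rand index is 1.0. Genes whose assignment flips between replicates would lower the score, which is one way to flag unstable cluster members.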

Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data

Mathematics

10.3390/math9070772

2021

Vol 9 (7)

pp. 772

Author(s):

Seonghun Kim

Seockhun Bae

Yinhua Piao

Kyuri Jo

Keyword(s):

Gene Expression

Gene Expression Data

Large Scale

Drug Response

Response Prediction

Biological Data

Expression Data

Convolutional Network

Essential Information

Protein-Protein Interaction

Genomic profiles of cancer patients, such as gene expression, have become a major source for predicting drug responses in the era of personalized medicine. As large-scale drug screening data for cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We propose an analysis framework called DrugGCN for predicting drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network with gene expression data, using feature selection to retain drug-related genes. The GCN model then detects local features, such as subnetworks of genes that contribute to the drug response, by localized filtering. We demonstrate the effectiveness of DrugGCN on biological data, on which it achieves high prediction accuracy compared with competing methods.
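
The sketch below gives a rough idea of what a single graph-convolution layer does on a gene graph. It applies the widely used first-order propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). Here A is a gene adjacency matrix (for example, PPI edges), H holds per-gene features and W is a weight matrix. This rule is an assumption chosen for illustration. The abstract says only that DrugGCN uses localized filtering, and its actual filter and architecture may differ. The function names and toy inputs are hypothetical.

```python
# Minimal sketch (assumption): one graph-convolution layer using the
# Kipf & Welling rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
from math import sqrt


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """adj: n x n symmetric 0/1 gene adjacency (e.g. PPI edges);
    features: n x f per-gene inputs (e.g. expression); weights: f x h."""
    n = len(adj)
    # Self-loops let each gene keep its own signal when aggregating.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation so high-degree genes do not dominate.
    norm = [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU activation


# Toy gene graph: a path 0 - 1 - 2, one input feature, one output channel.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
weights = [[1.0]]
print(gcn_layer(adj, features, weights))
```

On this three-gene path graph, gene 0 gets 0.5, gene 1 gets 1/sqrt(6) (about 0.41) and gene 2 gets 0. One layer mixes each gene only with its direct neighbours, so gene 2 receives nothing from gene 0. Stacking layers widens this neighbourhood, which is why such filters are described as local.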

Gene Discovery Methods from Large-Scale Gene Expression Data

Quantum Bio-Informatics III

10.1142/9789814304061_0040

2010

Author(s):

Akifumi Shimizu

Kentaro Yano

Keyword(s):

Gene Expression

Gene Expression Data

Large Scale

Gene Discovery

Expression Data

Pathways Enrichment Analysis of Gene Expression Data in Type 2 Diabetes

Methods in Molecular Biology - Type 2 Diabetes

10.1007/978-1-4939-9882-1_7

2019

pp. 119-128

Author(s):

Maysson Ibrahim

Keyword(s):

Gene Expression

Type 2 Diabetes

Gene Expression Data

Enrichment Analysis

Expression Data

LSTrAP-Crowd: Prediction of novel components of bacterial ribosomes with crowd-sourced analysis of RNA sequencing data

10.1101/2020.04.20.005249

2020

Author(s):

Benedict Hew

Qiao Wen Tan

William Goh

Jonathan Wei Xiong Ng

Kenny Koh

...

Keyword(s):

Gene Expression

Protein Synthesis

RNA Sequencing

Gene Expression Data

Large Scale

Bacterial Resistance

Expression Data

Sequencing Data

Novel Proteins

Novel Antibiotics

Bacterial resistance to antibiotics is a growing problem that is projected to cause more deaths than cancer in 2050. Consequently, novel antibiotics are urgently needed. Since more than half of the available antibiotics target the bacterial ribosomes, proteins involved in protein synthesis are prime targets for the development of novel antibiotics. However, experimental identification of these potential antibiotic targets can be labor-intensive and challenging, as these proteins are likely to be poorly characterized and specific to a few bacteria. To identify these novel proteins, we established a Large-Scale Transcriptomic Analysis Pipeline in Crowd (LSTrAP-Crowd), in which 285 individuals processed 26 terabytes of RNA-sequencing data from the 17 most notorious bacterial pathogens. In total, the crowd processed 26,269 RNA-seq experiments and used the data to construct gene co-expression networks. These networks were used to identify more than a hundred uncharacterized genes transcriptionally associated with protein synthesis. We provide the identity of these genes together with the processed gene expression data. The data can be used to identify other vulnerabilities of bacteria, and our approach demonstrates how the processing of gene expression data can be easily crowdsourced.
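
The abstract does not state which co-expression measure LSTrAP-Crowd used. The sketch below illustrates the general idea under two assumptions: Pearson correlation as the measure and a hypothetical cut-off of 0.8. It links genes whose expression profiles are strongly correlated, then flags uncharacterized genes linked to known protein-synthesis genes (guilt by association). All gene names, expression values and function names are made up for illustration.

```python
# Minimal sketch (assumptions): Pearson correlation as the co-expression
# measure, a hypothetical 0.8 cut-off, and made-up genes and values.
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expr, threshold=0.8):
    """expr maps gene -> values over the same samples; keep pairs with r >= threshold."""
    return {frozenset((g, h)) for g, h in combinations(sorted(expr), 2)
            if pearson(expr[g], expr[h]) >= threshold}


def candidates(edges, seeds):
    """Genes outside `seeds` that share at least one edge with a seed gene."""
    found = set()
    for edge in edges:
        if edge & seeds:
            found |= edge - seeds
    return found


expr = {
    "ribo1": [1.0, 3.0, 5.0, 7.0, 9.0],    # hypothetical known ribosomal gene
    "ribo2": [2.0, 3.9, 6.1, 8.0, 10.2],   # hypothetical known ribosomal gene
    "orfX": [0.5, 1.6, 2.4, 3.5, 4.6],     # hypothetical uncharacterized gene
    "orfY": [5.0, 1.0, 4.0, 2.0, 3.0],     # hypothetical uncharacterized gene
}
edges = coexpression_edges(expr)
print(candidates(edges, {"ribo1", "ribo2"}))  # {'orfX'}
```

Here orfX is strongly correlated with both ribosomal genes and is flagged, while orfY is not. In real data, the choice of co-expression measure and cut-off largely determines which genes are flagged.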

Defining transcription modules using large-scale gene expression data

Bioinformatics

10.1093/bioinformatics/bth166

2004

Vol 20 (13)

pp. 1993-2003

Cited By ~ 216

Author(s):

J. Ihmels

S. Bergmann

N. Barkai

Keyword(s):

Gene Expression

Gene Expression Data

Large Scale

Expression Data

Large-Scale Integration of MicroRNA and Gene Expression Data for Identification of Enriched MicroRNA–mRNA Associations in Biological Systems

Methods in Molecular Biology - MicroRNAs and the Immune System

10.1007/978-1-60761-811-9_20

2010

pp. 297-315

Cited By ~ 28

Author(s):

Preethi H. Gunaratne

Chad J. Creighton

Michael Watson

Jayantha B. Tennakoon

Keyword(s):

Gene Expression

Gene Expression Data

Large Scale

Biological Systems

Expression Data

Large Scale Integration

Scale Integration

Analyzing Large Gene Expression Data Sets

Computational Text Analysis

10.1093/oso/9780198567400.003.0014

2006

Author(s):

Soumya Raychaudhuri

Keyword(s):

Gene Expression

Gene Expression Data

Expression Analysis

Gene Expression Analysis

Data Sets

Expression Data

Clustering Methods

Biologically Relevant

Large Gene

Functional Coherence

The most interesting and challenging gene expression data sets to analyze are large multidimensional data sets that contain expression values for many genes across multiple conditions. In these data sets, scientific text can be particularly useful, since myriad genes are examined under vastly different conditions, each of which may induce or repress expression of the same gene for different reasons. The data are enormously complex: each gene is associated with dozens if not hundreds of expression values, as well as multiple documents built from vocabularies of thousands of words. In Section 2.4 we reviewed common gene expression analysis strategies, most of which revolve around defining groups of genes with common profiles. A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present computational methods that leverage the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in gene expression data analysis offers an opportunity to incorporate background functional information about the genes when defining expression clusters. In Chapter 5 we saw how literature-based approaches could help in the analysis of single-condition experiments. Here we apply the strategies introduced in Chapter 6 to assess the coherence of groups of genes and thereby enhance gene expression analysis approaches. The methods proposed here could, in fact, be applied to any multivariate genomics data type. The key concepts discussed in this chapter are listed in the frame box. We begin with a discussion of gene groups and their role in expression analysis, and briefly discuss strategies to assign keywords to groups and to assess their functional coherence.
We apply functional coherence measures to gene expression analysis, using a yeast expression data set for our examples. We first demonstrate how functional coherence can be used to focus on the key biologically relevant gene groups derived by clustering methods such as self-organizing maps and k-means clustering.
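
The chapter scores gene groups produced by clustering methods such as k-means. Below is a minimal k-means sketch for producing such groups from expression profiles. The squared Euclidean distance, the use of the first k profiles as initial centroids, the function names and the toy data are all assumptions. The literature-based coherence scoring itself is not shown.

```python
# Minimal k-means sketch (assumptions): squared Euclidean distance,
# the first k profiles as initial centroids, and toy expression data.


def sq_dist(x, y):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(profiles, k, max_iter=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # no gene changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member genes.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: four genes measured under three conditions.
profiles = [[0.1, 0.2, 0.1], [5.0, 5.1, 4.9], [0.0, 0.3, 0.2], [5.2, 4.8, 5.0]]
labels, centroids = kmeans(profiles, 2)
print(labels)  # [0, 1, 0, 1]
```

Deterministic initialization keeps the example reproducible. In practice, several random or k-means++ initializations are usually compared, because k-means can converge to different local optima.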

Processing Large-Scale, High-Dimension Genetic and Gene Expression Data

Handbook on Analyzing Human Genetic Data

10.1007/978-3-540-69264-5_11

2009

pp. 307-330

Author(s):

Cliona Molony

Solveig K. Sieberts

Eric E. Schadt

Keyword(s):

Gene Expression

Gene Expression Data

High Dimension

Large Scale

Expression Data

Mining the Gene Expression Matrix: Inferring Gene Relationships from Large Scale Gene Expression Data

Information Processing in Cells and Tissues

10.1007/978-1-4615-5345-8_22

1998

pp. 203-212

Cited By ~ 35

Author(s):

Patrik D’haeseleer

Xiling Wen

Stefanie Fuhrman

Roland Somogyi

Keyword(s):

Gene Expression

Gene Expression Data

Large Scale

Expression Data

Gene Expression Matrix

Expression Matrix