Efficient Proximal Gradient Algorithm for Inference of Differential Gene Networks

Mapping Intimacies ◽

10.1101/450130 ◽

2018 ◽

Author(s):

Chen Wang ◽

Feng Gao ◽

Georgios B. Giannakis ◽

Gennaro D’Urso ◽

Xiaodong Cai

Keyword(s):

Gene Expression ◽

Computer Simulations ◽

Gene Expression Data ◽

Gene Networks ◽

Gene Set Enrichment Analysis ◽

Gradient Algorithm ◽

Superior Performance ◽

Expression Data ◽

Gene Set ◽

Proximal Gradient Algorithm

Background: Gene networks in living cells can change depending on conditions such as different environments, tissue types, disease states, and developmental stages. Identifying the differential changes in gene networks is important for understanding the molecular basis of various biological processes. While existing algorithms can be used to infer two gene networks separately from gene expression data under two different conditions and then to identify network changes, such an approach does not exploit the data jointly and is thus suboptimal. A more desirable approach is to infer the two gene networks jointly, which can yield improved estimates of network changes. Results: In this paper, we developed a proximal gradient algorithm for differential network (ProGAdNet) inference that jointly infers two gene networks under different conditions and then identifies changes in the network structure. Computer simulations demonstrated that ProGAdNet outperformed existing algorithms in terms of inference accuracy and was much faster than a similar approach for joint inference of gene networks. Gene expression data of breast tumors and normal tissues in the TCGA database were analyzed with ProGAdNet, which revealed that 268 genes were involved in the changed network edges. Gene set enrichment analysis of this set of 268 genes identified a number of gene sets related to breast cancer or other types of cancer, corroborating that the gene set identified by ProGAdNet was very informative about cancer disease status. A software package implementing ProGAdNet and the computer simulations is available upon request. Conclusion: With its superior performance over existing algorithms, ProGAdNet provides a valuable tool for finding changes in gene networks, which may aid the discovery of gene-gene interactions that change under different conditions.
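
The abstract names a proximal gradient algorithm but does not state the objective that ProGAdNet minimizes. As a generic illustration of the technique only, the sketch below runs proximal gradient iterations (ISTA) on a plain lasso problem. The objective, the toy data, and the names `soft_threshold` and `lasso_proximal_gradient` are assumptions made for this example and are not taken from the paper.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_proximal_gradient(X, y, lam, step, n_iter=500):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA).

    X is a list of rows and y a list of responses. The step size should not
    exceed 1 / L, where L is the largest eigenvalue of X^T X.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        # Residual r = X b - y
        r = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the smooth least-squares term: X^T r
        grad = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by the proximal (soft-threshold) step
        b = [soft_threshold(b[j] - step * grad[j], step * lam) for j in range(p)]
    return b


# Hypothetical toy data: y depends only on the first column of X.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
y = [2.0, 0.0, 2.0, 4.0]
b_hat = lasso_proximal_gradient(X, y, lam=0.5, step=0.1)
```

On this toy problem the second coefficient is driven exactly to zero, which is the sparsity behavior that makes proximal methods attractive for network inference. A joint method for two conditions would additionally need a penalty coupling the two coefficient vectors; that part is not sketched here.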

Gene Set Correlation Analysis and Visualization Using Gene Expression Data

Current Bioinformatics ◽

10.2174/1574893615999200629124444 ◽

2020 ◽

Vol 15 ◽

Author(s):

Chen-An Tsai ◽

James J. Chen

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Gene Expression Data ◽

Differentially Expressed Gene ◽

Differentially Expressed ◽

Superior Performance ◽

Expression Data ◽

Gene Set ◽

Gene Sets ◽

Set Correlation

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly correlated pathways. Methods: We apply co-inertia analysis to comparisons across gene sets in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Results and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets and their interactions and networks. We then demonstrate integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
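
The CIA objective stated in the abstract (maximize the squared covariance between projections of two gene sets) can be sketched for the first pair of axes. The sketch assumes uniform sample and gene weights, in which case the leading axes are the leading singular vectors of the cross-covariance matrix; it finds them by power iteration. The function names and the toy matrices are hypothetical, and the sketch does not reproduce GSCoA itself.

```python
import math


def center_columns(M):
    """Subtract each column mean so that covariances can be computed directly."""
    n = len(M)
    means = [sum(row[j] for row in M) / n for j in range(len(M[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in M]


def first_coinertia_axes(X, Y, n_iter=200):
    """Return unit axes (u, v) and the squared covariance of X u and Y v.

    X and Y hold the same samples in rows and the genes of two gene sets in
    columns. Assumes the cross-covariance matrix is not identically zero.
    """
    Xc, Yc = center_columns(X), center_columns(Y)
    n, p, q = len(Xc), len(Xc[0]), len(Yc[0])
    # Cross-covariance matrix C (p x q) between the two gene sets
    C = [[sum(Xc[i][a] * Yc[i][b] for i in range(n)) / n for b in range(q)]
         for a in range(p)]
    u = [1.0] * p
    for _ in range(n_iter):
        w = [sum(C[a][b] * u[a] for a in range(p)) for b in range(q)]  # C^T u
        u = [sum(C[a][b] * w[b] for b in range(q)) for a in range(p)]  # C C^T u
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    w = [sum(C[a][b] * u[a] for a in range(p)) for b in range(q)]
    sigma = math.sqrt(sum(x * x for x in w))  # covariance of the two projections
    v = [x / sigma for x in w]
    return u, v, sigma ** 2


# Hypothetical expression values: 4 samples, 2 genes in each gene set.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
Y = [[2.0, 0.5], [4.1, 1.0], [5.9, 0.0], [8.0, 1.5]]
u, v, sq_cov = first_coinertia_axes(X, Y)
```

Successive axes would come from the remaining singular vectors. Power iteration is used here only to keep the example dependency-free; a library singular value decomposition would be the usual choice.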

Revealing Biological Pathways Implicated in Lung Cancer from TCGA Gene Expression Data Using Gene Set Enrichment Analysis

Cancer Informatics ◽

10.4137/cin.s13882 ◽

2014 ◽

Vol 13s1 ◽

pp. CIN.S13882 ◽

Cited By ~ 4

Author(s):

Binghuang Cai ◽

Xia Jiang

Keyword(s):

Gene Expression ◽

Lung Cancer ◽

Gene Expression Data ◽

Lung Squamous Cell Carcinoma ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Expression Data ◽

Gene Set Enrichment ◽

Gene Set ◽

Pathway Gene

Analyzing biological system abnormalities in cancer patients based on measures of biological entities, such as gene expression levels, is an important and challenging problem. This paper applies existing methods, Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis, to pathway abnormality analysis in lung cancer using microarray gene expression data. Gene expression data from studies of Lung Squamous Cell Carcinoma (LUSC) in The Cancer Genome Atlas project, and pathway gene set data from the Kyoto Encyclopedia of Genes and Genomes were used to analyze the relationship between pathways and phenotypes. Results, in the form of pathway rankings, indicate that some pathways may behave abnormally in LUSC. For example, both the cell cycle and viral carcinogenesis pathways ranked very high in LUSC. Furthermore, some pathways that are known to be associated with cancer, such as the p53 and the PI3K-Akt signal transduction pathways, were found to rank high in LUSC. Other pathways, such as bladder cancer and thyroid cancer pathways, were also ranked high in LUSC.

Smoothing Gene Expression Data with Network Information Improves Consistency of Regulated Genes

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1618 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 6

Author(s):

Guro Dørum ◽

Lars Snipen ◽

Margrete Solheim ◽

Solve Saebo

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Simulated Data ◽

Real Data ◽

Biological Knowledge ◽

Expression Data ◽

Data Set ◽

Gene Set ◽

Network Information

Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation and more conformity in the results. However, gene set methods do not employ all the available information about gene relations. Genes are arranged in complex networks where the network distances contain detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and are used to smooth genewise test statistics. To find the optimal degree of smoothing, we propose using a criterion that considers the correlation between the network and the data. The network smoothing is shown to improve the ability to identify important genes in simulated data. Applied to a real data set, the smoothing accentuates parts of the network with a high density of differentially expressed genes.
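
The abstract does not give the smoothing formula, so the following is only a minimal sketch of the general idea of smoothing genewise statistics over a network. It assumes a weight of `alpha ** d` for a gene at network distance `d`, which is a choice made for this example and not the authors' kernel. The network, the statistics, and the function names are hypothetical.

```python
from collections import deque


def network_distances(adjacency, source):
    """Breadth-first search: shortest path length from source to each reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        for neighbor in adjacency[gene]:
            if neighbor not in dist:
                dist[neighbor] = dist[gene] + 1
                queue.append(neighbor)
    return dist


def smooth_statistics(adjacency, stats, alpha):
    """Replace each gene's statistic by a distance-weighted average.

    The weight of gene j for gene i is alpha ** d(i, j). With alpha = 0 the
    statistics are unchanged; larger alpha smooths more strongly.
    """
    smoothed = {}
    for gene in stats:
        dist = network_distances(adjacency, gene)
        weights = {g: alpha ** d for g, d in dist.items()}
        total = sum(weights.values())
        smoothed[gene] = sum(w * stats[g] for g, w in weights.items()) / total
    return smoothed


# Hypothetical path network A - B - C - D with made-up test statistics.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
stats = {"A": 3.0, "B": 0.0, "C": 3.0, "D": 0.0}
smoothed = smooth_statistics(adjacency, stats, alpha=0.5)
```

Here `alpha` plays the role of the degree of smoothing. The criterion the authors propose for choosing it, based on the correlation between the network and the data, is not sketched.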

Fast gene set enrichment analysis

10.1101/060012 ◽

2016 ◽

Cited By ~ 218

Author(s):

Gennady Korotkevich ◽

Vladimir Sukhov ◽

Alexey Sergushichev

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Polynomial Algorithm ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Biological Processes ◽

Expression Data ◽

Gene Set Enrichment ◽

P Values ◽

Gene Set

Preranked gene set enrichment analysis (GSEA) is a widely used method for interpreting gene expression data in terms of biological processes. Here we present the FGSEA method, which is able to estimate arbitrarily low GSEA P-values with higher accuracy and much faster than other implementations. We also present a polynomial algorithm to calculate GSEA P-values exactly, which we use to practically confirm the accuracy of the method.
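
For context, the preranked enrichment score and a naive permutation P-value can be sketched as follows. This is the commonly described weighted running-sum statistic with a simplified same-direction permutation test, written as an assumption-laden illustration. It is not the FGSEA algorithm, and the gene names and statistics are made up.

```python
import random


def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score for a preranked gene list.

    ranked: list of (gene, statistic) pairs sorted by decreasing statistic.
    gene_set: set of gene names; at least one member must have a nonzero statistic.
    """
    hit_weights = [abs(s) ** p if g in gene_set else 0.0 for g, s in ranked]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    running, best = 0.0, 0.0
    for (gene, _), weight in zip(ranked, hit_weights):
        if gene in gene_set:
            running += weight / total_hit   # step up on a hit
        else:
            running -= 1.0 / n_miss         # step down on a miss
        if abs(running) > abs(best):
            best = running                  # keep the largest deviation from zero
    return best


def permutation_p_value(ranked, gene_set, n_perm=1000, seed=0):
    """Naive P-value: compare the observed score with scores of random gene sets."""
    rng = random.Random(seed)
    genes = [g for g, _ in ranked]
    size = sum(1 for g in genes if g in gene_set)
    observed = enrichment_score(ranked, gene_set)
    null_scores = [enrichment_score(ranked, set(rng.sample(genes, size)))
                   for _ in range(n_perm)]
    if observed >= 0:
        extreme = sum(1 for s in null_scores if s >= observed)
    else:
        extreme = sum(1 for s in null_scores if s <= observed)
    return (extreme + 1) / (n_perm + 1)


# Hypothetical preranked list of six genes.
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
          ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
es_top = enrichment_score(ranked, {"g1", "g2"})
es_bottom = enrichment_score(ranked, {"g5", "g6"})
p_top = permutation_p_value(ranked, {"g1", "g2"})
```

With this naive scheme the smallest attainable P-value is 1 / (n_perm + 1), so very low P-values require very many permutations. That limitation is the kind of cost the abstract says FGSEA addresses.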

Application of bi-clustering of gene expression data and gene set enrichment analysis methods to identify potentially disease causing nanomaterials

Data in Brief ◽

10.1016/j.dib.2017.10.060 ◽

2017 ◽

Vol 15 ◽

pp. 933-940 ◽

Cited By ~ 2

Author(s):

Andrew Williams ◽

Sabina Halappanavar

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Expression Data ◽

Gene Set Enrichment ◽

Gene Set ◽

Analysis Methods

Bootstrapping Time-Course Gene Expression Data for Gene Networks: Application to Gene Relevance Networks

Journal of Computational Biology ◽

10.1089/cmb.2018.0029 ◽

2018 ◽

Vol 25 (12) ◽

pp. 1374-1384

Author(s):

Jeonifer M. Garren ◽

Jaejik Kim

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Time Course ◽

Expression Data

Reconstructing Gene Networks of Forest Trees from Gene Expression Data: Toward Higher-Resolution Approaches

Communications in Computer and Information Science - ICT Innovations 2018. Engineering and Life Sciences ◽

10.1007/978-3-030-00825-3_1 ◽

2018 ◽

pp. 3-12 ◽

Cited By ~ 1

Author(s):

Matt Zinkgraf ◽

Andrew Groover ◽

Vladimir Filkov

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Forest Trees ◽

Expression Data

Building Gene Networks by Analyzing Gene Expression Profiles

Advanced Methodologies and Technologies in Medicine and Healthcare - Advances in Medical Diagnosis, Treatment, and Care ◽

10.4018/978-1-5225-7489-7.ch003 ◽

2019 ◽

pp. 27-44

Author(s):

Crescenzio Gallo

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Dna Microarrays ◽

Expression Profiles ◽

Expression Patterns ◽

Gene Expression Profiles ◽

Expression Data ◽

Gene Expressions ◽

Over Time

The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expressions.

Putative biomarkers for predicting tumor sample purity based on gene expression data

BMC Genomics ◽

10.1186/s12864-019-6412-8 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Yuanyuan Li ◽

David M. Umbach ◽

Adrienna Bingham ◽

Qi-Jing Li ◽

Yuan Zhuang ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Supervised Machine Learning ◽

Tumor Type ◽

Expression Data ◽

Expression Levels ◽

Gene Set ◽

Tumor Purity ◽

Tumor Types ◽

Cancerous Cells

Background: Tumor purity is the percentage of cancer cells present in a sample of tumor tissue. The non-cancerous cells (immune cells, fibroblasts, etc.) have an important role in tumor biology. The ability to determine tumor purity is important for understanding the roles of cancerous and non-cancerous cells in a tumor. Methods: We applied a supervised machine learning method, XGBoost, to data from 33 TCGA tumor types to predict tumor purity using RNA-seq gene expression data. Results: Across the 33 tumor types, the median correlation between observed and predicted tumor purity ranged from 0.75 to 0.87 with small root mean square errors, suggesting that tumor purity can be accurately predicted using expression data. We further confirmed that expression levels of a ten-gene set (CSF2RB, RHOH, C1S, CCDC69, CCL22, CYTIP, POU2AF1, FGR, CCL21, and IL7R) were predictive of tumor purity regardless of tumor type. We tested whether our set of ten genes could accurately predict tumor purity of a TCGA-independent data set. We showed that expression levels from our set of ten genes were highly correlated (ρ = 0.88) with the actual observed tumor purity. Conclusions: Our analyses suggested that the ten-gene set may serve as a biomarker for tumor purity prediction using gene expression data.
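
The two evaluation measures mentioned in the abstract, correlation and root mean square error between observed and predicted purity, can be computed as in the sketch below. The abstract does not say which correlation coefficient was used, so Pearson correlation is an assumption here, and the purity values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical purity fractions for five samples (not data from the paper).
observed = [0.90, 0.75, 0.60, 0.85, 0.40]
predicted = [0.88, 0.70, 0.65, 0.80, 0.45]
r = pearson(observed, predicted)
error = rmse(observed, predicted)
```

Reporting both measures is useful because a high correlation alone does not rule out a systematic offset between predicted and observed purity, whereas the RMSE is sensitive to it.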

ExAtlas: An interactive online tool for meta-analysis of gene expression data

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720015500195 ◽

2015 ◽

Vol 13 (06) ◽

pp. 1550019 ◽

Cited By ~ 37

Author(s):

Alexei A. Sharov ◽

David Schlessinger ◽

Minoru S. H. Ko

Keyword(s):

Gene Expression ◽

Gene Ontology ◽

Gene Expression Data ◽

Fixed Effects ◽

Expression Profiles ◽

Meta Analysis ◽

Data Sets ◽

Expression Data ◽

Gene Set ◽

Public Data

We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users’ own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher’s methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein–protein interaction) are pre-loaded and can be used for functional annotations.
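
Two of the standard meta-analysis methods listed above, fixed effects and Fisher's method, can be illustrated with textbook formulas. This is a minimal sketch and not ExAtlas code; the function names and the input numbers are hypothetical.

```python
import math


def fixed_effects(estimates, std_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def fisher_combined_p(p_values):
    """Fisher's method for combining independent P-values.

    Under the null, -2 * sum(log p) follows a chi-square distribution with
    2k degrees of freedom. For even degrees of freedom the survival function
    has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # half of the test statistic
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


# Hypothetical effect estimates (e.g. log-ratios) from two data sets.
pooled, pooled_se = fixed_effects([1.0, 3.0], [1.0, 1.0])
combined_p = fisher_combined_p([0.05, 0.05])
```

The fixed-effects estimate assumes every data set measures the same underlying effect, while Fisher's method uses only the P-values and therefore ignores effect direction.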
