Efficient Proximal Gradient Algorithm for Inference of Differential Gene Networks

2018 ◽  
Author(s):  
Chen Wang ◽  
Feng Gao ◽  
Georgios B. Giannakis ◽  
Gennaro D’Urso ◽  
Xiaodong Cai

Abstract Background Gene networks in living cells can change depending on conditions such as environment, tissue type, disease state, and developmental stage. Identifying differential changes in gene networks is very important for understanding the molecular basis of various biological processes. Existing algorithms can infer two gene networks separately from gene expression data under two different conditions and then identify network changes, but such an approach does not exploit the data jointly and is therefore suboptimal. A clearly desirable approach is to infer the two gene networks jointly, which can yield improved estimates of network changes. Results In this paper, we developed a proximal gradient algorithm for differential network (ProGAdNet) inference that jointly infers two gene networks under different conditions and then identifies changes in the network structure. Computer simulations demonstrated that ProGAdNet outperformed existing algorithms in terms of inference accuracy and was much faster than a similar approach for joint inference of gene networks. Gene expression data of breast tumors and normal tissues in the TCGA database were analyzed with ProGAdNet, revealing that 268 genes were involved in the changed network edges. Gene set enrichment analysis of this set of 268 genes identified a number of gene sets related to breast cancer or other types of cancer, corroborating that the gene set identified by ProGAdNet was very informative about cancer disease status. A software package implementing ProGAdNet and the computer simulations is available upon request. Conclusion With its superior performance over existing algorithms, ProGAdNet provides a valuable tool for finding changes in gene networks, which may aid the discovery of gene-gene interactions changed under different conditions.
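The idea of joint inference can be illustrated with a minimal sketch. It assumes that, for one target gene, expression is regressed on candidate regulators under each condition, with an l1 (lasso) penalty on each coefficient vector and a fused l1 penalty on their difference, and that the problem is solved by proximal gradient steps. The formulation, the names `joint_fit` and `prox_lasso_fused`, and the weights `lam1` and `lam2` are assumptions for illustration; this is not the authors' ProGAdNet software.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal operator of t*|x|."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def prox_lasso_fused(u, v, t1, t2):
    """Prox of t1*(|a|_1 + |b|_1) + t2*|a - b|_1, coordinate-wise.

    Solve the fusion-only problem in closed form, then soft-threshold both
    results by t1 (valid for this two-variable fused lasso).
    """
    d = u - v
    mean = 0.5 * (u + v)
    fuse = np.abs(d) <= 2.0 * t2          # coordinates fused to a common value
    a = np.where(fuse, mean, u - t2 * np.sign(d))
    b = np.where(fuse, mean, v + t2 * np.sign(d))
    return soft_threshold(a, t1), soft_threshold(b, t1)


def joint_fit(X1, y1, X2, y2, lam1, lam2, n_iter=500):
    """Proximal gradient (ISTA) for
    0.5*||y1 - X1 b1||^2 + 0.5*||y2 - X2 b2||^2
    + lam1*(||b1||_1 + ||b2||_1) + lam2*||b1 - b2||_1.
    """
    p = X1.shape[1]
    b1, b2 = np.zeros(p), np.zeros(p)
    # Lipschitz constant of the smooth part's gradient (block-diagonal Hessian).
    L = max(np.linalg.norm(X1, 2) ** 2, np.linalg.norm(X2, 2) ** 2)
    step = 1.0 / L
    for _ in range(n_iter):
        g1 = X1.T @ (X1 @ b1 - y1)
        g2 = X2.T @ (X2 @ b2 - y2)
        b1, b2 = prox_lasso_fused(b1 - step * g1, b2 - step * g2,
                                  step * lam1, step * lam2)
    return b1, b2


# Synthetic example: regulator 2 affects the target only under condition 2.
rng = np.random.default_rng(0)
n, p = 200, 10
beta1 = np.zeros(p)
beta1[0], beta1[1] = 2.0, -1.5
beta2 = beta1.copy()
beta2[2] = 1.5
X1 = rng.standard_normal((n, p))
X2 = rng.standard_normal((n, p))
y1 = X1 @ beta1 + 0.1 * rng.standard_normal(n)
y2 = X2 @ beta2 + 0.1 * rng.standard_normal(n)

b1, b2 = joint_fit(X1, y1, X2, y2, lam1=1.0, lam2=1.0)
changed = np.argmax(np.abs(b1 - b2))
print("largest estimated change at regulator", changed)
```

A large `lam2` forces the two networks to be identical, while `lam2 = 0` reduces the problem to two separate lasso fits, which is the "infer separately" baseline the abstract argues against.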

2020 ◽  
Vol 15 ◽  
Author(s):  
Chen-An Tsai ◽  
James J. Chen

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identifying differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms focus on identifying differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly correlated pathways. Methods: We apply co-inertia analysis (CIA) to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. CIA is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension-reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared with correlation-based gene set methods. Results and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations within and between gene sets, their interactions, and their network. We then demonstrate the integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets.
Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
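The CIA objective stated above (maximize the squared covariance between projections of two gene sets on successive axes) can be sketched as follows. The sketch assumes uniform sample weights and unscaled, PCA-style ordinations of column-centered data, in which case the co-inertia axes are the singular vectors of the cross-covariance matrix. The function name `coinertia` and the simulated gene sets are hypothetical; this is not the GSCoA implementation.

```python
import numpy as np


def coinertia(X, Y, n_axes=2):
    """Co-inertia analysis of two tables sharing samples (rows).

    Assumes uniform sample weights and unscaled ordinations, so the
    co-inertia axes are the singular vectors of the cross-covariance.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    C = Xc.T @ Yc / n                      # p x q cross-covariance matrix
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    U, V, s = U[:, :n_axes], Vt[:n_axes].T, s[:n_axes]
    scores_x = Xc @ U                       # sample projections, gene set 1
    scores_y = Yc @ V                       # sample projections, gene set 2
    total = np.sum(C ** 2)                  # total co-inertia = sum of all s_k^2
    return U, V, s, scores_x, scores_y, s ** 2 / total


# Two hypothetical gene sets (5 and 4 genes) driven by a shared latent factor.
rng = np.random.default_rng(1)
n = 60
z = rng.standard_normal(n)
X = np.outer(z, rng.standard_normal(5)) + 0.5 * rng.standard_normal((n, 5))
Y = np.outer(z, rng.standard_normal(4)) + 0.5 * rng.standard_normal((n, 4))

U, V, s, sx, sy, frac = coinertia(X, Y)
print("covariance on axis 1:", s[0], "share of co-inertia:", frac[0])
```

The covariance of the first pair of sample scores equals the leading singular value, and no other pair of unit-length gene weightings gives a larger absolute covariance, which is exactly the maximization CIA performs on its first axis.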


2014 ◽  
Vol 13s1 ◽  
pp. CIN.S13882 ◽  
Author(s):  
Binghuang Cai ◽  
Xia Jiang

Analyzing biological system abnormalities in cancer patients based on measures of biological entities, such as gene expression levels, is an important and challenging problem. This paper applies existing methods, Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis, to pathway abnormality analysis in lung cancer using microarray gene expression data. Gene expression data from studies of Lung Squamous Cell Carcinoma (LUSC) in The Cancer Genome Atlas project, and pathway gene set data from the Kyoto Encyclopedia of Genes and Genomes were used to analyze the relationship between pathways and phenotypes. Results, in the form of pathway rankings, indicate that some pathways may behave abnormally in LUSC. For example, both the cell cycle and viral carcinogenesis pathways ranked very high in LUSC. Furthermore, some pathways that are known to be associated with cancer, such as the p53 and the PI3K-Akt signal transduction pathways, were found to rank high in LUSC. Other pathways, such as bladder cancer and thyroid cancer pathways, were also ranked high in LUSC.
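Signaling Pathway Impact Analysis differs from GSEA in that it uses pathway topology. Its perturbation component is commonly formulated as PF(g) = ΔE(g) + Σ_u β_ug · PF(u) / N_ds(u), where ΔE(g) is the measured change of gene g, β_ug is +1 for activation and -1 for inhibition, and N_ds(u) counts the genes downstream of u. The minimal sketch below solves this as a linear system. It assumes N_ds is the out-degree (direct targets) and that the system is nonsingular. The function name and edge format are hypothetical, and the over-representation p-value, bootstrap, and p-value combination used in SPIA are omitted.

```python
import numpy as np


def perturbation_factors(delta_e, edges):
    """Solve PF = delta_e + B @ PF for a pathway graph.

    delta_e: measured change (e.g. log fold change), one per gene.
    edges: (source, target, beta) with beta = +1 (activation) or -1 (inhibition).
    B[target, source] = beta / N_ds(source), with N_ds taken as out-degree.
    """
    delta_e = np.asarray(delta_e, dtype=float)
    n = len(delta_e)
    out_degree = np.zeros(n)
    for src, _, _ in edges:
        out_degree[src] += 1
    B = np.zeros((n, n))
    for src, tgt, beta in edges:
        B[tgt, src] += beta / out_degree[src]
    pf = np.linalg.solve(np.eye(n) - B, delta_e)
    acc = pf - delta_e               # perturbation accumulated from upstream genes
    return pf, acc, acc.sum()        # the sum serves as a pathway-level total


# Three-gene chain 0 -> 1 -> 2 where only gene 0 changes.
pf, acc, total = perturbation_factors([1.0, 0.0, 0.0], [(0, 1, 1), (1, 2, 1)])
# Same chain, but gene 1 inhibits gene 2.
pf_i, acc_i, total_i = perturbation_factors([1.0, 0.0, 0.0], [(0, 1, 1), (1, 2, -1)])
print(pf, total, pf_i, total_i)
```

In the activating chain the change in gene 0 propagates fully downstream; with inhibition, gene 2 is perturbed in the opposite direction and the accumulated perturbations cancel.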


Author(s):  
Guro Dørum ◽  
Lars Snipen ◽  
Margrete Solheim ◽  
Solve Saebo

Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation, and greater consistency of results. However, gene set methods do not use all the available information about gene relations. Genes are arranged in complex networks in which network distances contain detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data, with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and used to smooth gene-wise test statistics. To find the optimal degree of smoothing, we propose a criterion that considers the correlation between the network and the data. Network smoothing is shown to improve the ability to identify important genes in simulated data. Applied to a real data set, the smoothing accentuates parts of the network with a high density of differentially expressed genes.
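The method derives gene dependencies from network distances. As a simpler stand-in, the sketch below smooths gene-wise test statistics by blending each gene's value with the mean over its direct network neighbours, controlled by a mixing weight `alpha`. The one-step neighbour average and the name `smooth_statistics` are assumptions; this is not the authors' distance-based smoother, nor their criterion for choosing the degree of smoothing.

```python
import numpy as np


def smooth_statistics(t, adjacency, alpha):
    """Blend each gene's statistic with the mean over its network neighbours.

    alpha = 0 returns t unchanged; genes without neighbours keep their own value.
    """
    t = np.asarray(t, dtype=float)
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    neighbour_sum = A @ t
    # Where a gene has no neighbours, fall back to its own statistic.
    neighbour_mean = np.divide(neighbour_sum, degree,
                               out=t.copy(), where=degree > 0)
    return (1.0 - alpha) * t + alpha * neighbour_mean


# Path network 0 - 1 - 2, plus an isolated gene 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]])
t = np.array([3.0, 0.0, 0.0, 2.0])
smoothed = smooth_statistics(t, A, alpha=0.5)
print(smoothed)
```

Gene 1 receives part of its neighbour's strong signal, while the isolated gene is unchanged; a larger `alpha` spreads signal further into densely connected regions, which is the trade-off a smoothing criterion must tune.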


2016 ◽  
Author(s):  
Gennady Korotkevich ◽  
Vladimir Sukhov ◽  
Alexey Sergushichev

Abstract Preranked gene set enrichment analysis (GSEA) is a widely used method for interpreting gene expression data in terms of biological processes. Here we present the FGSEA method, which estimates arbitrarily low GSEA P-values with higher accuracy and much faster than other implementations. We also present a polynomial algorithm to calculate GSEA P-values exactly, which we use to confirm the accuracy of the method in practice.
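To see why low P-values are the hard part, it helps to look at the baseline that faster methods improve on. The sketch below computes the standard weighted running-sum enrichment score for a preranked list and a naive P-value from random gene sets of the same size. It is not the FGSEA algorithm, and the function names are hypothetical. Because of the +1 correction, the smallest P-value it can report is 1/(n_perm + 1), so resolving very small P-values by brute force requires very many permutations.

```python
import numpy as np


def enrichment_score(stats, in_set, weight=1.0):
    """Weighted running-sum enrichment score for a preranked gene list."""
    stats = np.asarray(stats, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    order = np.argsort(-stats)                  # rank genes by decreasing statistic
    stats, in_set = stats[order], in_set[order]
    hits = np.where(in_set, np.abs(stats) ** weight, 0.0)
    hits /= hits.sum()
    misses = np.where(in_set, 0.0, 1.0 / np.count_nonzero(~in_set))
    running = np.cumsum(hits - misses)
    return running[np.argmax(np.abs(running))]  # signed maximum deviation from zero


def permutation_pvalue(stats, in_set, n_perm=999, seed=0):
    """Naive gene-permutation P-value using random sets of the same size."""
    rng = np.random.default_rng(seed)
    es = enrichment_score(stats, in_set)
    n, k = len(stats), int(np.sum(in_set))
    as_extreme = 0
    for _ in range(n_perm):
        rand_set = np.zeros(n, dtype=bool)
        rand_set[rng.choice(n, size=k, replace=False)] = True
        es_rand = enrichment_score(stats, rand_set)
        if (es >= 0 and es_rand >= es) or (es < 0 and es_rand <= es):
            as_extreme += 1
    # The +1 correction caps resolution at 1/(n_perm + 1).
    return es, (as_extreme + 1) / (n_perm + 1)


# A 20-gene set occupying the top of a 1000-gene ranking.
stats = np.linspace(3, -3, 1000)
in_set = np.zeros(1000, dtype=bool)
in_set[:20] = True
es, p = permutation_pvalue(stats, in_set)
print(es, p)
```

Here the set is as enriched as possible, yet the estimate cannot go below 0.001 with 999 permutations.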


Author(s):  
Crescenzio Gallo

Modeling and simulation have extensive applications in bioinformatics, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments with DNA microarrays allow researchers to measure expression levels of thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expression.
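Step (1), selecting the most differentially expressed genes, is often done by ranking genes on a two-sample statistic. The sketch below uses Welch's t-statistic as one common choice; the chapter surveys several methods, so this choice, the function names, and the toy matrix are assumptions. No multiple-testing correction is applied.

```python
import numpy as np


def welch_t(expr, groups):
    """Per-gene Welch t-statistics.

    expr: genes x samples matrix; groups: 0/1 label per sample.
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    a, b = expr[:, groups == 1], expr[:, groups == 0]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return mean_diff / se


def top_genes(expr, groups, k):
    """Indices of the k genes with the largest |t|, most extreme first."""
    t = welch_t(expr, groups)
    return np.argsort(-np.abs(t))[:k], t


# Toy data: gene 0 shifts between groups, gene 1 does not.
expr = np.array([[1, 2, 3, 10, 11, 12],
                 [1, 2, 3, 1, 2, 3]])
groups = np.array([0, 0, 0, 1, 1, 1])
idx, t = top_genes(expr, groups, k=1)
print(idx, t)
```

Welch's version does not assume equal variances in the two groups, which is a reasonable default when condition-specific variability is expected.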


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Yuanyuan Li ◽  
David M. Umbach ◽  
Adrienna Bingham ◽  
Qi-Jing Li ◽  
Yuan Zhuang ◽  
...  

Abstract Background Tumor purity is the percentage of cancer cells present in a sample of tumor tissue. The non-cancerous cells (immune cells, fibroblasts, etc.) have an important role in tumor biology. The ability to determine tumor purity is important for understanding the roles of cancerous and non-cancerous cells in a tumor. Methods We applied a supervised machine learning method, XGBoost, to data from 33 TCGA tumor types to predict tumor purity from RNA-seq gene expression data. Results Across the 33 tumor types, the median correlation between observed and predicted tumor purity ranged from 0.75 to 0.87 with small root mean square errors, suggesting that tumor purity can be accurately predicted using expression data. We further confirmed that expression levels of a ten-gene set (CSF2RB, RHOH, C1S, CCDC69, CCL22, CYTIP, POU2AF1, FGR, CCL21, and IL7R) were predictive of tumor purity regardless of tumor type. We tested whether our set of ten genes could accurately predict tumor purity in a TCGA-independent data set. We showed that expression levels from our set of ten genes were highly correlated (ρ = 0.88) with the actual observed tumor purity. Conclusions Our analyses suggest that the ten-gene set may serve as a biomarker for tumor purity prediction using gene expression data.
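The evaluation described above (train a predictor on expression data, then report the correlation and root mean square error between observed and predicted purity) can be sketched as follows. To stay dependency-free, the sketch swaps XGBoost for ordinary least squares on the ten-gene panel, and it uses Pearson correlation, which is an assumption about what ρ denotes. The expression values and purities are synthetic, not TCGA data, and the function names are hypothetical.

```python
import numpy as np

# Ten-gene panel reported in the abstract.
TEN_GENES = ["CSF2RB", "RHOH", "C1S", "CCDC69", "CCL22",
             "CYTIP", "POU2AF1", "FGR", "CCL21", "IL7R"]


def fit_linear(X, y):
    """Least-squares fit with intercept (a simple stand-in for XGBoost)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def evaluate(observed, predicted):
    """Pearson correlation and RMSE between observed and predicted purity."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = np.corrcoef(observed, predicted)[0, 1]
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return r, rmse


# Synthetic stand-in data: samples x len(TEN_GENES) expression matrix.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, len(TEN_GENES)))
w = rng.standard_normal(len(TEN_GENES))
purity = 0.6 + 0.05 * X @ w + 0.01 * rng.standard_normal(100)

coef = fit_linear(X[:70], purity[:70])                    # train on 70 samples
r, rmse = evaluate(purity[70:], predict(coef, X[70:]))    # test on held-out 30
print(f"r = {r:.3f}, RMSE = {rmse:.4f}")
```

Evaluating on held-out samples, as here, mirrors the abstract's test on a TCGA-independent data set; reporting both metrics matters because a high correlation can coexist with a systematic offset that only the RMSE reveals.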


2015 ◽  
Vol 13 (06) ◽  
pp. 1550019 ◽  
Author(s):  
Alexei A. Sharov ◽  
David Schlessinger ◽  
Minoru S. H. Ko

We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users’ own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher’s methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein–protein interaction) are pre-loaded and can be used for functional annotations.
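ExAtlas lists standard meta-analysis methods. The sketch below shows textbook versions of three of them: Fisher's method, Stouffer's z-score method, and the inverse-variance fixed-effect estimate. It uses only Python's standard library. This is not ExAtlas code, the function names are hypothetical, and the random-effects model is omitted.

```python
import math


def fisher_combined(pvalues):
    """Fisher's method: X = -2*sum(ln p) ~ chi-square with 2k df under H0."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)   # X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer(zscores):
    """Stouffer's z-score method; returns combined z and one-sided P-value."""
    z = sum(zscores) / math.sqrt(len(zscores))
    return z, 0.5 * math.erfc(z / math.sqrt(2))


def fixed_effect(effects, variances):
    """Inverse-variance weighted (fixed-effect) mean and its standard error."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))


print(fisher_combined([0.01, 0.02]))
print(stouffer([1.2, 2.1, 0.8]))
print(fixed_effect([0.5, 0.7], [0.04, 0.09]))
```

Fisher's and Stouffer's methods combine only P-values or z-scores, while the fixed-effect estimate needs per-study effect sizes and variances.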

