graphsim: An R package for simulating gene expression data from graph structures of biological pathways

Summary: Transcriptomic analysis is used to capture the molecular state of a cell or sample in many biological and medical applications. In addition to identifying alterations in activity at the level of individual genes, understanding changes in the gene networks that regulate fundamental biological mechanisms is also an important objective of molecular analysis. As a result, databases that describe biological pathways are increasingly used to assist with the interpretation of results from large-scale genomics studies. Incorporating information from biological pathways and gene regulatory networks into a genomic data analysis is a popular strategy, and there are many methods that provide this functionality for gene expression data. When developing or comparing such methods, it is important to gain an accurate assessment of their performance. Simulation-based validation studies are frequently used for this purpose, which necessitates simulated data that correctly account for pathway relationships and correlations. Here we present a versatile statistical framework to simulate correlated gene expression data from biological pathways, by sampling from a multivariate normal distribution derived from a graph structure. This procedure has been released as the graphsim R package on CRAN and GitHub (https://github.com/TomKellyGenetics/graphsim) and is compatible with any graph structure that can be described using the igraph package. This package allows the simulation of biological pathways from a graph structure based on a statistical model of gene expression.
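The idea of sampling from a multivariate normal distribution derived from a graph can be illustrated with a short Python sketch. This is not the graphsim procedure itself: the construction used here (graph Laplacian plus identity as a precision matrix) is an assumption chosen because it always yields a valid covariance matrix, and the function name `simulate_expression` and the four-gene chain are hypothetical.

```python
import numpy as np

def simulate_expression(adjacency, n_samples, seed=0):
    """Sample correlated 'expression' values for genes linked by a graph.

    Assumption (not taken from graphsim): the precision matrix is the
    graph Laplacian plus the identity, which is always positive definite.
    """
    A = np.asarray(adjacency, dtype=float)
    n_genes = A.shape[0]
    laplacian = np.diag(A.sum(axis=1)) - A
    precision = laplacian + np.eye(n_genes)
    covariance = np.linalg.inv(precision)
    # Rescale to a correlation matrix so every gene has unit variance.
    scale = np.sqrt(np.diag(covariance))
    correlation = covariance / np.outer(scale, scale)
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(np.zeros(n_genes), correlation, size=n_samples)
    return samples, correlation

# Hypothetical pathway: a chain of four genes, g0 - g1 - g2 - g3.
chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
samples, correlation = simulate_expression(chain, n_samples=500)
print(np.round(correlation, 3))
```

In this sketch, genes that are adjacent in the chain end up more strongly correlated than genes at opposite ends, which is the qualitative behaviour a pathway-aware simulation is meant to reproduce.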


Integrative Genomic Analysis for the Discovery of Biomarkers in Prostate Cancer

Biomarker Insights ◽

10.4137/bmi.s13729 ◽

2014 ◽

Vol 9 ◽

pp. BMI.S13729 ◽

Cited By ~ 5

Author(s):

Chindo Hicks ◽

Tejaswi Koganti ◽

Shankar Giri ◽

Memory Tekere ◽

Ritika Ramani ◽

...

Keyword(s):

Gene Expression ◽

Prostate Cancer ◽

Gene Expression Data ◽

Genetic Variants ◽

Association Studies ◽

Biological Pathways ◽

Great Success ◽

Genome Wide Association Studies ◽

Expression Data ◽

Increased Risk

Genome-wide association studies (GWAS) have achieved great success in identifying single nucleotide polymorphisms (SNPs, herein called genetic variants) and genes associated with risk of developing prostate cancer. However, GWAS do not typically link the genetic variants to the disease state or inform the broader context in which the genetic variants operate. Here, we present a novel integrative genomics approach that combines GWAS information with gene expression data to infer the causal association between gene expression and the disease and to identify the network states and biological pathways enriched for genetic variants. We identified gene regulatory networks and biological pathways enriched for genetic variants, including the prostate cancer, IGF-1, JAK2, androgen, and prolactin signaling pathways. The integration of GWAS information with gene expression data provides insights about the broader context in which genetic variants associated with an increased risk of developing prostate cancer operate.


Space-log: a novel approach to inferring gene-gene net-works using SPACE model with log penalty

F1000Research ◽

10.12688/f1000research.26128.2 ◽

2022 ◽

Vol 9 ◽

pp. 1159

Author(s):

Qian (Vicky) Wu ◽

Wei Sun ◽

Li Hsu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Regulatory Networks ◽

Penalized Regression ◽

R Package ◽

Expression Data ◽

Computationally Efficient ◽

P Gene ◽

Novel Approach

Gene expression data have been used to infer gene-gene networks (GGN), where an edge between two genes implies the conditional dependence of these two genes given all the other genes. Such gene-gene networks are often referred to as gene regulatory networks since they may reveal expression regulation. Most existing methods for identifying GGN employ penalized regression with an L1 (lasso), L2 (ridge), or elastic net penalty, which spans the range from the L1 to the L2 penalty. However, for high-dimensional gene expression data, a penalty that spans the range between the L0 and L1 penalties, such as the log penalty, is often needed for variable selection consistency. Thus, we develop a novel method that employs the log penalty within the framework of an earlier network identification method, SPACE (Sparse PArtial Correlation Estimation), and implement it in an R package, space-log. We show that space-log is computationally efficient (source code implemented in C) and performs well compared with other methods, particularly for networks with hubs. Space-log is open source and available at GitHub, https://github.com/wuqian77/SpaceLog
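The notion of an edge as conditional dependence can be made concrete with partial correlations, which follow from the inverse covariance (precision) matrix as rho_ij = -omega_ij / sqrt(omega_ii * omega_jj). The Python sketch below only illustrates that standard identity on a hypothetical three-gene chain; it is not the SPACE or space-log estimator, and the function name `partial_correlations` is made up for the example.

```python
import numpy as np

def partial_correlations(covariance):
    """Partial correlation of each gene pair given all other genes."""
    precision = np.linalg.inv(np.asarray(covariance, dtype=float))
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial

# Hypothetical chain g0 - g1 - g2: g0 and g2 are linked only through g1,
# so the precision matrix has a zero in position (0, 2).
precision = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 3.0, -1.0],
    [0.0, -1.0, 2.0],
])
covariance = np.linalg.inv(precision)

print(np.round(covariance, 3))                        # g0 and g2 are marginally correlated
print(np.round(partial_correlations(covariance), 3))  # but conditionally independent
```

The two end genes are correlated marginally, yet their partial correlation is zero, so a network built from partial correlations would place no edge between them.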


NormExpression: an R package to normalize gene expression data using evaluated methods

10.1101/251140 ◽

2018 ◽

Cited By ~ 3

Author(s):

Zhenfeng Wu ◽

Weixiang Liu ◽

Xiufeng Jin ◽

Deshui Yu ◽

Hua Wang ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Expression Profiles ◽

R Package ◽

Data Normalization ◽

Expression Data ◽

Normalization Methods ◽

Normalize Gene Expression

Abstract: Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate the current normalization methods, the different metrics yield inconsistent results. In this study, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric, mSCC, to evaluate 14 commonly used normalization methods, achieving consistency in our evaluation results using both bulk RNA-seq and scRNA-seq data from the same library construction protocol. This consistency has validated the underlying theory that a successful normalization method simultaneously maximizes the number of uniform genes and minimizes the correlation between the expression profiles of gene pairs. This consistency can also be used to analyze the quality of gene expression data. The gene expression data, normalization methods and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to evaluate methods (particularly some data-driven methods or their own methods) and then select the best one for data normalization in gene expression analysis.
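To give a feel for a CV-threshold curve, the Python sketch below computes the fraction of genes whose coefficient of variation (CV) falls at or below each cutoff and takes the area under that curve. This is one plausible reading only: the abstract does not define AUCVC precisely, so the cutoff range (0 to 1), the treatment of zero-mean genes, and the function name `cv_threshold_auc` are all assumptions, and the NormExpression implementation may differ.

```python
import numpy as np

def cv_threshold_auc(expression, thresholds=None):
    """Area under the curve of 'fraction of uniform genes' versus CV cutoff.

    expression: genes x samples matrix of normalized values.
    Assumption: a gene counts as 'uniform' at a cutoff if its CV <= cutoff.
    """
    X = np.asarray(expression, dtype=float)
    mean = X.mean(axis=1)
    std = X.std(axis=1, ddof=1)
    keep = mean > 0                      # CV is undefined for zero-mean genes
    cv = std[keep] / mean[keep]
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    fraction_uniform = np.array([(cv <= t).mean() for t in thresholds])
    # Trapezoidal rule over the cutoff grid.
    widths = np.diff(thresholds)
    heights = (fraction_uniform[1:] + fraction_uniform[:-1]) / 2.0
    return float(np.sum(widths * heights))

# Hypothetical data: 200 genes, 6 samples, modest noise around a mean of 10.
rng = np.random.default_rng(0)
expression = rng.normal(loc=10.0, scale=2.0, size=(200, 6)).clip(min=0.1)
print(cv_threshold_auc(expression))
```

Under this reading, a normalization that leaves more genes with low CV pushes the value towards 1, which matches the abstract's statement that a successful method maximizes the number of uniform genes.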


Pathway Composite Variables: A Useful Tool for the Interpretation of Biological Pathways in the Analysis of Gene Expression Data

Advances in Latent Variables - Studies in Theoretical and Applied Statistics ◽

10.1007/10104_2014_22 ◽

2014 ◽

pp. 141-150 ◽

Cited By ~ 1

Author(s):

Daniele Pepe ◽

Mario Grassi

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Biological Pathways ◽

Expression Data ◽

Composite Variables


NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods

Frontiers in Genetics ◽

10.3389/fgene.2019.00400 ◽

2019 ◽

Vol 10 ◽

Author(s):

Zhenfeng Wu ◽

Weixiang Liu ◽

Xiufeng Jin ◽

Haishuo Ji ◽

Hua Wang ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

R Package ◽

Expression Data ◽

Normalize Gene Expression


Imputation of Gene Expression Data in Blood Cancer and Its Significance in Inferring Biological Pathways

Frontiers in Oncology ◽

10.3389/fonc.2019.01442 ◽

2020 ◽

Vol 9 ◽

Author(s):

Akanksha Farswan ◽

Anubha Gupta ◽

Ritu Gupta ◽

Gurvinder Kaur

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Biological Pathways ◽

Expression Data ◽

Blood Cancer


sgnesR: An R package for simulating gene expression data from an underlying real gene network structure considering delay parameters

BMC Bioinformatics ◽

10.1186/s12859-017-1731-8 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 6

Author(s):

Shailesh Tripathi ◽

Jason Lloyd-Price ◽

Andre Ribeiro ◽

Olli Yli-Harja ◽

Matthias Dehmer ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Network Structure ◽

Gene Network ◽

R Package ◽

Expression Data ◽

Real Gene


MADE4: an R package for multivariate analysis of gene expression data

Bioinformatics ◽

10.1093/bioinformatics/bti394 ◽

2005 ◽

Vol 21 (11) ◽

pp. 2789-2790 ◽

Cited By ~ 227

Author(s):

A. C. Culhane ◽

J. Thioulouse ◽

G. Perriere ◽

D. G. Higgins

Keyword(s):

Gene Expression ◽

Multivariate Analysis ◽

Gene Expression Data ◽

R Package ◽

Expression Data


Space-log: a novel approach to inferring gene-gene net-works using SPACE model with log penalty

F1000Research ◽

10.12688/f1000research.26128.1 ◽

2020 ◽

Vol 9 ◽

pp. 1159

Author(s):

Qian (Vicky) Wu ◽

Wei Sun ◽

Li Hsu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Regulatory Networks ◽

Penalized Regression ◽

R Package ◽

Expression Data ◽

Computationally Efficient ◽

P Gene ◽

Novel Approach

Gene expression data have been used to infer gene-gene networks (GGN), where an edge between two genes implies the conditional dependence of these two genes given all the other genes. Such gene-gene networks are often referred to as gene regulatory networks since they may reveal expression regulation. Most existing methods for identifying GGN employ penalized regression with an L1 (lasso), L2 (ridge), or elastic net penalty, which spans the range from the L1 to the L2 penalty. However, for high-dimensional gene expression data, a penalty that spans the range between the L0 and L1 penalties, such as the log penalty, is often needed for variable selection consistency. Thus, we develop a novel method that employs the log penalty within the framework of an earlier network identification method, SPACE (Sparse PArtial Correlation Estimation), and implement it in an R package, space-log. We show that space-log is computationally efficient (source code implemented in C) and performs well compared with other methods, particularly for networks with hubs. Space-log is open source and available at GitHub, https://github.com/wuqian77/SpaceLog
