Network Completion for Static Gene Expression Data

This paper provides a brief introduction to learning Bayesian networks from gene-expression data. The method is contrasted with other approaches to the reverse engineering of biochemical networks, and the Bayesian learning paradigm is briefly described. The article demonstrates an application to a simple synthetic toy problem and evaluates the inference performance in terms of ROC (receiver operating characteristic) curves.
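As an illustration of the ROC-based evaluation mentioned above, the following minimal sketch scores a set of candidate network edges against a known toy network. The function names `roc_points` and `auc`, and the example scores and labels, are hypothetical and not taken from the paper; the sketch only assumes that each candidate edge receives a confidence score and that the true network is known.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    scores: confidence assigned to each candidate edge (higher = more likely).
    labels: 1 if the edge exists in the true network, else 0.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Rank candidate edges from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Edges with tied scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical edge scores and ground-truth labels for six candidate edges.
edge_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
edge_labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(edge_scores, edge_labels)
area = auc(curve)
```

An area of 1.0 would mean every true edge is ranked above every false one, while 0.5 corresponds to random ranking.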

Genetic networks inferred from time series of gene expression data

First International Symposium on Control, Communications and Signal Processing, 2004. ◽ 10.1109/isccsp.2004.1296523 ◽ 2004 ◽ Cited By ~ 7

Author(s): I. Tabus ◽ C.D. Giurcaneanu ◽ J. Astola

Keyword(s): Gene Expression ◽ Time Series ◽ Gene Expression Data ◽ Genetic Networks ◽ Expression Data

Adversarial generation of gene expression data

Bioinformatics ◽ 10.1093/bioinformatics/btab035 ◽ 2021

Author(s): Ramon Viñas ◽ Helena Andrés-Terré ◽ Pietro Liò ◽ Kevin Bryson

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Synthetic Data ◽ Gene Clusters ◽ Supplementary Information ◽ Expression Data ◽ Generative Adversarial Network ◽ Adversarial Network ◽ Wide Range ◽ Transcriptomics Data

Motivation: High-throughput gene expression can be used to address a wide range of fundamental biological problems, but datasets of an appropriate size are often unavailable. Moreover, existing transcriptomics simulators have been criticized because they fail to emulate key properties of gene expression data. In this article, we develop a method based on a conditional generative adversarial network to generate realistic transcriptomics data for Escherichia coli and humans. We assess the performance of our approach across several tissues and cancer types.

Results: We show that our model preserves several gene expression properties significantly better than widely used simulators, such as SynTReN or GeneNetWeaver. The synthetic data preserve tissue- and cancer-specific properties of transcriptomics data. Moreover, they exhibit real gene clusters and ontologies at both local and global scales, suggesting that the model learns to approximate the gene expression manifold in a biologically meaningful way.

Availability and implementation: Code is available at https://github.com/rvinas/adversarial-gene-expression.

Supplementary information: Supplementary data are available at Bioinformatics online.

A new GRASP metaheuristic for biclustering of gene expression data

10.7287/peerj.preprints.1679v1 ◽ 2016

Author(s): Daniele Ferone ◽ Angelo Facchiano ◽ Anna Marabotti ◽ Paola Festa

Keyword(s): Gene Expression ◽ Local Search ◽ Gene Expression Data ◽ Spanning Trees ◽ Complete Solution ◽ Optimal Solution ◽ Biological Data ◽ Data Matrix ◽ Expression Data ◽ Local Search Procedure

The term biclustering stands for simultaneous clustering of both genes and conditions. This task has generated considerable interest over the past few decades, particularly related to the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining [1]. Since the problem has been shown to be NP-complete, we have recently designed and implemented a GRASP metaheuristic [2,3,4]. The greedy criterion used in the construction phase uses the Euclidean distance to build spanning trees of the graph representing the input data matrix. Once a complete solution has been obtained, the local search procedure tries both to enlarge the current solution and to improve its H-score by exchanging rows and columns. The proposed approach has been tested on 5 synthetic datasets [5]: 1) constant biclusters; 2) constant, upregulated biclusters; 3) shift-scale biclusters; 4) shift biclusters; and 5) scale biclusters. Compared with state-of-the-art competitors, its behaviour is excellent on shift datasets and very good on all other datasets except for scaled ones. In order to improve its behaviour on scaled data as well and to reduce running times, we have designed and preliminarily tested a variant of the existing GRASP, whose local search phase returns an approximate locally optimal solution. The resulting algorithm promises to be a more efficient, general, and robust method for the biclustering of all kinds of biological data.
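The abstract does not define the H-score. The sketch below assumes it is the mean squared residue of Cheng and Church, a common bicluster quality measure in which lower values indicate a more coherent bicluster. The function name `h_score` and the toy matrix are hypothetical, and this is not the authors' GRASP implementation.

```python
def h_score(matrix, rows, cols):
    """Mean squared residue of the bicluster given by rows x cols.

    Assumption: the H-score is the Cheng and Church mean squared residue.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(r[j] for r in sub) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what remains after removing row and column effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical 4 x 4 expression matrix (genes x conditions).
data = [
    [1.0, 2.0, 3.0, 9.0],
    [4.0, 5.0, 6.0, 0.0],
    [2.0, 3.0, 4.0, 7.0],
    [2.0, 4.0, 6.0, 1.0],
]
# Rows 0-2 differ only by an additive offset on columns 0-2 (shift pattern).
shift_score = h_score(data, rows=[0, 1, 2], cols=[0, 1, 2])
# Row 3 is row 0 multiplied by 2 on columns 0-2 (scale pattern).
scale_score = h_score(data, rows=[0, 3], cols=[0, 1, 2])
```

Under this definition a pure shift pattern has a residue of essentially zero, whereas a pure scale pattern leaves a non-zero residue, so the two kinds of bicluster are not scored alike.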

[Regular Paper] Inference of Genetic Networks Using Random Forests: Use of Different Weights for Time-Series and Static Gene Expression Data

2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE) ◽ 10.1109/bibe.2018.00026 ◽ 2018 ◽ Cited By ~ 1

Author(s): Shuhei Kimura ◽ Masato Tokuhisa ◽ Mariko Okada-Hatakeyama

Keyword(s): Gene Expression ◽ Time Series ◽ Gene Expression Data ◽ Random Forests ◽ Genetic Networks ◽ Expression Data ◽ Regular Paper

The covariance shift (C-SHIFT) algorithm for normalizing biological data

10.1101/2020.04.13.038463 ◽ 2020

Author(s): Evgenia Chunikhina ◽ Paul Logan ◽ Yevgeniy Kovchegov ◽ Anatoly Yambartsev ◽ Debashis Mondal ◽ ...

Keyword(s): Gene Expression ◽ Covariance Matrix ◽ Gene Expression Data ◽ Synthetic Data ◽ Biological Data ◽ Optimization Techniques ◽ Expression Data ◽ Absolute Deviation ◽ Normalization Methods ◽ Gene Network Analysis

Omics technologies are powerful tools for analyzing patterns in gene expression data for thousands of genes. Due to a number of systematic variations in experiments, the raw gene expression data is often obfuscated by undesirable technical noise. Various normalization techniques were designed in an attempt to remove these non-biological errors prior to any statistical analysis. One of the reasons for normalizing data is the need to recover the covariance matrix used in gene network analysis. In this paper, we introduce a novel normalization technique, called the covariance shift (C-SHIFT) method. This normalization algorithm uses optimization techniques together with the blessing-of-dimensionality philosophy and an energy minimization hypothesis for covariance matrix recovery under additive noise (in biology, known as the bias). Thus, it is perfectly suited for the analysis of logarithmic gene expression data. Numerical experiments on synthetic data demonstrate the method’s advantage over the classical normalization techniques. Namely, the comparison is made with rank, quantile, cyclic LOESS (locally estimated scatterplot smoothing), and MAD (median absolute deviation) normalization methods.
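To make one of the classical baselines concrete, the sketch below shows a simple form of MAD normalization: each sample is centred on its median and divided by its median absolute deviation. This is not the C-SHIFT algorithm, whose details the abstract does not give. The function name `mad_normalize` and the toy values are hypothetical, and definitions of MAD normalization vary (some include a consistency constant), so treat this as one assumed variant.

```python
from statistics import median


def mad_normalize(sample):
    """Centre a sample on its median and scale by its MAD.

    Assumed variant: no consistency constant is applied.
    """
    centre = median(sample)
    mad = median(abs(x - centre) for x in sample)
    if mad == 0:
        raise ValueError("MAD is zero; the sample cannot be scaled")
    return [(x - centre) / mad for x in sample]


# Hypothetical log-expression values for one sample.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
# The same sample with an additive bias of 10 applied to every gene.
biased = [x + 10.0 for x in clean]

normalized_clean = mad_normalize(clean)
normalized_biased = mad_normalize(biased)
```

Because the median absorbs a constant offset, both samples normalize to the same values, which shows how a per-sample additive bias on the log scale can be removed by this kind of centring.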

Cancer Genetic Network Inference Using Gaussian Graphical Models

Bioinformatics and Biology Insights ◽ 10.1177/1177932219839402 ◽ 2019 ◽ Vol 13 ◽ pp. 117793221983940 ◽ Cited By ~ 7

Author(s): Haitao Zhao ◽ Zhong-Hui Duan

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Human Cancer ◽ Genetic Network ◽ Genetic Networks ◽ Gene Interactions ◽ Expression Data ◽ Rna Seq ◽ Cancer Genetic ◽ Graphical Lasso

The Cancer Genome Atlas (TCGA) provides a rich resource that can be used to understand how genes interact in cancer cells and has collected RNA-Seq gene expression data for many types of human cancer. However, mining the data to uncover the hidden gene-interaction patterns remains a challenge. The Gaussian graphical model (GGM) is often used to learn genetic networks because it defines an undirected graphical structure, revealing the conditional dependences of genes. In this study, we focus on inferring gene interactions in 15 specific types of human cancer using RNA-Seq expression data and GGM with graphical lasso. We take advantage of the corresponding Kyoto Encyclopedia of Genes and Genomes pathway maps to define the subsets of related genes. RNA-Seq expression levels of the subsets of genes in solid cancerous tumor and normal tissues were extracted from TCGA. The gene expression data sets were cleaned and formatted, and the genetic network corresponding to each cancer type was then inferred using GGM with graphical lasso. The inferred networks reveal stable conditional dependences among the genes at the expression level and confirm the essential roles played by the genes that encode proteins involved in the two key signaling pathways, phosphoinositide 3-kinase (PI3K)/AKT/mTOR and Ras/Raf/MEK/ERK, in human carcinogenesis. These stable dependences elucidate the expression-level interactions among the genes that are implicated in many different human cancers. The inferred genetic networks were examined to further identify and characterize a collection of gene interactions that are unique to cancer. The cross-cancer genetic interactions revealed by our study provide another set of knowledge for cancer biologists to propose strong hypotheses, so that further biological investigations can be conducted effectively.
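The idea behind a GGM can be sketched with partial correlations: two genes are conditionally independent given the others when the corresponding entry of the inverse covariance (precision) matrix is zero. The example below inverts a small covariance matrix directly and omits the lasso penalty that graphical lasso adds to obtain sparse estimates from noisy data, so it is a simplified stand-in rather than the study's method. The function names and the three-gene covariance matrix are hypothetical.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting, for small dense matrices."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of each pair of variables given all the others."""
    prec = invert(cov)
    n = len(prec)
    return [[1.0 if i == j
             else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]


# Hypothetical covariance for a chain A - B - C: A and C are correlated
# only through B.
cov = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]
pcor = partial_correlations(cov)
```

Here A and C have a marginal covariance of 0.25, yet their partial correlation is essentially zero, so a GGM would draw edges A-B and B-C but none between A and C.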

Ensemble dependence model for classification and prediction of cancer and normal gene expression data

Bioinformatics ◽ 10.1093/bioinformatics/bti483 ◽ 2005 ◽ Vol 21 (14) ◽ pp. 3114-3121 ◽ Cited By ~ 24

Author(s): P. Qiu ◽ Z. J. Wang ◽ K. J. R. Liu

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Expression Data ◽ Normal Gene

Cancer Classification from Gene Expression data using Fuzzy-Rough techniques: An Empirical Study

International Journal of Computer Sciences and Engineering ◽ 10.26438/ijcse/v6i6.415420 ◽ 2018 ◽ Vol 6 (6) ◽ pp. 415-420

Author(s): Ansuman Kumar ◽ Anindya Halder

Keyword(s): Gene Expression ◽ Empirical Study ◽ Gene Expression Data ◽ Cancer Classification ◽ Expression Data
