A new GRASP metaheuristic for biclustering of gene expression data

10.7287/peerj.preprints.1679v1 ◽

2016 ◽

Author(s):

Daniele Ferone ◽

Angelo Facchiano ◽

Anna Marabotti ◽

Paola Festa

Keyword(s):

Gene Expression ◽

Local Search ◽

Gene Expression Data ◽

Spanning Trees ◽

Complete Solution ◽

Optimal Solution ◽

Biological Data ◽

Data Matrix ◽

Expression Data ◽

Local Search Procedure

The term biclustering refers to the simultaneous clustering of both genes and conditions. This task has generated considerable interest over the past few decades, particularly in connection with the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining [1]. Since the problem has been shown to be NP-complete, we recently designed and implemented a GRASP metaheuristic [2,3,4]. The greedy criterion used in the construction phase relies on the Euclidean distance to build spanning trees of the graph representing the input data matrix. Once a complete solution has been obtained, the local search procedure tries both to enlarge the current solution and to improve its H-score by exchanging rows and columns. The proposed approach has been tested on 5 synthetic datasets [5]: 1) constant biclusters; 2) constant, upregulated biclusters; 3) shift-scale biclusters; 4) shift biclusters; and 5) scale biclusters. Compared with state-of-the-art competitors, its behaviour is excellent on shift datasets and very good on all other datasets except the scaled ones. To improve its behaviour on scaled data as well and to reduce running times, we have designed and preliminarily tested a variant of the existing GRASP whose local search phase returns an approximate locally optimal solution. The resulting algorithm promises to be a more efficient, general, and robust method for biclustering all kinds of biological data.
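To make the local search step concrete, the following Python sketch computes the H-score and runs a simple enlarge-then-exchange local search. It assumes the H-score is the mean squared residue of Cheng and Church, H(I,J) = (1/(|I||J|)) Σ (a_ij − a_iJ − a_Ij + a_IJ)², that a bicluster stays acceptable while H ≤ delta, and that each move adds or swaps a single row or column. The function names, the threshold `delta`, and the move order are hypothetical; the Euclidean-distance spanning-tree construction phase is not shown, and this is not the authors' implementation.

```python
def h_score(data, rows, cols):
    """Mean squared residue (H-score) of the submatrix data[rows][cols]."""
    n_r, n_c = len(rows), len(cols)
    row_mean = {i: sum(data[i][j] for j in cols) / n_c for i in rows}
    col_mean = {j: sum(data[i][j] for i in rows) / n_r for j in cols}
    overall = sum(row_mean.values()) / n_r
    return sum(
        (data[i][j] - row_mean[i] - col_mean[j] + overall) ** 2
        for i in rows
        for j in cols
    ) / (n_r * n_c)


def local_search(data, rows, cols, delta):
    """Hypothetical enlarge-then-exchange local search; returns (rows, cols, H)."""
    rows, cols = set(rows), set(cols)
    all_rows, all_cols = set(range(len(data))), set(range(len(data[0])))
    while True:
        current = h_score(data, rows, cols)
        # Enlargement: add the single row or column giving the lowest H,
        # provided the enlarged bicluster still satisfies H <= delta.
        grow = [(rows | {i}, cols) for i in all_rows - rows]
        grow += [(rows, cols | {j}) for j in all_cols - cols]
        if grow:
            best = min(grow, key=lambda rc: h_score(data, *rc))
            if h_score(data, *best) <= delta:
                rows, cols = best
                continue
        # Exchange: swap one row (or column) inside for one outside if H drops.
        swaps = [((rows - {a}) | {b}, cols) for a in rows for b in all_rows - rows]
        swaps += [(rows, (cols - {a}) | {b}) for a in cols for b in all_cols - cols]
        better = [s for s in swaps if h_score(data, *s) < current - 1e-12]
        if not better:
            return rows, cols, current
        rows, cols = min(better, key=lambda rc: h_score(data, *rc))


if __name__ == "__main__":
    # Rows 0-2 x columns 0-2 form an additive (shift) bicluster.
    data = [[1, 2, 3, 9], [2, 3, 4, 0], [5, 6, 7, 3], [8, 1, 6, 2]]
    rows, cols, h = local_search(data, {0, 1}, {0, 1}, delta=1e-9)
    print(sorted(rows), sorted(cols))  # [0, 1, 2] [0, 1, 2]
```

Enlargements strictly grow the bicluster and exchanges strictly lower H at a fixed size, so the loop terminates at a local optimum; the variant described above would instead stop early at an approximate one.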


Evolutionary Local Search Algorithm for the biclustering of gene expression data based on biological knowledge

Applied Soft Computing ◽

10.1016/j.asoc.2021.107177 ◽

2021 ◽

Vol 104 ◽

pp. 107177

Author(s):

Ons Maâtouk ◽

Wassim Ayadi ◽

Hend Bouziri ◽

Béatrice Duval

Keyword(s):

Gene Expression ◽

Local Search ◽

Gene Expression Data ◽

Search Algorithm ◽

Biological Knowledge ◽

Expression Data ◽

Local Search Algorithm


Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data

Mathematics ◽

10.3390/math9070772 ◽

2021 ◽

Vol 9 (7) ◽

pp. 772

Author(s):

Seonghun Kim ◽

Seockhun Bae ◽

Yinhua Piao ◽

Kyuri Jo

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Large Scale ◽

Drug Response ◽

Response Prediction ◽

Biological Data ◽

Expression Data ◽

Convolutional Network ◽

Essential Information ◽

Protein Protein Interaction

Genomic profiles of cancer patients, such as gene expression, have become a major source for predicting drug responses in the era of personalized medicine. As large-scale drug screening data on cancer cell lines have become available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for predicting Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network with gene expression data, using feature selection to retain drug-related genes; the GCN model then detects local features, such as subnetworks of genes that contribute to the drug response, through localized filtering. We demonstrated the effectiveness of DrugGCN on biological data, where it achieved high prediction accuracy compared with competing methods.
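The abstract does not give DrugGCN's exact filter. As an illustration of localized graph filtering on a gene graph, the sketch below implements one first-order graph-convolution layer in the style of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), in plain Python. The adjacency matrix stands in for a PPI-derived gene graph, and the function names, toy network, and weights are hypothetical; DrugGCN's actual layer may differ.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(A_norm @ H @ W).

    adj: gene-gene network (e.g. PPI edges), features: genes x input features,
    weights: input features x output features (learned in practice; fixed here).
    """
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


if __name__ == "__main__":
    # Hypothetical 3-gene path network g0 - g1 - g2 with one expression value per gene.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    expression = [[1.0], [2.0], [4.0]]
    print(gcn_layer(adj, expression, [[1.0]]))
```

Each layer mixes a gene's value with those of its direct neighbours, so stacking k layers lets each gene aggregate information from its k-hop subnetwork, which is the sense in which such filtering is local.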


Network Completion for Static Gene Expression Data

Advances in Bioinformatics ◽

10.1155/2014/382452 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 1

Author(s):

Natsu Nakajima ◽

Tatsuya Akutsu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Synthetic Data ◽

Optimal Solution ◽

Genetic Networks ◽

Expression Data ◽

Initial Network ◽

Static Data ◽

Normal Gene ◽

Stationary Conditions

We tackle the problem of completing and inferring genetic networks under stationary conditions from static data. Network completion consists of making the minimum number of modifications to an initial network so that the completed network is most consistent with the expression data, where the basic modification operations are the addition and deletion of edges. For this problem, we present a new method for network completion based on dynamic programming and least-squares fitting. This method can find an optimal solution in polynomial time if the maximum indegree of the network is bounded by a constant. We evaluate the effectiveness of our method through computational experiments using synthetic data. Furthermore, we demonstrate that the proposed method can distinguish between the genetic networks under stationary conditions derived from lung cancer and from normal gene expression data.
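The abstract does not give the model equations, so the following Python sketch only illustrates how dynamic programming and least-squares fitting can combine. It assumes a linear steady-state model in which each gene's expression is fitted, without intercept, as a linear combination of its regulators' expression across samples; a modification is adding or deleting one edge; the total number of modifications is capped by a hypothetical `budget`; and each gene has at most `max_indegree` regulators. Per gene, all parent sets of bounded size are enumerated (O(n^k) of them, hence polynomial for fixed k), and a DP over genes distributes the modification budget. All names are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lsq_error(data, target, parents):
    """Residual sum of squares when `target` is fit as a linear combination of `parents`."""
    y = [sample[target] for sample in data]
    if not parents:
        return sum(v * v for v in y)
    xs = [[sample[p] for p in parents] for sample in data]
    k = len(parents)
    gram = [[sum(x[a] * x[b] for x in xs) for b in range(k)] for a in range(k)]
    rhs = [sum(x[a] * t for x, t in zip(xs, y)) for a in range(k)]
    try:
        coef = solve(gram, rhs)
    except ValueError:
        return float("inf")  # collinear regulators: treat this parent set as unusable
    return sum((t - sum(c * v for c, v in zip(coef, x))) ** 2 for x, t in zip(xs, y))


def node_table(data, target, initial, max_indegree):
    """best[m] = (lowest error, parent set) for `target` using exactly m edge edits."""
    others = [g for g in range(len(data[0])) if g != target]
    best = {}
    for k in range(max_indegree + 1):
        for parents in combinations(others, k):
            edits = len(set(parents) ^ initial)  # edges added + edges deleted
            err = lsq_error(data, target, parents)
            if err < best.get(edits, (float("inf"), None))[0]:
                best[edits] = (err, parents)
    return best


def complete_network(data, initial_parents, max_indegree, budget):
    """DP over genes: minimum total error using at most `budget` edge edits."""
    dp = {0: (0.0, [])}  # edits used so far -> (summed error, chosen parent sets)
    for target, initial in enumerate(initial_parents):
        table = node_table(data, target, set(initial), max_indegree)
        new = {}
        for used, (err, chosen) in dp.items():
            for edits, (e, parents) in table.items():
                total = used + edits
                if total <= budget and err + e < new.get(total, (float("inf"),))[0]:
                    new[total] = (err + e, chosen + [parents])
        dp = new
    return min(dp.values(), key=lambda v: v[0], default=(float("inf"), None))
```

Here `data` holds one list of gene values per sample and `initial_parents[g]` lists the regulators of gene g in the initial network; recovering the completed network only requires the parent sets that the DP carries along.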


HisCoM-PAGE: Hierarchical Structural Component Models for Pathway Analysis of Gene Expression Data

Genes ◽

10.3390/genes10110931 ◽

2019 ◽

Vol 10 (11) ◽

pp. 931 ◽

Cited By ~ 4

Author(s):

Mok ◽

Kim ◽

Lee ◽

Choi ◽

Lee ◽

...

Keyword(s):

Gene Expression ◽

Pancreatic Cancer ◽

Gene Expression Data ◽

Pathway Analysis ◽

Structural Component ◽

Biological Data ◽

Gene Set Enrichment Analysis ◽

Expression Data ◽

Global Test ◽

Causal Pathways

Although several analyses have identified cancer-associated pathways based on gene expression data, most of them are single-pathway analyses and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to real pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing the performance of HisCoM-PAGE with competing methods such as Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type Test showed that HisCoM-PAGE had the highest power to detect causal pathways in most simulation scenarios.


Clustering Genes Using Heterogeneous Data Sources

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2010040102 ◽

2010 ◽

Vol 1 (2) ◽

pp. 12-28 ◽

Cited By ~ 3

Author(s):

Erliang Zeng ◽

Chengyong Yang ◽

Tao Li ◽

Giri Narasimhan

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Incomplete Data ◽

Clustering Algorithm ◽

Biological Data ◽

Exploratory Analysis ◽

Data Sources ◽

Modular Organization ◽

Constrained Clustering ◽

Expression Data

Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data; together, these data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques that deal with multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm they developed performs clustering with multiple, but complete, sources of data. To deal with incomplete data sources, the authors adopted the MPCK-means clustering algorithm to perform exploratory analysis on one complete source and other, potentially incomplete, sources provided in the form of constraints. This paper presents a new clustering algorithm, MSC, that performs exploratory analysis using two or more diverse but complete data sources; studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data; and incorporates such incomplete data into a constrained clustering algorithm in the form of constraint sets.
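MPCK-means combines pairwise constraints with metric learning. The Python sketch below keeps only the constraint part: a PCK-means-style assignment in which each point pays its squared distance to a centroid plus a penalty `w` for every violated must-link or cannot-link pair. Dropping the metric learning, initializing with the first k points, the penalty weight, and all names are simplifying assumptions; in the setting above, the constraint pairs would encode gene relationships taken from the incomplete sources.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pck_means(points, k, must_link, cannot_link, w=1.0, iters=20):
    """Penalized k-means with pairwise constraints (PCK-means style, no metric learning)."""
    centroids = [list(p) for p in points[:k]]  # naive initialization: first k points
    labels = [None] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            def cost(c):
                total = sq_dist(p, centroids[c])
                for a, b in must_link:
                    if i in (a, b):
                        other = labels[b if a == i else a]
                        if other is not None and other != c:
                            total += w  # must-link partner sits in another cluster
                for a, b in cannot_link:
                    if i in (a, b) and labels[b if a == i else a] == c:
                        total += w  # cannot-link partner sits in this cluster
                return total
            labels[i] = min(range(k), key=cost)
        for c in range(k):
            members = [q for q, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With a large `w`, constraints from an auxiliary source can override what the expression profiles alone would suggest, while a small `w` lets unreliable constraints be violated.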


A Repeated Local Search Algorithm for BiClustering of Gene Expression Data

Similarity-Based Pattern Recognition - Lecture Notes in Computer Science ◽

10.1007/978-3-642-39140-8_19 ◽

2013 ◽

pp. 281-296 ◽

Cited By ~ 2

Author(s):

Duy Tin Truong ◽

Roberto Battiti ◽

Mauro Brunato

Keyword(s):

Gene Expression ◽

Local Search ◽

Gene Expression Data ◽

Search Algorithm ◽

Expression Data ◽

Local Search Algorithm


Gene Expression Data Matrix

10.1007/springerreference_35281 ◽

2011 ◽

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Data Matrix ◽

Expression Data


A Novel Biomarker Identification Approach for Gastric Cancer Using Gene Expression and DNA Methylation Dataset

Frontiers in Genetics ◽

10.3389/fgene.2021.644378 ◽

2021 ◽

Vol 12 ◽

Author(s):

Ge Zhang ◽

Zijing Xue ◽

Chaokun Yan ◽

Jianlin Wang ◽

Huimin Luo

Keyword(s):

Gene Expression ◽

Gastric Cancer ◽

Dna Methylation ◽

Feature Selection ◽

Gene Expression Data ◽

Complex Disease ◽

Biological Data ◽

Computational Method ◽

Superior Performance ◽

Expression Data

As a complex disease, gastric cancer has a high mortality rate, and there are few effective treatments for patients at an advanced stage. With the development of biological technology, large amounts of multi-omics data on gastric cancer are being generated, which enables computational methods to discover potential biomarkers of gastric cancer. Such biomarkers are very important for detecting gastric cancer at earlier stages and thus for providing timely treatment. However, most biological data are high-dimensional with a small sample size, which makes them hard to process directly without feature selection. Besides, using only a single type of omics data, such as gene expression data, provides limited evidence for investigating biomarkers associated with gastric cancer. In this research, gene expression data and DNA methylation data are integrated to analyze gastric cancer, and a feature selection approach is proposed to identify possible biomarkers of gastric cancer. After the original data are pre-processed, mutual information (MI) is applied to select the top-ranked genes. Then, fold change (FC) and the t-test are adopted to identify differentially expressed genes (DEGs). In particular, the false discovery rate (FDR) is introduced to adjust the p-values and further screen genes. For the chosen genes, a deep neural network (DNN) model is used as the classifier to measure the quality of classification. The experimental results show that the approach achieves superior performance in terms of accuracy and other metrics. Biological analysis of the chosen genes further validates the effectiveness of the approach.
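The fold-change and FDR screening steps can be sketched in plain Python. The abstract does not say which FDR procedure is used; this sketch assumes the Benjamini-Hochberg adjustment, computes log2 fold changes with a pseudocount, and takes the t-test p-values as given (they could come, for example, from `scipy.stats.ttest_ind`). The thresholds `q_cut` and `lfc_cut`, the pseudocount, and the function names are hypothetical.

```python
import math


def log2_fold_change(case, control, pseudo=1.0):
    """log2 of (mean(case) + pseudo) / (mean(control) + pseudo); pseudocount is assumed."""
    return math.log2((sum(case) / len(case) + pseudo) / (sum(control) / len(control) + pseudo))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvalues, fold_changes, q_cut=0.05, lfc_cut=1.0):
    """Keep genes passing both the FDR and the |log2 FC| thresholds (hypothetical cut-offs)."""
    qvalues = benjamini_hochberg(pvalues)
    return [g for g, q, fc in zip(genes, qvalues, fold_changes)
            if q <= q_cut and abs(fc) >= lfc_cut]
```

Requiring both a small adjusted p-value and a large fold change discards genes whose difference is statistically detectable but biologically small, and vice versa.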


ReCodLiver0.9: Overcoming challenges in genome-scale metabolic reconstruction of a non-model species

10.1101/2020.06.23.162792 ◽

2020 ◽

Cited By ~ 1

Author(s):

Eileen Marie Hanna ◽

Xiaokang Zhang ◽

Marta Eide ◽

Shirin Fallahi ◽

Tomasz Furmanek ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gadus Morhua ◽

Metabolic Response ◽

Atlantic Cod ◽

Biological Data ◽

Environmental Toxicants ◽

Expression Data ◽

Model Species ◽

Genome Scale

The availability of genome sequences, annotations, and knowledge of the biochemistry underlying metabolic transformations has led to the generation of metabolic network reconstructions for a wide range of organisms across bacteria, archaea, and eukaryotes. When modeled using mathematical representations, a reconstruction can simulate underlying genotype-phenotype relationships. Accordingly, genome-scale models (GEMs) can be used to predict the response of organisms to genetic and environmental variations. A bottom-up reconstruction procedure typically starts by generating a draft model from existing annotation data on a target organism. For model species, this part of the process can be straightforward owing to the abundant organism-specific biochemical data. However, the process becomes complicated for non-model, less-annotated species. In this paper, we present a draft liver reconstruction, ReCodLiver0.9, of Atlantic cod (Gadus morhua), a non-model teleost fish, as a practicable guide for cases with comparably few resources. Although the reconstruction is considered a draft version, we show that it already has utility in elucidating metabolic response mechanisms to environmental toxicants, by mapping gene expression data from exposure experiments onto the resulting model.

Author summary

Genome-scale metabolic models (GEMs) are constructed from reconstructed networks of the metabolic reactions carried out by an organism. The underlying biochemical knowledge in such networks can be transformed into mathematical models that can serve as a platform for answering biological questions. The availability of high-throughput biological data, including genomics, proteomics, and metabolomics data, supports the generation of such models for a large number of organisms. Nevertheless, challenges arise for non-model species, which are typically less annotated.
In this paper, we discuss these challenges and possible solutions in the context of generating a draft liver reconstruction of Atlantic cod (Gadus morhua). We also show how experimental data, here gene expression data, can be mapped onto the resulting model to understand the metabolic response of cod liver to environmental toxicants.
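The abstract does not describe how expression data are mapped onto the model. One common convention, assumed here, scores each reaction from its gene-protein-reaction (GPR) rule by taking the minimum over genes joined by AND (subunits of a complex) and the maximum over alternatives joined by OR (isoenzymes). The nested-tuple rule format, the gene identifiers, and the function name are hypothetical.

```python
def reaction_score(rule, expression):
    """Score a reaction from its GPR rule: AND -> min, OR -> max (assumed convention).

    rule: a gene id string, or a tuple ("and" | "or", sub_rule, sub_rule, ...).
    expression: dict mapping gene id -> expression level; unmeasured genes are skipped.
    """
    if isinstance(rule, str):
        return expression.get(rule)  # None if the gene was not measured
    op, *args = rule
    if op not in ("and", "or"):
        raise ValueError(f"unknown GPR operator {op!r}")
    scores = [s for s in (reaction_score(a, expression) for a in args) if s is not None]
    if not scores:
        return None
    # A complex is limited by its least-expressed subunit; isoenzymes by the best one.
    return min(scores) if op == "and" else max(scores)


if __name__ == "__main__":
    # Hypothetical expression levels and GPR rule: (g1 AND g2) OR g3.
    expr = {"g1": 5.0, "g2": 2.0, "g3": 7.0}
    print(reaction_score(("or", ("and", "g1", "g2"), "g3"), expr))
```

Reaction scores obtained this way can then be compared between exposed and control samples or used to weight reaction flux bounds.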
