A new GRASP metaheuristic for biclustering of gene expression data

Author(s):  
Daniele Ferone ◽  
Angelo Facchiano ◽  
Anna Marabotti ◽  
Paola Festa

The term biclustering stands for the simultaneous clustering of both genes and conditions. This task has generated considerable interest over the past few decades, particularly in relation to the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining [1]. Since the problem has been shown to be NP-complete, we have recently designed and implemented a GRASP metaheuristic [2,3,4]. The greedy criterion used in the construction phase relies on the Euclidean distance to build spanning trees of the graph representing the input data matrix. Once a complete solution has been obtained, the local search procedure tries both to enlarge the current solution and to improve its H-score by exchanging rows and columns. The proposed approach has been tested on 5 synthetic datasets [5]: 1) constant biclusters; 2) constant, upregulated biclusters; 3) shift-scale biclusters; 4) shift biclusters; and 5) scale biclusters. Compared with state-of-the-art competitors, its behaviour is excellent on shift datasets and very good on all other datasets except for scaled ones. To improve its behaviour on scaled data as well, and to reduce running times, we have designed and preliminarily tested a variant of the existing GRASP whose local search phase returns an approximate locally optimal solution. The resulting algorithm promises to be a more efficient, general, and robust method for the biclustering of all kinds of biological data.
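The abstract does not define the H-score. In the biclustering literature the term usually denotes the mean squared residue of Cheng and Church, and the sketch below assumes that definition. It is an illustration only, not the authors' implementation; the function name `h_score` and the toy matrices are hypothetical. Under this assumption, a pure shift bicluster scores (numerically) zero while a pure scale bicluster does not, which is one plausible reading of why scaled data are harder for an H-score-driven local search.

```python
from statistics import mean


def h_score(matrix, rows, cols):
    """Mean squared residue of the submatrix matrix[rows][cols].

    Assumed definition (Cheng and Church): the residue of entry (i, j) is
    a_ij - rowmean_i - colmean_j + overall_mean, and the H-score is the
    average of the squared residues over the submatrix.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [mean(r) for r in sub]
    col_means = [mean(c) for c in zip(*sub)]
    # All rows have the same length, so the mean of row means is the overall mean.
    overall = mean(row_means)
    total = 0.0
    for i, row in enumerate(sub):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (len(rows) * len(cols))


# Hypothetical toy data: one base expression profile over three conditions.
base = [1, 2, 4]
shift_bicluster = [[b + s for b in base] for s in (0, 3, 5)]  # rows differ by an additive shift
scale_bicluster = [[b * k for b in base] for k in (1, 2, 3)]  # rows differ by a multiplicative factor

all_rows, all_cols = [0, 1, 2], [0, 1, 2]
print("shift:", h_score(shift_bicluster, all_rows, all_cols))
print("scale:", h_score(scale_bicluster, all_rows, all_cols))
```

A local search of the kind described could call such a function before and after a candidate row or column exchange and keep the move only if the score does not get worse; the acceptance rule actually used by the authors is not stated in the abstract.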

2016 ◽  


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 772
Author(s):  
Seonghun Kim ◽  
Seockhun Bae ◽  
Yinhua Piao ◽  
Kyuri Jo

Genomic profiles of cancer patients, such as gene expression, have become a major source for predicting responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes; the GCN model then detects local features, such as subnetworks of genes that contribute to the drug response, by localized filtering. We demonstrated the effectiveness of DrugGCN on biological data, where it showed high prediction accuracy relative to the competing methods.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Natsu Nakajima ◽  
Tatsuya Akutsu

We tackle the problem of completing and inferring genetic networks under stationary conditions from static data. Here, network completion means making the minimum number of modifications to an initial network so that the completed network is most consistent with the expression data, where the basic modification operations are the addition and deletion of edges. For this problem, we present a new method for network completion using dynamic programming and least-squares fitting. This method can find an optimal solution in polynomial time if the maximum indegree of the network is bounded by a constant. We evaluate the effectiveness of our method through computational experiments using synthetic data. Furthermore, we demonstrate that our proposed method can distinguish the differences between two types of genetic networks under stationary conditions from lung cancer and normal gene expression data.


Genes ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 931 ◽  
Author(s):  
Mok ◽  
Kim ◽  
Lee ◽  
Choi ◽  
Lee ◽  
...  

Although there have been several analyses for identifying cancer-associated pathways based on gene expression data, most of these are single-pathway analyses and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways, as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to real pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing the performance of HisCoM-PAGE with other competing methods such as Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type Test showed HisCoM-PAGE to have the highest power to detect causal pathways in most simulation scenarios.


Author(s):  
Erliang Zeng ◽  
Chengyong Yang ◽  
Tao Li ◽  
Giri Narasimhan

Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data. These data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques to deal with multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm they developed performs clustering with multiple, but complete, sources of data. To deal with incomplete data sources, the authors adopted the MPCK-means clustering algorithm to perform exploratory analysis on one complete source and other potentially incomplete sources provided in the form of constraints. This paper presents a new clustering algorithm, MSC, to perform exploratory analysis using two or more diverse but complete data sources; studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data; and incorporates such incomplete data into the constrained clustering algorithm in the form of constraint sets.


2021 ◽  
Vol 12 ◽  
Author(s):  
Ge Zhang ◽  
Zijing Xue ◽  
Chaokun Yan ◽  
Jianlin Wang ◽  
Huimin Luo

As one type of complex disease, gastric cancer has a high mortality rate, and there are few effective treatments for patients in the advanced stage. With the development of biological technology, a large amount of multi-omics data on gastric cancer has been generated, which enables computational methods to discover potential biomarkers of gastric cancer. This will be very important for detecting gastric cancer at earlier stages and thus for providing timely treatment. However, most biological data are characterized by high dimension and low sample size, and are hard to process directly without feature selection. Besides, using only a single type of omic data, such as gene expression data, provides limited evidence for investigating gastric cancer associated biomarkers. In this research, gene expression data and DNA methylation data are integrated to analyze gastric cancer, and a feature selection approach is proposed to identify possible biomarkers of gastric cancer. After the original data are pre-processed, mutual information (MI) is applied to select the top genes. Then, fold change (FC) and the t-test are adopted to identify differentially expressed genes (DEGs). In particular, the false discovery rate (FDR) is introduced to adjust the p-values and further screen genes. For the chosen genes, a deep neural network (DNN) model is used as the classifier to measure the quality of classification. The experimental results show that the approach can achieve superior performance in terms of accuracy and other metrics. Biological analysis of the chosen genes further validates the effectiveness of the approach.
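The abstract does not say which FDR procedure is used to adjust the p-values. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name, the gene labels, the p-values, and the 0.05 threshold are all hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    The i-th smallest p-value (rank i of m) is scaled by m / i, and a running
    minimum taken from the largest rank downwards keeps the adjusted values
    monotone in the rank.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical t-test p-values for four genes (made-up labels and numbers).
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)

alpha = 0.05  # assumed FDR threshold
selected = [g for g, q in zip(genes, adj_p) if q < alpha]
print(list(zip(genes, adj_p)))
print("selected:", selected)
```

In a pipeline like the one described, such an adjustment would sit between the t-test and the classifier, so that only genes passing the FDR threshold (in combination with the fold-change filter) are passed on to the DNN.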


Author(s):  
Eileen Marie Hanna ◽  
Xiaokang Zhang ◽  
Marta Eide ◽  
Shirin Fallahi ◽  
Tomasz Furmanek ◽  
...  

Abstract: The availability of genome sequences, annotations and knowledge of the biochemistry underlying metabolic transformations has led to the generation of metabolic network reconstructions for a wide range of organisms in bacteria, archaea, and eukaryotes. When modeled using mathematical representations, a reconstruction can simulate underlying genotype-phenotype relationships. Accordingly, genome-scale models (GEMs) can be used to predict the response of organisms to genetic and environmental variations. A bottom-up reconstruction procedure typically starts by generating a draft model from existing annotation data on a target organism. For model species, this part of the process can be straightforward, due to the abundant organism-specific biochemical data. However, the process becomes complicated for non-model less-annotated species. In this paper, we present a draft liver reconstruction, ReCodLiver0.9, of Atlantic cod (Gadus morhua), a non-model teleost fish, as a practicable guide for cases with comparably few resources. Although the reconstruction is considered a draft version, we show that it already has utility in elucidating metabolic response mechanisms to environmental toxicants by mapping gene expression data of exposure experiments to the resulting model.

Author summary: Genome-scale metabolic models (GEMs) are constructed based upon reconstructed networks that are carried out by an organism. The underlying biochemical knowledge in such networks can be transformed into mathematical models that could serve as a platform to answer biological questions. The availability of high-throughput biological data, including genomics, proteomics, and metabolomics data, supports the generation of such models for a large number of organisms. Nevertheless, challenges arise for non-model species which are typically less annotated.
In this paper, we discuss these challenges and possible solutions in the context of generation of a draft liver reconstruction of Atlantic cod (Gadus morhua). We also show how experimental data, here gene expression data, can be mapped to the resulting model to understand the metabolic response of cod liver to environmental toxicants.

