Gene Expression Data Matrix

A new GRASP metaheuristic for biclustering of gene expression data

10.7287/peerj.preprints.1679v1, 2016

Author(s): Daniele Ferone, Angelo Facchiano, Anna Marabotti, Paola Festa

Keyword(s): Gene Expression, Local Search, Gene Expression Data, Spanning Trees, Complete Solution, Optimal Solution, Biological Data, Data Matrix, Expression Data, Local Search Procedure

Biclustering is the simultaneous clustering of both genes and conditions. This task has generated considerable interest over the past few decades, particularly for the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining [1]. Since the problem has been shown to be NP-complete, we have recently designed and implemented a GRASP metaheuristic [2,3,4]. The greedy criterion in the construction phase uses the Euclidean distance to build spanning trees of the graph representing the input data matrix. Once a complete solution has been obtained, the local search procedure tries both to enlarge the current solution and to improve its H-score by exchanging rows and columns. The proposed approach has been tested on 5 synthetic datasets [5]: 1) constant biclusters; 2) constant, upregulated biclusters; 3) shift-scale biclusters; 4) shift biclusters; and 5) scale biclusters. Compared with state-of-the-art competitors, its behaviour is excellent on shift datasets and very good on all other datasets except the scale ones. To improve its behaviour on scale data as well, and to reduce running times, we have designed and preliminarily tested a variant of the existing GRASP whose local search phase returns an approximate locally optimal solution. The resulting algorithm promises to be a more efficient, general, and robust method for biclustering all kinds of biological data.
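The abstract does not define the H-score. A minimal sketch, assuming it is the mean squared residue of Cheng and Church that is commonly used under that name in the biclustering literature, is given here. The function name `h_score` and the toy matrices are hypothetical and are not taken from the paper. Under this assumption, a pure shift bicluster scores zero while a pure scale bicluster generally does not, which may help explain why scale datasets are the harder case for a method that minimizes this score.

```python
import numpy as np


def h_score(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    Assumed definition (Cheng and Church): for each cell, subtract its
    row mean and column mean, add back the overall mean, square the
    result, and average over all cells of the bicluster.
    """
    sub = np.asarray(matrix, dtype=float)[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    overall = sub.mean()
    residue = sub - row_means - col_means + overall
    return float(np.mean(residue ** 2))


# Hypothetical toy data: one base expression profile over four conditions.
base = np.array([1.0, 2.0, 4.0, 8.0])

# Shift bicluster: every gene is the base profile plus a constant.
shift = np.array([base + s for s in (0.0, 3.0, 5.0)])

# Scale bicluster: every gene is the base profile times a constant.
scale = np.array([base * s for s in (1.0, 2.0, 3.0)])

all_rows, all_cols = [0, 1, 2], [0, 1, 2, 3]
shift_score = h_score(shift, all_rows, all_cols)  # zero up to rounding
scale_score = h_score(scale, all_rows, all_cols)  # strictly positive
```

A local search of the kind described above could call such a function before and after each candidate row or column exchange and keep the move only when the score does not get worse; the actual acceptance rule used by the authors is not stated in the abstract.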

ARBic: An All-Round Biclustering Algorithm for Analyzing Gene Expression Data

10.21203/rs.3.rs-936551/v1, 2021

Author(s): Xiangyu Liu, Zhengchang Su, Guojun Li

Keyword(s): Gene Expression, Gene Expression Data, Large Scale, Expression Patterns, Data Matrix, Expression Data, Specific Expression, Longest Path, Background Data, Effectiveness And Efficiency

Abstract. Background: Identifying significant biclusters of genes with specific expression patterns is an effective approach to reveal functionally correlated genes in gene expression data. However, existing algorithms are limited to finding either broad or narrow biclusters, but not both, because they fail to balance effectiveness and efficiency. Methods: We developed a new algorithm, ARBic, which can accurately identify meaningful biclusters of any shape, whether broad or narrow, in a large-scale gene expression data matrix, even when the values in the biclusters to be identified have the same distribution as the background data. ARBic integrates column-based and row-based strategies into the biclustering procedure. The column-based strategy, borrowed from ReBic, a recently published biclustering tool, favors narrow biclusters. The row-based strategy, newly designed in this article, repeatedly finds a longest path in a specific directed graph and favors broader ones. Results and Conclusion: When tested against seven other salient biclustering algorithms on simulated datasets, ARBic achieved recovery, relevance, and F1 scores 29% higher than the second-best algorithm. Furthermore, ARBic substantially outperforms all of them on real datasets and is robust to noise, bicluster shape, and dataset type. Code: https://github.com/holyzews/ARBic Data: https://doi.org/10.5281/zenodo.5121018
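The abstract does not say how the directed graph is built or how the longest path is found. As an illustration only, the sketch below assumes the graph is acyclic, in which case a longest path can be found by dynamic programming over a topological order; in a general directed graph the problem is NP-hard. The function name `longest_path` and the example edges are hypothetical and are not part of ARBic.

```python
from collections import defaultdict, deque


def longest_path(edges):
    """Return one longest path, as a list of nodes, in a directed acyclic graph.

    `edges` is an iterable of (u, v) pairs. Path length is counted in edges.
    Assumption: the graph has no cycles; a ValueError is raised otherwise.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {}  # dict used as an insertion-ordered set
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
        nodes[u] = None
        nodes[v] = None
    if not nodes:
        return []

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; this sketch only handles DAGs")

    # Relax edges in topological order, remembering each best predecessor.
    length = {n: 0 for n in nodes}
    previous = {n: None for n in nodes}
    for u in order:
        for v in successors[u]:
            if length[u] + 1 > length[v]:
                length[v] = length[u] + 1
                previous[v] = u

    # Walk back from the node where the longest path ends.
    node = max(order, key=lambda n: length[n])
    path = []
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical example graph; node labels are placeholders.
example_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"),
                 ("g1", "g3"), ("g2", "g5")]
example_path = longest_path(example_edges)
```

A repeated search, as the abstract describes, could remove the nodes of each path found and call the function again on the remaining edges; whether ARBic proceeds this way is not stated in the abstract.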
