Gene Ontology semantic similarity tools: survey on features and challenges for biological knowledge discovery

Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
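The vector representation described in this abstract can be illustrated with a minimal sketch. It assumes that information content (IC) values for each term are already available, and it uses a weighted (min/max) generalization of the Jaccard index for real-valued vectors; the abstract does not state which Jaccard formulation the authors use, so that choice is an assumption. The term identifiers, IC values, and function names below are hypothetical.

```python
import math

def ic_vector(annotated_terms, ic, vocabulary):
    """One entry per vocabulary term: its IC if the gene product is annotated with it, else 0."""
    return [ic[t] if t in annotated_terms else 0.0 for t in vocabulary]

def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def pearson(u, v):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])

def weighted_jaccard(u, v):
    """Assumed generalization for non-negative vectors: sum of minima over sum of maxima."""
    den = sum(max(a, b) for a, b in zip(u, v))
    return sum(min(a, b) for a, b in zip(u, v)) / den if den else 0.0

# Hypothetical toy data: four terms with made-up IC values.
vocabulary = ["T1", "T2", "T3", "T4"]
ic = {"T1": 1.0, "T2": 2.0, "T3": 3.0, "T4": 4.0}
gene_a = ic_vector({"T1", "T2"}, ic, vocabulary)  # [1.0, 2.0, 0.0, 0.0]
gene_b = ic_vector({"T2", "T3"}, ic, vocabulary)  # [0.0, 2.0, 3.0, 0.0]

print(cosine(gene_a, gene_b), pearson(gene_a, gene_b), weighted_jaccard(gene_a, gene_b))
```

Because unshared terms contribute zeros to one of the two vectors, gene products with little annotation overlap score low under all three measures, which is in line with the motivation stated in the abstract.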


Word and Sentence Embedding Tools to Measure Semantic Similarity of Gene Ontology Terms by Their Definitions

Journal of Computational Biology ◽

10.1089/cmb.2018.0093 ◽

2019 ◽

Vol 26 (1) ◽

pp. 38-52 ◽

Cited By ~ 3

Author(s):

Dat Duong ◽

Wasi Uddin Ahmad ◽

Eleazar Eskin ◽

Kai-Wei Chang ◽

Jingyi Jessica Li

Keyword(s):

Gene Ontology ◽

Semantic Similarity


An integrated information-based similarity measurement of gene ontology terms

Computer Science and Information Systems ◽

10.2298/csis141130053z ◽

2015 ◽

Vol 12 (4) ◽

pp. 1235-1253 ◽

Cited By ~ 1

Author(s):

Shu-Bo Zhang ◽

Jian-Huang Lai

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Semantic Information ◽

Gene Expression Dataset ◽

Similarity Measurement ◽

Depth Information ◽

Go Terms ◽

Validation Experiments ◽

Integrated Information ◽

Common Ancestors

Measuring the semantic similarity between pairs of terms in the Gene Ontology (GO) can help to compare genes that cannot be compared by other computational methods. In this study, we proposed an integrated information-based similarity measurement (IISM) that calculates the semantic similarity between two GO terms by taking into account the multiple common ancestors they share and aggregating the semantic information and depth information of the non-redundant common ancestors. Our method searches for non-redundant common ancestors in an effective way. Validation experiments were conducted on both a gene expression dataset and a pathway dataset, and the experimental results suggest the superiority of our method over some existing methods.


Simwos: Improving Semantic Similarity Between Gene Ontology Terms Based On Pfam Clans And Pathway Analysis

International Journal of Pharmaceutical Research ◽

10.31838/ijpr/2020.12.04.598 ◽

2020 ◽

Vol 12 (04) ◽

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Pathway Analysis


Influence of the GO-based semantic similarity measures in multi-objective gene clustering algorithm performance

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720020500389 ◽

2020 ◽

Vol 18 (06) ◽

pp. 2050038

Author(s):

Jorge Parraga-Alava ◽

Mario Inostroza-Ponta

Keyword(s):

Semantic Similarity ◽

Clustering Algorithm ◽

Performance Metrics ◽

Expression Patterns ◽

Biological Significance ◽

Similarity Measures ◽

Gene Clustering ◽

Biological Knowledge ◽

Multi Objective ◽

Gene Similarity

Using prior biological knowledge of relationships and genetic functions for gene similarity, from repositories such as the Gene Ontology (GO), has shown good results in multi-objective gene clustering algorithms. In this scenario, and to obtain useful clustering results, it would be helpful to know which measure of biological similarity between genes should be employed to yield meaningful clusters that have both similar expression patterns (co-expression) and biological homogeneity. In this paper, we studied the influence of the four most used GO-based semantic similarity measures on the performance of a multi-objective gene clustering algorithm. We used four publicly available datasets and carried out comparative studies based on performance metrics from the multi-objective optimization field and on clustering performance indexes. In most cases, the Jiang–Conrath and Wang similarities stand out in terms of multi-objective metrics. In clustering properties, the Resnik similarity achieves the best values of compactness and separation, and therefore of co-expression of groups of genes. Meanwhile, in biological homogeneity, the Wang similarity reports a greater number of significant GO terms. However, statistical, visual, and biological significance tests showed that none of the GO-based semantic similarity measures stands out above the rest in significantly improving the performance of the multi-objective gene clustering algorithm.
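Two of the measures named in this abstract, Resnik and Jiang–Conrath, are commonly defined through the information content (IC) of the most informative common ancestor of two terms. The following is a minimal sketch of those textbook definitions, not the implementation used in the paper. The toy ontology, the precomputed IC values, and all function names are hypothetical.

```python
def ancestors(term, parents):
    """Return the term itself plus every ancestor reachable through the parent map."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def resnik(t1, t2, parents, ic):
    """Resnik similarity: IC of the most informative common ancestor (0.0 if none)."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic[c] for c in common), default=0.0)

def jiang_conrath_distance(t1, t2, parents, ic):
    """Jiang-Conrath distance: IC(t1) + IC(t2) - 2 * IC(most informative common ancestor)."""
    return ic[t1] + ic[t2] - 2.0 * resnik(t1, t2, parents, ic)

# Hypothetical toy ontology: root <- a <- {b, c}, given as child -> list of parents.
parents = {"a": ["root"], "b": ["a"], "c": ["a"]}
# Hypothetical IC values; in practice these are usually derived from annotation frequencies.
ic = {"root": 0.0, "a": 1.0, "b": 2.5, "c": 3.0}

print(resnik("b", "c", parents, ic))                  # 1.0
print(jiang_conrath_distance("b", "c", parents, ic))  # 3.5
```

Note that Jiang–Conrath is a distance, so a clustering pipeline that expects similarities would need to convert it first; several conversions are in use, and the abstract does not say which one the authors applied.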


GSAn: an alternative to enrichment analysis for annotating gene sets

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa017 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 5

Author(s):

Aaron Ayllon-Benitez ◽

Romain Bourqui ◽

Patricia Thébault ◽

Fleur Mougin

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

A Priori ◽

Similarity Measures ◽

Enrichment Analysis ◽

Biological Information ◽

Underlying Structure ◽

Gene Set ◽

Sequencing Technologies ◽

Gene Coverage

The revolution in new sequencing technologies is leading to new understandings of the relations between genotype and phenotype. To interpret and analyze data that are grouped according to a phenotype of interest, methods based on statistical enrichment have become a standard in biology. However, these methods synthesize the biological information by a priori selecting the over-represented terms and may suffer from focusing on the most studied genes, which represent a limited coverage of annotated genes within a gene set. Semantic similarity measures have shown great results in pairwise gene comparison by taking advantage of the underlying structure of the Gene Ontology. We developed GSAn, a novel gene set annotation method that uses semantic similarity measures to synthesize a priori Gene Ontology annotation terms. The originality of our approach is to identify the best compromise between the number of retained annotation terms, which has to be drastically reduced, and the number of related genes, which has to be as large as possible. Moreover, GSAn offers interactive visualization facilities dedicated to the multi-scale analysis of gene set annotations. Compared to enrichment analysis tools, GSAn has shown excellent results in terms of maximizing the gene coverage while minimizing the number of terms.


A measure of semantic similarity between gene ontology terms based on semantic pathway covering*

Progress in Natural Science Materials International ◽

10.1080/10020070612330059 ◽

2006 ◽

Vol 16 (7) ◽

pp. 721-726 ◽

Cited By ~ 10

Author(s):

Li Rong ◽

Cao Shunliang ◽

Li Yuanyuan ◽

Tan Hao ◽

Zhu Yangyong ◽

...

Keyword(s):

Gene Ontology ◽

Semantic Similarity


Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2017.2695542 ◽

2018 ◽

Vol 15 (3) ◽

pp. 905-912 ◽

Cited By ~ 1

Author(s):

Najmul Ikram ◽

Muhammad Abdul Qadir ◽

Muhammad Tanvir Afzal

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Protein Sequence ◽

Sequence Similarity ◽

Protein Sequence Similarity


Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures

Gene ◽

10.1016/j.gene.2016.04.024 ◽

2016 ◽

Vol 586 (1) ◽

pp. 148-157 ◽

Cited By ~ 3

Author(s):

Shu-Bo Zhang ◽

Jian-Huang Lai

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Similarity Measures


Predicting shrimp protein-protein interactions and gene ontology terms using association rule and semantic similarity calculation

2014 International Computer Science and Engineering Conference (ICSEC) ◽

10.1109/icsec.2014.6978208 ◽

2014 ◽

Author(s):

Sirintra Vaiwsri ◽

Anuphap Prachumwat ◽

Sudsanguan Ngamsuriyaroj ◽

Ananta Srisuphab

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Protein Interactions ◽

Association Rule ◽

Protein Protein Interactions ◽

Similarity Calculation
