Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures

Shu-Bo Zhang; Jian-Huang Lai

doi:10.1016/j.gene.2016.04.024

GSAn: an alternative to enrichment analysis for annotating gene sets

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa017 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 5

Author(s):

Aaron Ayllon-Benitez ◽

Romain Bourqui ◽

Patricia Thébault ◽

Fleur Mougin

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

A Priori ◽

Similarity Measures ◽

Enrichment Analysis ◽

Biological Information ◽

Underlying Structure ◽

Gene Set ◽

Sequencing Technologies ◽

Gene Coverage

Abstract The revolution in sequencing technologies is leading to a new understanding of the relations between genotype and phenotype. To interpret and analyze data that are grouped according to a phenotype of interest, methods based on statistical enrichment have become a standard in biology. However, these methods synthesize the biological information by a priori selecting the over-represented terms and may suffer from focusing on the most studied genes, which represent a limited coverage of the annotated genes within a gene set. Semantic similarity measures have shown great results in pairwise gene comparison by taking advantage of the underlying structure of the Gene Ontology. We developed GSAn, a novel gene set annotation method that uses semantic similarity measures to synthesize a priori Gene Ontology annotation terms. The originality of our approach is to identify the best compromise between the number of retained annotation terms, which has to be drastically reduced, and the number of related genes, which has to be as large as possible. Moreover, GSAn offers interactive visualization facilities dedicated to the multi-scale analysis of gene set annotations. Compared to enrichment analysis tools, GSAn has shown excellent results in terms of maximizing the gene coverage while minimizing the number of terms.
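The trade-off described in this abstract (few annotation terms, many covered genes) can be illustrated with a generic greedy set-cover heuristic. This is only a sketch under stated assumptions: it is not GSAn's algorithm, it ignores semantic similarity entirely, and the function name `greedy_term_cover`, the `max_terms` parameter, and the toy term and gene labels are all hypothetical.

```python
def greedy_term_cover(term_to_genes, gene_set, max_terms):
    """Pick up to max_terms terms, each time taking the term that covers
    the most still-uncovered genes (a classic greedy set-cover heuristic,
    used here as a stand-in, not as GSAn's actual method)."""
    genes = set(gene_set)
    uncovered = set(genes)
    chosen = []
    while uncovered and len(chosen) < max_terms:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(term_to_genes),
                   key=lambda t: len(term_to_genes[t] & uncovered))
        gain = term_to_genes[best] & uncovered
        if not gain:
            break  # no remaining term covers any uncovered gene
        chosen.append(best)
        uncovered -= gain
    coverage = 1 - len(uncovered) / len(genes)
    return chosen, coverage


# Hypothetical toy data: term labels and gene labels are made up.
toy_terms = {
    "T1": {"g1", "g2", "g3"},
    "T2": {"g3", "g4"},
    "T3": {"g4", "g5"},
    "T4": {"g1"},
}
toy_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
terms, coverage = greedy_term_cover(toy_terms, toy_genes, max_terms=3)
print(terms, round(coverage, 3))
```

On this toy input the heuristic stops after two terms, because no remaining term covers the last gene; that gene stays unannotated and the coverage stays below 1.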


The use of semantic similarity measures for optimally integrating heterogeneous Gene Ontology data from large scale annotation pipelines

Frontiers in Genetics ◽

10.3389/fgene.2014.00264 ◽

2014 ◽

Vol 5 ◽

Cited By ~ 4

Author(s):

Gaston K. Mazandu ◽

Nicola J. Mulder

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Large Scale ◽

Similarity Measures


Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation

Bioinformatics ◽

10.1093/bioinformatics/btg153 ◽

2003 ◽

Vol 19 (10) ◽

pp. 1275-1283 ◽

Cited By ~ 523

Author(s):

P. W. Lord ◽

R. D. Stevens ◽

A. Brass ◽

C. A. Goble

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Similarity Measures ◽

The Relationship


Information Content-Based Gene Ontology Semantic Similarity Approaches: Toward a Unified Framework Theory

BioMed Research International ◽

10.1155/2013/292063 ◽

2013 ◽

Vol 2013 ◽

pp. 1-11 ◽

Cited By ~ 31

Author(s):

Gaston K. Mazandu ◽

Nicola J. Mulder

Keyword(s):

Gene Ontology ◽

Information Content ◽

Semantic Similarity ◽

Experimental Evaluation ◽

Similarity Measures ◽

Mathematical Framework ◽

Unified Framework ◽

The Impact ◽

Unified Description ◽

Similarity Scores

Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the Gene Ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to provide a theoretical basis for validating these approaches. We review the existing IC-based ontological similarity approaches developed in the context of biomedical and bioinformatics fields to propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides a theoretical basis for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term’s specificity in the GO DAG.
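To make the notion of information content on a DAG concrete, the following minimal sketch computes an annotation-based IC, IC(t) = -log p(t), and the Resnik similarity (the IC of the most informative common ancestor) on a toy graph. It illustrates one well-known IC-based measure only, not the unified framework proposed in the paper. The term labels, the annotation counts, and every function name are hypothetical, and the code assumes each term ends up with a non-zero propagated count.

```python
import math

# Hypothetical toy DAG: term -> set of direct parents (not real GO IDs).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},
    "D": {"A"},
}

# Hypothetical number of annotations made directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 1, "B": 2, "C": 3, "D": 2}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(count(t) / count(root)), where count(t) includes
    the annotations of all descendants of t."""
    counts = {t: 0 for t in PARENTS}
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors(term):  # each ancestor is credited once
            counts[anc] += n
    total = counts["root"]
    return {t: -math.log(c / total) for t, c in counts.items()}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


ic = information_content()
print(round(resnik("C", "D", ic), 4))  # common ancestors: A and root
print(round(resnik("B", "D", ic), 4))  # only the root is shared
```

Because counts propagate upward, the root has probability 1 and IC 0, so two terms that share only the root get a similarity of 0, while more specific shared ancestors yield higher scores.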


Faculty Opinions recommendation of Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1020683.240601 ◽

2004 ◽

Author(s):

Golan Yona

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Similarity Measures ◽

The Relationship


SEMANTIC SIMILARITY MEASURES AS TOOLS FOR EXPLORING THE GENE ONTOLOGY

Biocomputing 2003 ◽

10.1142/9789812776303_0056 ◽

2002 ◽

Cited By ~ 10

Author(s):

P. W. LORD ◽

R. D. STEVENS ◽

A. BRASS ◽

C. A. GOBLE

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Similarity Measures


Faculty Opinions recommendation of Exploiting disjointness axioms to improve semantic similarity measures.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.722317980.793528331 ◽

2017 ◽

Author(s):

Sebastian Köhler

Keyword(s):

Semantic Similarity ◽

Similarity Measures


Denoising distant supervision for ontology lexicalization using semantic similarity measures

Expert Systems with Applications ◽

10.1016/j.eswa.2021.114922 ◽

2021 ◽

Vol 177 ◽

pp. 114922

Author(s):

Mehdi Jabalameli ◽

Mohammadali Nematbakhsh ◽

Reza Ramezani

Keyword(s):

Semantic Similarity ◽

Similarity Measures ◽

Distant Supervision


A New Family of Similarity Measures for Scoring Confidence of Protein Interactions using Gene Ontology

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3083150 ◽

2021 ◽

pp. 1-1

Author(s):

Madhusudan Paul ◽

Ashish Anand

Keyword(s):

Gene Ontology ◽

Protein Interactions ◽

Similarity Measures ◽

New Family


Evolution of Semantic Similarity—A Survey

ACM Computing Surveys ◽

10.1145/3440755 ◽

2021 ◽

Vol 54 (2) ◽

pp. 1-37

Author(s):

Dhivya Chandrasekaran ◽

Vijay Mago

Keyword(s):

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Hybrid Methods ◽

Research Work ◽

Similarity Measures ◽

Text Data ◽

Knowledge Based ◽

Open Research ◽

Research Problems

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods beginning from traditional NLP techniques such as kernel-based methods to the most recent research work on transformer-based models, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network–based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.
