The use of semantic similarity measures for optimally integrating heterogeneous Gene Ontology data from large scale annotation pipelines

GSAn: an alternative to enrichment analysis for annotating gene sets

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa017 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 5

Author(s):

Aaron Ayllon-Benitez ◽

Romain Bourqui ◽

Patricia Thébault ◽

Fleur Mougin

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

A Priori ◽

Similarity Measures ◽

Enrichment Analysis ◽

Biological Information ◽

Underlying Structure ◽

Gene Set ◽

Sequencing Technologies ◽

Gene Coverage

Abstract The revolution in new sequencing technologies is leading to new understandings of the relations between genotype and phenotype. To interpret and analyze data that are grouped according to a phenotype of interest, methods based on statistical enrichment have become a standard in biology. However, these methods synthesize the biological information by a priori selecting the over-represented terms and may suffer from focusing on the most studied genes that represent a limited coverage of annotated genes within a gene set. Semantic similarity measures have shown great results within the pairwise gene comparison by taking advantage of the underlying structure of the Gene Ontology. We developed GSAn, a novel gene set annotation method that uses semantic similarity measures to synthesize a priori Gene Ontology annotation terms. The originality of our approach is to identify the best compromise between the number of retained annotation terms, which has to be drastically reduced, and the number of related genes, which has to be as large as possible. Moreover, GSAn offers interactive visualization facilities dedicated to the multi-scale analysis of gene set annotations. Compared to enrichment analysis tools, GSAn has shown excellent results in terms of maximizing the gene coverage while minimizing the number of terms.
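
The trade-off described in this abstract (few retained terms, many covered genes) can be illustrated with a generic greedy set-cover heuristic. This is only a sketch of the trade-off and is not GSAn's published algorithm, which also relies on semantic similarity. The function name `greedy_term_cover`, the term labels T1 to T4, and the gene labels g1 to g6 are all hypothetical.

```python
def greedy_term_cover(term_to_genes, gene_set, max_terms):
    """Pick up to max_terms terms, each time taking the term that
    covers the most still-uncovered genes (greedy set cover)."""
    uncovered = set(gene_set)
    chosen = []
    while uncovered and len(chosen) < max_terms:
        # Iterate over sorted term names so that ties break deterministically.
        best = max(sorted(term_to_genes),
                   key=lambda t: len(term_to_genes[t] & uncovered))
        gain = term_to_genes[best] & uncovered
        if not gain:
            break  # no remaining term covers any uncovered gene
        chosen.append(best)
        uncovered -= gain
    coverage = 1 - len(uncovered) / len(gene_set)
    return chosen, coverage


# Toy annotation table (hypothetical): term -> genes annotated with it.
term_to_genes = {
    "T1": {"g1", "g2", "g3"},
    "T2": {"g3", "g4"},
    "T3": {"g4", "g5"},
    "T4": {"g1"},
}
gene_set = {"g1", "g2", "g3", "g4", "g5", "g6"}

chosen, coverage = greedy_term_cover(term_to_genes, gene_set, max_terms=3)
print(chosen, round(coverage, 3))
```

On this toy input the heuristic keeps T1 and T3 and covers five of the six genes; g6 has no annotation, so no choice of terms can reach full coverage.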

Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures

Gene ◽

10.1016/j.gene.2016.04.024 ◽

2016 ◽

Vol 586 (1) ◽

pp. 148-157 ◽

Cited By ~ 3

Author(s):

Shu-Bo Zhang ◽

Jian-Huang Lai

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Similarity Measures

An Integrated Platform Supporting Semantic Similarity Score Calculation and Reproducibility

10.21203/rs.3.rs-806346/v1 ◽

2021 ◽

Author(s):

Gaston K. Mazandu ◽

Kenneth Opap ◽

Funmilayo Makinde ◽

Victoria Nembaware ◽

Francis Agamah ◽

...

Keyword(s):

Gene Ontology ◽

Knowledge Sharing ◽

Semantic Similarity ◽

Automated Reasoning ◽

Large Scale ◽

Essential Role ◽

Similarity Score ◽

File Format ◽

Flexible Tool ◽

Integrated Platform

Abstract During the last decade, we witnessed an exponential rise of datasets from heterogeneous sources. Ontologies are playing an essential role in consistently describing domain concepts, data harmonization and integration to support large-scale integrative analysis and semantic interoperability in knowledge sharing. Several semantic similarity (SS) measures have been suggested to enable the integration of rich ontology structures into automated reasoning and inference. However, there is no tool that exhaustively implements these measures and existing tools are generally Gene Ontology specific, do not implement several models suggested in the WordNet context and are not equipped to properly deal with frequent ontology updates. We introduce a Python SS measure library (PySML), which tackles issues related to current SS tools, providing a portable and expandable tool to a broad computational audience. This empowers users to manipulate SS scores from several applications for any ontology version and file format. PySML is a flexible tool enabling the implementation of all existing semantic similarity models, resolving issues related to computation, reproducibility and re-usability of SS scores.

Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation

Bioinformatics ◽

10.1093/bioinformatics/btg153 ◽

2003 ◽

Vol 19 (10) ◽

pp. 1275-1283 ◽

Cited By ~ 523

Author(s):

P. W. Lord ◽

R. D. Stevens ◽

A. Brass ◽

C. A. Goble

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Similarity Measures ◽

The Relationship

Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms

BMC Bioinformatics ◽

10.1186/1471-2105-13-s4-s14 ◽

2012 ◽

Vol 13 (S4) ◽

Cited By ~ 89

Author(s):

Marco Falda ◽

Stefano Toppo ◽

Alessandro Pescarolo ◽

Enrico Lavezzo ◽

Barbara Di Camillo ◽

...

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Large Scale ◽

Function Prediction ◽

Scale Function ◽

Prediction Tool

Information Content-Based Gene Ontology Semantic Similarity Approaches: Toward a Unified Framework Theory

BioMed Research International ◽

10.1155/2013/292063 ◽

2013 ◽

Vol 2013 ◽

pp. 1-11 ◽

Cited By ~ 31

Author(s):

Gaston K. Mazandu ◽

Nicola J. Mulder

Keyword(s):

Gene Ontology ◽

Information Content ◽

Semantic Similarity ◽

Experimental Evaluation ◽

Similarity Measures ◽

Mathematical Framework ◽

Unified Framework ◽

The Impact ◽

Unified Description ◽

Similarity Scores

Abstract Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to provide a theoretical basis for validating these approaches. We review the existing IC-based ontological similarity approaches developed in the context of biomedical and bioinformatics fields to propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides theoretical basics for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term’s specificity in the GO DAG.
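
As a concrete instance of the IC-based measures this abstract refers to, the sketch below computes annotation-based information content, IC(t) = -log p(t), and the Resnik similarity (the IC of the most informative common ancestor) on a toy DAG. It is a minimal illustration under stated assumptions and not the unified framework proposed in the paper: the DAG, the term names (root, A, B, C, D), and the gene annotations are invented for the example.

```python
import math

# Hypothetical toy DAG: term -> set of direct parents.
parents = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},
    "D": {"A"},
}

# Hypothetical annotations: gene -> directly annotated terms.
annotations = {"g1": {"C"}, "g2": {"D"}, "g3": {"B"}, "g4": {"C"}}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of genes annotated to t or to a descendant of t)."""
    counts = {t: 0 for t in parents}
    for terms in annotations.values():
        # Propagate each gene's annotations up to all ancestors.
        closure = set().union(*(ancestors(t) for t in terms))
        for t in closure:
            counts[t] += 1
    total = len(annotations)
    # Terms with no annotated gene are left out (their IC is undefined here).
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: highest IC among the common ancestors of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max((ic[t] for t in common if t in ic), default=0.0)


ic = information_content()
print(resnik("C", "D", ic), resnik("D", "B", ic))
```

In this toy graph C and D share the ancestor A, which three of the four genes reach, so their Resnik score is -log(3/4); D and B share only the root, whose IC is zero.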

Faculty Opinions recommendation of Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1020683.240601 ◽

2004 ◽

Author(s):

Golan Yona

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Similarity Measures ◽

The Relationship

A New Family of Similarity Measures for Scoring Confidence of Protein Interactions using Gene Ontology

10.1101/459107 ◽

2018 ◽

Cited By ~ 1

Author(s):

Madhusudan Paul ◽

Ashish Anand

Keyword(s):

Gene Ontology ◽

Protein Interactions ◽

Large Scale ◽

Similarity Measures ◽

Confidence Score ◽

False Positives ◽

Positive Interactions ◽

Relative Depth ◽

New Family ◽

Edge Based

Abstract The large-scale protein-protein interaction (PPI) data has the potential to play a significant role in the endeavor of understanding cellular processes. However, the presence of a considerable fraction of false positives is a bottleneck in realizing this potential. There have been continuous efforts to utilize complementary resources for scoring confidence of PPIs in a manner that false positive interactions get a low confidence score. Gene Ontology (GO), a taxonomy of biological terms to represent the properties of gene products and their relations, has been widely used for this purpose. We utilize GO to introduce a new set of specificity measures: Relative Depth Specificity (RDS), Relative Node-based Specificity (RNS), and Relative Edge-based Specificity (RES), leading to a new family of similarity measures. We use these similarity measures to obtain a confidence score for each PPI. We evaluate the new measures using four different benchmarks. We show that all the three measures are quite effective. Notably, RNS and RES more effectively distinguish true PPIs from false positives than the existing alternatives. RES also shows a robust set-discriminating power and can be useful for protein functional clustering as well.