Measuring the importance of annotation granularity to the detection of semantic similarity between phenotype profiles

2016
Author(s):
Prashanti Manda
James P. Balhoff
Todd J. Vision

In phenotype annotations curated from the biological and medical literature, considerable human effort must be invested to select ontological classes that capture the expressivity of the original natural language descriptions, and finer annotation granularity can also entail higher computational costs for particular reasoning tasks. Do coarse annotations suffice for certain applications? Here, we measure how annotation granularity affects the statistical behavior of semantic similarity metrics. We use a randomized dataset of phenotype profiles drawn from 57,051 taxon-phenotype annotations in the Phenoscape Knowledgebase. We compared query profiles having variable proportions of matching phenotypes to subject database profiles using both pairwise and groupwise Jaccard (edge-based) and Resnik (node-based) semantic similarity metrics, and compared statistical performance for three different levels of annotation granularity: entities alone, entities plus attributes, and entities plus qualities (with implicit attributes). All four metrics examined showed more extreme values than expected by chance when approximately half the annotations matched between the query and subject profiles, with a more sudden decline for pairwise statistics and a more gradual one for the groupwise statistics. Annotation granularity had a negligible effect on the position of the threshold at which matches could be discriminated from noise. These results suggest that coarse annotations of phenotypes, at the level of entities with or without attributes, may be sufficient to identify phenotype profiles with statistically significant semantic similarity.
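The contrast between the edge-based Jaccard metric and the node-based Resnik metric can be sketched in a few lines. The toy ontology, term names, and annotation frequencies below are invented for illustration; they are not drawn from the Phenoscape Knowledgebase.

```python
import math

# Hypothetical toy ontology: each term maps to the set of its ancestor
# classes, including itself (illustrative names, not Phenoscape terms).
ANCESTORS = {
    "femur_length":  {"femur_length", "femur_phenotype", "limb_phenotype", "phenotype"},
    "femur_shape":   {"femur_shape", "femur_phenotype", "limb_phenotype", "phenotype"},
    "fin_ray_count": {"fin_ray_count", "fin_phenotype", "phenotype"},
}

# Illustrative annotation counts used to derive information content (IC).
FREQUENCY = {"phenotype": 100, "limb_phenotype": 40, "femur_phenotype": 10,
             "fin_phenotype": 25, "femur_length": 4, "femur_shape": 6,
             "fin_ray_count": 25}
TOTAL = FREQUENCY["phenotype"]

def jaccard(a, b):
    """Edge-based similarity: overlap of the two terms' ancestor sets."""
    sa, sb = ANCESTORS[a], ANCESTORS[b]
    return len(sa & sb) / len(sa | sb)

def resnik(a, b):
    """Node-based similarity: IC of the most informative common ancestor."""
    common = ANCESTORS[a] & ANCESTORS[b]
    return max(-math.log(FREQUENCY[c] / TOTAL) for c in common)
```

Terms that share only the root class get a Resnik score of zero, since the root's information content is zero, while Jaccard still credits them for the shared ancestor.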

2018
Author(s):
Prashanti Manda
Todd Vision

Semantic similarity has been used for comparing genes, proteins, phenotypes, diseases, etc. for various biological applications. The rise of ontology-based data representation in biology has also led to the development of several semantic similarity metrics that use different statistics to estimate similarity. Although semantic similarity has become a crucial computational tool in several applications, there has not been a formal evaluation of the statistical sensitivity of these metrics and their ability to recognize similarity between distantly related biological objects. Here, we present a statistical sensitivity comparison of five semantic similarity metrics (Jaccard, Resnik, Lin, Jiang & Conrath, and Hybrid Relative Specificity Similarity) representing three different kinds of metrics (edge-based, node-based, and hybrid) and explore key parameter choices that can impact sensitivity. Furthermore, we compare four methods of aggregating individual annotation similarities to estimate similarity between two biological objects: All Pairs, Best Pairs, Best Pairs Symmetric, and Groupwise. To evaluate sensitivity in a controlled fashion, we explore two different models for simulating data with varying levels of similarity and compare to the noise distribution using resampling. Source data are derived from the Phenoscape Knowledgebase of evolutionary phenotypes. Our results indicate that the choice of similarity metric along with different parameter choices can substantially affect sensitivity. Among the five metrics evaluated, we find that Resnik similarity shows the greatest sensitivity to weak semantic similarity. Among the ways to combine pairwise statistics, the Groupwise approach provides the greatest discrimination among values above the sensitivity threshold, while the Best Pairs statistic can be parametrically tuned to provide the highest sensitivity. Our findings serve as a guideline for an appropriate choice and parameterization of semantic similarity metrics, and point to the need for improved reporting of the statistical significance of semantic similarity matches in cases where weak similarity is of interest.
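The aggregation strategies named above differ only in how per-annotation similarities are combined. A minimal sketch, assuming an arbitrary pairwise similarity function `sim` supplied by the caller:

```python
from statistics import mean

def all_pairs(sim, profile_a, profile_b):
    """All Pairs: mean similarity over every annotation pair."""
    return mean(sim(a, b) for a in profile_a for b in profile_b)

def best_pairs(sim, profile_a, profile_b):
    """Best Pairs: for each annotation in A, keep its best match in B,
    then average those best scores (asymmetric in A and B)."""
    return mean(max(sim(a, b) for b in profile_b) for a in profile_a)

def best_pairs_symmetric(sim, profile_a, profile_b):
    """Best Pairs Symmetric: average of Best Pairs in both directions."""
    return (best_pairs(sim, profile_a, profile_b) +
            best_pairs(sim, profile_b, profile_a)) / 2
```

With an exact-match similarity (1.0 for identical terms, else 0.0) and profiles `["t1", "t2"]` versus `["t2", "t3"]`, All Pairs averages over all four pairs while Best Pairs keeps only each term's best match, so the two statistics diverge even on this tiny example.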


AERA Open
2021
Vol 7
pp. 233285842110286
Author(s):  
Kylie L. Anglin ◽  
Vivian C. Wong ◽  
Arielle Boguslav

Though there is widespread recognition of the importance of implementation research, evaluators often face intense logistical, budgetary, and methodological challenges in their efforts to assess intervention implementation in the field. This article proposes a set of natural language processing techniques called semantic similarity as an innovative and scalable method of measuring implementation constructs. Semantic similarity methods are an automated approach to quantifying the similarity between texts. By applying semantic similarity to transcripts of intervention sessions, researchers can use the method to determine whether an intervention was delivered with adherence to a structured protocol, and the extent to which an intervention was replicated with consistency across sessions, sites, and studies. This article provides an overview of semantic similarity methods, describes their application within the context of educational evaluations, and provides a proof of concept using an experimental study of the impact of a standardized teacher coaching intervention.
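At its simplest, quantifying the similarity between two texts means representing each as a vector and comparing the vectors. The sketch below uses a bag-of-words cosine similarity; the protocol and session strings are invented examples, and practical systems would use richer representations such as TF-IDF weights or sentence embeddings.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between bag-of-words count vectors:
    1.0 for identical vocabularies, 0.0 for disjoint ones."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values())) *
            math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Hypothetical protocol text and session transcript fragment.
protocol = "open the session by reviewing the student goals"
session = "open the session by reviewing goals with the student"
adherence_score = cosine_similarity(protocol, session)
```

Scoring each transcript against the written protocol in this way yields a per-session adherence measure that can be compared across sessions, sites, and studies.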


2021
Vol 54 (2)
pp. 1-37
Author(s):  
Dhivya Chandrasekaran ◽  
Vijay Mago

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods beginning from traditional NLP techniques such as kernel-based methods to the most recent research work on transformer-based models, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network–based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.


Author(s):  
Saravanakumar Kandasamy ◽  
Aswani Kumar Cherukuri

Semantic similarity quantification between concepts is an essential task in domains such as Natural Language Processing, Information Retrieval, and Question Answering, where it helps systems understand texts and their relationships better. Over the last few decades, many measures have been proposed that incorporate various corpus-based and knowledge-based resources. WordNet and Wikipedia are two such knowledge-based resources. The contribution of WordNet to these domains is enormous due to its richness in defining a word and all of its relationships with other words. In this paper, we propose an approach to quantify the similarity between concepts that exploits the synsets and the gloss definitions of different concepts using WordNet. Our method considers the gloss definitions, the contextual words that help define a word, the synsets of those contextual words, and the confidence of occurrence of a word in another word's definition when calculating similarity. Evaluation on different gold-standard benchmark datasets shows the efficiency of our system in comparison with other existing taxonomical and definitional measures.
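The core idea of a definitional (gloss-based) measure can be sketched as word overlap between two definitions, in the spirit of Lesk-style approaches. The glosses and stopword list below are invented stand-ins, not actual WordNet entries, and the paper's full method additionally weights contextual words and their synsets.

```python
STOPWORDS = frozenset({"a", "an", "the", "of", "or", "with", "for", "used"})

def gloss_overlap(gloss_a, gloss_b):
    """Definitional similarity: shared content words between two glosses,
    normalised by the size of the smaller gloss."""
    wa = set(gloss_a.lower().split()) - STOPWORDS
    wb = set(gloss_b.lower().split()) - STOPWORDS
    return len(wa & wb) / min(len(wa), len(wb))

# Hypothetical glosses standing in for WordNet definitions.
GLOSS = {
    "car":  "a motor vehicle with four wheels used for transporting people",
    "bus":  "a large motor vehicle carrying many people along a fixed route",
    "fork": "an implement with prongs used for eating or serving food",
}
```

Related concepts ("car" and "bus") share content words like "motor", "vehicle", and "people", while unrelated ones ("car" and "fork") overlap only in function words that the stopword filter removes.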


2020
pp. 016555152093438
Author(s):  
Jose L. Martinez-Rodriguez ◽  
Ivan Lopez-Arevalo ◽  
Ana B. Rios-Alvarado

The Semantic Web provides guidelines for the representation of information about real-world objects (entities) and their relations (properties). This is helpful for the dissemination and consumption of information by people and applications. However, the information is mainly contained within natural language sentences, which do not have a structure or linguistic descriptions ready to be directly processed by computers. Thus, the challenge is to identify and extract the elements of information that can be represented. Hence, this article presents a strategy to extract information from sentences and its representation with Semantic Web standards. Our strategy involves Information Extraction tasks and a hybrid semantic similarity measure to get entities and relations that are later associated with individuals and properties from a Knowledge Base to create RDF triples (Subject–Predicate–Object structures). The experiments demonstrate the feasibility of our method and that it outperforms the accuracy provided by a pattern-based method from the literature.
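Once extracted entities and relations are linked to Knowledge Base individuals and properties, each match becomes one Subject-Predicate-Object triple. A minimal sketch of serialising such a triple in the N-Triples format; the KB namespace and the example entities are illustrative, not from the paper's evaluation data.

```python
# Illustrative Knowledge Base namespace (not a real endpoint).
KB = "http://example.org/kb/"

def to_ntriple(subject, predicate, obj):
    """Serialise one Subject-Predicate-Object structure as an N-Triples line,
    wrapping each KB-linked term in an absolute IRI."""
    return f"<{KB}{subject}> <{KB}{predicate}> <{KB}{obj}> ."

# Hypothetical output of the Information Extraction step, already linked
# to KB individuals (Paris, France) and a KB property (capitalOf).
triple = to_ntriple("Paris", "capitalOf", "France")
```

Libraries such as rdflib would normally handle IRI construction and serialisation; the point here is only the shape of the resulting RDF triple.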


2007
Vol 15 (3)
pp. 199-213
Author(s):  
Arthur C. Graesser ◽  
Moongee Jeon ◽  
Yan Yan ◽  
Zhiqiang Cai

Discourse cohesion is presumably an important facilitator of comprehension when individuals read texts and hold conversations. This study investigated components of cohesion and language in different types of discourse about Newtonian physics: a textbook, textoids written by experimental psychologists, naturalistic tutorial dialogue between expert human tutors and college students, and AutoTutor tutorial dialogue between a computer tutor and students (AutoTutor is an animated pedagogical agent that helps students learn about physics by holding conversations in natural language). We analyzed the four types of discourse with Coh-Metrix, a software tool that measures discourse on different components of cohesion, language, and readability. The cohesion indices included co-reference, syntactic and semantic similarity, causal cohesion, incidence of cohesion signals (e.g., connectives, logical operators), and many other measures. Cohesion data were quite similar for the two forms of expository monologue (textbooks and textoids) and for the two types of tutorial dialogue (i.e., students interacting with human tutors and AutoTutor), but very different between the discourse of expository monologue and tutorial dialogue. Coh-Metrix was also able to detect subtle differences in the language and discourse of AutoTutor versus human tutoring.


Author(s):  
Dongxing Cao ◽  
Karthik Ramani ◽  
Ming Wang Fu ◽  
Runli Zhang

Modularity indicates a one-to-one mapping between functional concepts and physical components, and allows more product varieties to be generated at lower cost. Functional concepts can be described by precise syntactic structures with functional terms, and different semantic measures can be used to evaluate the strength of the semantic link between two functional concepts from a port ontology. In this paper, different ontology-based methods of modularity are first investigated. Secondly, the primitive concepts are presented based on port ontology by using natural language, and their semantic synthesis is then used to describe component ontology. A taxonomy of the port-based ontology is built to map the component connections and interactions in order to build functional blocks. Next, we propose an approach to computing semantic similarity by mapping terms to a functional ontology and by examining their relationships based on a port ontology language. Furthermore, several modules are partitioned on the basis of similarity measures. The process of module construction is described, and its elements are related to the similarity values between concepts. Finally, a case study shows the efficiency of port ontology semantic similarity for modular concept generation.

