Improving Semantic Similarity for Pairs of Short Biomedical Texts with Concept Definitions and Ontology Structure

Olivia SanchezGraillet

doi:10.5120/17446-8351

A Semantic Similarity Evaluation Method and a Tool Utilised in Security Applications Based on Ontology Structure and Lexicon Analysis

2017 Fourth International Conference on Mathematics and Computers in Sciences and in Industry (MCSI) ◽

10.1109/mcsi.2017.46 ◽

2017 ◽

Cited By ~ 2

Author(s):

Mariusz Chmielewski ◽

Malgorzata Paciorkowska ◽

Maciej Kiedrowicz

Keyword(s):

Semantic Similarity ◽

Evaluation Method ◽

Security Applications ◽

Ontology Structure

Download Full-text

Using Concept Definitions and Ontology Structure to Measure Semantic Similarity in Biomedicine

International Journal of Applied Information Systems ◽

10.5120/ijais14-451200 ◽

2014 ◽

Vol 7 (6) ◽

pp. 1-5 ◽

Cited By ~ 1

Author(s):

Olivia SanchezGraillet

Keyword(s):

Semantic Similarity ◽

Ontology Structure

Download Full-text

Inferring Drug-Protein–Side Effect Relationships from Biomedical Text

Genes ◽

10.3390/genes10020159 ◽

2019 ◽

Vol 10 (2) ◽

pp. 159 ◽

Cited By ~ 4

Author(s):

Min Song ◽

Seung Han Baek ◽

Go Eun Heo ◽

Jeong-Hoon Lee

Keyword(s):

Side Effects ◽

Text Mining ◽

Semantic Similarity ◽

Side Effect ◽

Relation Extraction ◽

Ranking Function ◽

Entity Recognition ◽

Free Text ◽

Pubmed Database ◽

Biomedical Texts

Background: Although there are many studies of drugs and their side effects, the underlying mechanisms of these side effects are not well understood. It is also difficult to understand the specific pathways between drugs and side effects. Objective: The present study seeks to construct putative paths between drugs and their side effects by applying text-mining techniques to free text of biomedical studies, and to develop ranking metrics that could identify the most-likely paths. Materials and Methods: We extracted three types of relationships—drug-protein, protein-protein, and protein–side effect—from biomedical texts by using text mining and predefined relation-extraction rules. Based on the extracted relationships, we constructed whole drug-protein–side effect paths. For each path, we calculated its ranking score by a new ranking function that combines corpus- and ontology-based semantic similarity as well as co-occurrence frequency. Results: We extracted 13 plausible biomedical paths connecting drugs and their side effects from cancer-related abstracts in the PubMed database. The top 20 paths were examined, and the proposed ranking function outperformed the other methods tested, including co-occurrence, COALS, and UMLS by P@5-P@20. In addition, we confirmed that the paths are novel hypotheses that are worth investigating further. Discussion: The risk of side effects has been an important issue for the US Food and Drug Administration (FDA). However, the causes and mechanisms of such side effects have not been fully elucidated. This study extends previous research on understanding drug side effects by using various techniques such as Named Entity Recognition (NER), Relation Extraction (RE), and semantic similarity. Conclusion: It is not easy to reveal the biomedical mechanisms of side effects due to a huge number of possible paths. However, we automatically generated predictable paths using the proposed approach, which could provide meaningful information to biomedical researchers to generate plausible hypotheses for the understanding of such mechanisms.

Download Full-text

English Words Connected Via Hebrew Morphology: L1-L2 Bidirectional Effects on Semantic Similarity

PsycEXTRA Dataset ◽

10.1037/e527342012-808 ◽

2007 ◽

Author(s):

Tamar Degani ◽

Anat Prior ◽

Natasha Tokowicz

Keyword(s):

Semantic Similarity ◽

Bidirectional Effects

Download Full-text

Effects of semantic similarity and associative strength on the transfer and generalization of probability learning

PsycEXTRA Dataset ◽

10.1037/e666672011-271 ◽

1969 ◽

Author(s):

Lowell Schipper ◽

Bruce L. Hanson ◽

Leonard M. Davidson

Keyword(s):

Semantic Similarity ◽

Associative Strength ◽

Probability Learning

Download Full-text

Document- and Keyword-based Author Co-citation Analysis

Data and Information Management ◽

10.2478/dim-2018-0009 ◽

2018 ◽

Vol 2 (2) ◽

pp. 70-82 ◽

Cited By ~ 2

Author(s):

Binglu Wang ◽

Yi Bu ◽

Win-bin Huang

Keyword(s):

Citation Analysis ◽

Semantic Similarity ◽

Method Validation ◽

New Method ◽

Global Network ◽

Network Visualization ◽

Knowledge Domain ◽

Knowledge Domains ◽

Domain Mapping ◽

The Relationship

AbstractIn the field of scientometrics, the principal purpose for author co-citation analysis (ACA) is to map knowledge domains by quantifying the relationship between co-cited author pairs. However, traditional ACA has been criticized since its input is insufficiently informative by simply counting authors’ co-citation frequencies. To address this issue, this paper introduces a new method that reconstructs the raw co-citation matrices by regarding document unit counts and keywords of references, named as Document- and Keyword-Based Author Co-Citation Analysis (DKACA). Based on the traditional ACA, DKACA counted co-citation pairs by document units instead of authors from the global network perspective. Moreover, by incorporating the information of keywords from cited papers, DKACA captured their semantic similarity between co-cited papers. In the method validation part, we implemented network visualization and MDS measurement to evaluate the effectiveness of DKACA. Results suggest that the proposed DKACA method not only reveals more insights that are previously unknown but also improves the performance and accuracy of knowledge domain mapping, representing a new basis for further studies.

Download Full-text

RTM-DCU: Referential Translation Machines for Semantic Similarity

10.3115/v1/s14-2085 ◽

2014 ◽

Cited By ~ 1

Author(s):

Ergun Bicici ◽

Andy Way

Keyword(s):

Semantic Similarity

Download Full-text

NTNU: Measuring Semantic Similarity with Sublexical Feature Representations and Soft Cardinality

10.3115/v1/s14-2078 ◽

2014 ◽

Cited By ~ 1

Author(s):

André Lynum ◽

Partha Pakray ◽

Björn Gambäck ◽

Sergio Jimenez

Keyword(s):

Semantic Similarity ◽

Feature Representations

Download Full-text

The Opposition of Surprisal and Semantic Similarity in the Prediction of Language Processing: Evidence from Eye-tracking Data

10.31234/osf.io/zypk9 ◽

2020 ◽

Author(s):

Kun Sun

Keyword(s):

Eye Tracking ◽

Semantic Similarity ◽

Cognitive Processing ◽

Language Processing ◽

Language Comprehension ◽

Word Processing ◽

Reading Time ◽

Computational Models ◽

Tracking Data ◽

Dynamic Approach

Expectations or predictions about upcoming content play an important role during language comprehension and processing. One important aspect of recent studies of language comprehension and processing concerns the estimation of the upcoming words in a sentence or discourse. Many studies have used eye-tracking data to explore computational and cognitive models for contextual word predictions and word processing. Eye-tracking data has previously been widely explored with a view to investigating the factors that influence word prediction. However, these studies are problematic on several levels, including the stimuli, corpora, statistical tools they applied. Although various computational models have been proposed for simulating contextual word predictions, past studies usually preferred to use a single computational model. The disadvantage of this is that it often cannot give an adequate account of cognitive processing in language comprehension. To avoid these problems, this study draws upon a massive natural and coherent discourse as stimuli in collecting the data on reading time. This study trains two state-of-art computational models (surprisal and semantic (dis)similarity from word vectors by linear discriminative learning (LDL)), measuring knowledge of both the syntagmatic and paradigmatic structure of language. We develop a `dynamic approach' to compute semantic (dis)similarity. It is the first time that these two computational models have been merged. Models are evaluated using advanced statistical methods. Meanwhile, in order to test the efficiency of our approach, one recently developed cosine method of computing semantic (dis)similarity based on word vectors data adopted is used to compare with our `dynamic' approach. The two computational and fixed-effect statistical models can be used to cross-verify the findings, thus ensuring that the result is reliable. All results support that surprisal and semantic similarity are opposed in the prediction of the reading time of words although both can make good predictions. Additionally, our `dynamic' approach performs better than the popular cosine method. The findings of this study are therefore of significance with regard to acquiring a better understanding how humans process words in a real-world context and how they make predictions in language cognition and processing.

Download Full-text

Pembobotan Berdasarkan Tingkat Kesamaan Semantik pada Metode Fuzzy Semi-Supervised Co-Clustering untuk Pengelompokkan Dokumen Teks

Jurnal ULTIMATICS ◽

10.31937/ti.v6i2.333 ◽

2014 ◽

Vol 6 (2) ◽

pp. 46-51

Author(s):

Galang Amanda Dwi P. ◽

Gregorius Edwadr ◽

Agus Zainal Arifin

Keyword(s):

Supervised Learning ◽

Semantic Similarity ◽

The Other ◽

Classification Result ◽

Document Similarity ◽

The Matrix ◽

Index Terms ◽

Membership Value ◽

Degree Of Similarity

Nowadays, a large number of information can not be reached by the reader because of the misclassification of text-based documents. The misclassified data can also make the readers obtain the wrong information. The method which is proposed by this paper is aiming to classify the documents into the correct group. Each document will have a membership value in several different classes. The method will be used to find the degree of similarity between the two documents is the semantic similarity. In fact, there is no document that doesn’t have a relationship with the other but their relationship might be close to 0. This method calculates the similarity between two documents by taking into account the level of similarity of words and their synonyms. After all inter-document similarity values obtained, a matrix will be created. The matrix is then used as a semi-supervised factor. The output of this method is the value of the membership of each document, which must be one of the greatest membership value for each document which indicates where the documents are grouped. Classification result computed by the method shows a good value which is 90 %. Index Terms - Fuzzy co-clustering, Heuristic, Semantica Similiarity, Semi-supervised learning.

Download Full-text