A new benchmark dataset with production methodology for short text semantic similarity algorithms

2013 ◽  
Vol 10 (4) ◽  
pp. 1-63 ◽  
Author(s):  
James O'Shea ◽  
Zuhair Bandar ◽  
Keeley Crockett


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yudong Liu ◽  
Wen Chen

In the field of information science, helping users quickly and accurately find the information they need in a tremendous volume of short texts has become an urgent problem. The recommendation model is an important way to find such information, but existing recommendation models have limitations in the case of short text recommendation. To address these issues, this paper proposes a recommendation model based on semantic features and a knowledge graph. More specifically, we first select DBpedia as the knowledge graph to extend the short text features of items and derive the semantic features of the items from the extended text. We then calculate the item vectors and, from them, the semantic similarity degrees of the users. Finally, based on the semantic features of the items and the semantic similarity of the users, we apply collaborative filtering to calculate the prediction rating. A series of experiments demonstrates the effectiveness of our model on the mean absolute error (MAE) and root mean square error (RMSE) metrics compared with several existing recommendation algorithms. The optimal MAE of the proposed model is 0.6723, and the corresponding RMSE is 0.8442. These promising results show that the recommendation performance of the model in the movie domain is significantly better than that of the existing algorithms.
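
As a rough illustration of the pipeline's final step, the sketch below implements a user-based collaborative filtering prediction of the kind the abstract describes, together with the two reported metrics. The function names, the top-k neighbourhood, and the fallback to the global mean rating are assumptions made for the sketch, not details taken from the paper.

```python
import numpy as np

def predict_rating(user_idx, item_idx, ratings, user_sim, k=20):
    """Predict one rating as the similarity-weighted average of the ratings
    that the k most semantically similar users gave to the target item.
    `ratings` is a (num_users, num_items) matrix with 0 meaning "unrated";
    `user_sim` is a precomputed (num_users, num_users) semantic-similarity
    matrix, e.g. cosine similarity of DBpedia-extended user profiles."""
    rated = ratings[:, item_idx] > 0          # users who rated this item
    rated[user_idx] = False                   # exclude the target user
    sims = user_sim[user_idx] * rated         # zero out non-raters
    neighbours = np.argsort(sims)[::-1][:k]   # top-k similar raters
    weights = sims[neighbours]
    if weights.sum() == 0:                    # no usable neighbour
        return ratings[ratings > 0].mean()    # fall back to the global mean
    return np.dot(weights, ratings[neighbours, item_idx]) / weights.sum()

def mae_rmse(y_true, y_pred):
    """The two evaluation metrics reported in the abstract."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.abs(err).mean(), np.sqrt((err ** 2).mean())
```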


2021 ◽  
Author(s):  
Abdul Wahab ◽  
Rafet Sifa

In this paper, we propose a new model named DIBERT, which stands for Dependency Injected Bidirectional Encoder Representations from Transformers. DIBERT is a variation of BERT with an additional third objective, Parent Prediction (PP), alongside Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). PP injects the syntactic structure of a dependency tree during pre-training, so that DIBERT generates syntax-aware generic representations. We use the WikiText-103 benchmark dataset to pre-train both BERT-Base and DIBERT. After fine-tuning, we observe that DIBERT performs better than BERT-Base on various downstream tasks, including Semantic Similarity, Natural Language Inference, and Sentiment Analysis.
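
A minimal sketch of what such a three-objective pre-training loss could look like is shown below, in PyTorch. The head shapes, the framing of Parent Prediction as classification over token positions in the same sequence, and the equal weighting of the three terms are assumptions made for illustration, not the authors' specification.

```python
import torch.nn as nn

class DibertPretrainingLoss(nn.Module):
    """Hypothetical MLM + NSP + Parent Prediction loss for DIBERT-style
    pre-training; all heads are simple linear layers for illustration."""
    def __init__(self, hidden_size, vocab_size, max_positions):
        super().__init__()
        self.mlm_head = nn.Linear(hidden_size, vocab_size)    # masked-token logits
        self.nsp_head = nn.Linear(hidden_size, 2)             # is-next / not-next
        self.pp_head = nn.Linear(hidden_size, max_positions)  # dependency-parent index
        self.ce = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, hidden_states, pooled, mlm_labels, nsp_labels, parent_ids):
        # hidden_states: (batch, seq, hidden); pooled: (batch, hidden)
        mlm = self.ce(self.mlm_head(hidden_states).transpose(1, 2), mlm_labels)
        nsp = self.ce(self.nsp_head(pooled), nsp_labels)
        # parent_ids[b, t] = position of token t's head in the dependency tree
        pp = self.ce(self.pp_head(hidden_states).transpose(1, 2), parent_ids)
        return mlm + nsp + pp  # equal weighting assumed
```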


Author(s):  
Mengting Han ◽  
Xuan Zhang ◽  
Xin Yuan ◽  
Jiahao Jiang ◽  
Wei Yun ◽  
...  

2014 ◽  
Vol 875-877 ◽  
pp. 968-972
Author(s):  
Wei Yan ◽  
Cecilia Zanni-Merk ◽  
François Rousselot ◽  
Denis Cavallucci ◽  
Pierre Collet

A growing number of industries feel the need to formalize their innovation approaches. Modern innovation theories and methods draw on different knowledge sources for solving inventive design problems. These sources generally describe similar notions, but at very different levels of detail. We are interested in finding semantic links among these sources and in developing an intelligent way of managing this knowledge, with the goal of assisting inventive design experts in their activities. This paper explores a short text semantic similarity approach to searching for potential links among these sources. Making these links available could facilitate the retrieval of heuristic solutions to inventive problems for TRIZ users.
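
As a rough sketch of such a linking step, the snippet below scores every pair of knowledge-source descriptions and keeps the closest pairs as candidate links. TF-IDF cosine similarity stands in for the paper's actual short text semantic similarity measure, and the threshold is a hypothetical tuning knob.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def link_sources(descriptions, threshold=0.35):
    """Return candidate links (i, j, score) between knowledge-source
    entries whose short descriptions are semantically close."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
    sim = cosine_similarity(tfidf)
    links = [(i, j, sim[i, j])
             for i in range(len(descriptions))
             for j in range(i + 1, len(descriptions))
             if sim[i, j] >= threshold]
    return sorted(links, key=lambda t: -t[2])  # strongest links first
```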


2017 ◽  
Author(s):  
Jorge Martinez-Gil

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is a key challenge in many computer-related fields. The problem is that traditional approaches to semantic similarity measurement are not suitable for all situations: many of them fail to deal with terms not covered by synonym dictionaries, or cannot cope with acronyms, abbreviations, buzzwords, brand names, proper nouns, and so on. In this paper, we present and evaluate a collection of emerging techniques developed to avoid this problem. These techniques use various kinds of web intelligence to determine the degree of similarity between text expressions. They implement a variety of paradigms, including the study of co-occurrence, text snippet comparison, frequent pattern finding, and search log analysis. The goal is to replace the traditional techniques where necessary.
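
One concrete instance of the co-occurrence paradigm is the Normalized Google Distance of Cilibrasi and Vitányi, sketched below and converted to a similarity score. The hit counts would come from a web search engine, but they are passed as plain arguments here to keep the sketch API-agnostic; the clamped conversion to a similarity is an assumption.

```python
import math

def ngd_similarity(hits_x, hits_y, hits_xy, total_pages):
    """Similarity derived from the Normalized Google Distance:
    NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                / (log N - min(log f(x), log f(y)))."""
    if min(hits_x, hits_y, hits_xy) <= 0:
        return 0.0  # no co-occurrence evidence at all
    fx, fy, fxy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    n = math.log(total_pages)
    ngd = (max(fx, fy) - fxy) / (n - min(fx, fy))
    return max(0.0, 1.0 - ngd)  # clamp: 1 means strongly related
```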

