A new benchmark dataset with production methodology for short text semantic similarity algorithms

2013 ◽  
Vol 10 (4) ◽  
pp. 1-63 ◽  
Author(s):  
James O'Shea ◽  
Zuhair Bandar ◽  
Keeley Crockett


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yudong Liu ◽  
Wen Chen

In the field of information science, helping users quickly and accurately find the information they need in a tremendous volume of short texts has become an urgent problem. The recommendation model is an important way to find such information, but existing recommendation models have limitations in the case of short text recommendation. To address these issues, this paper proposes a recommendation model based on semantic features and a knowledge graph. More specifically, we first select DBpedia as the knowledge graph to extend the short text features of items and derive the semantic features of the items from the extended text. We then calculate the item vectors and, from them, the semantic similarity degrees of the users. Finally, based on the semantic features of the items and the semantic similarity of the users, we apply collaborative filtering to calculate the prediction rating. A series of experiments demonstrates the effectiveness of our model on the mean absolute error (MAE) and root mean square error (RMSE) metrics compared with several existing recommendation algorithms. The optimal MAE of the proposed model is 0.6723, and the corresponding RMSE is 0.8442. These promising results show that the recommendation performance of the model in the movie domain is significantly better than that of the existing algorithms.
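
As a rough illustration of the pipeline's final step, the sketch below implements a user-based collaborative filtering prediction of the kind the abstract describes, together with the two reported metrics. The function names, the top-k neighbourhood, and the fallback to the global mean rating are assumptions made for the sketch, not details taken from the paper.

```python
import numpy as np

def predict_rating(user_idx, item_idx, ratings, user_sim, k=20):
    """Predict one rating as the similarity-weighted average of the ratings
    that the k most semantically similar users gave to the target item.
    `ratings` is a (num_users, num_items) matrix with 0 meaning "unrated";
    `user_sim` is a precomputed (num_users, num_users) semantic-similarity
    matrix, e.g. cosine similarity of DBpedia-extended user profiles."""
    rated = ratings[:, item_idx] > 0          # users who rated this item
    rated[user_idx] = False                   # exclude the target user
    sims = user_sim[user_idx] * rated         # zero out non-raters
    neighbours = np.argsort(sims)[::-1][:k]   # top-k similar raters
    weights = sims[neighbours]
    if weights.sum() == 0:                    # no usable neighbour
        return ratings[ratings > 0].mean()    # fall back to the global mean
    return np.dot(weights, ratings[neighbours, item_idx]) / weights.sum()

def mae_rmse(y_true, y_pred):
    """The two evaluation metrics reported in the abstract."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.abs(err).mean(), np.sqrt((err ** 2).mean())
```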


2021 ◽  
Author(s):  
Abdul Wahab ◽  
Rafet Sifa

In this paper, we propose a new model named DIBERT, which stands for Dependency Injected Bidirectional Encoder Representations from Transformers. DIBERT is a variation of BERT with an additional third objective, Parent Prediction (PP), alongside Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). PP injects the syntactic structure of a dependency tree during pre-training, so that DIBERT generates syntax-aware generic representations. We use the WikiText-103 benchmark dataset to pre-train both BERT-Base and DIBERT. After fine-tuning, we observe that DIBERT performs better than BERT-Base on various downstream tasks, including Semantic Similarity, Natural Language Inference, and Sentiment Analysis.
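
A minimal sketch of what such a three-objective pre-training loss could look like is shown below, in PyTorch. The head shapes, the framing of Parent Prediction as classification over token positions in the same sequence, and the equal weighting of the three terms are assumptions made for illustration, not the authors' specification.

```python
import torch.nn as nn

class DibertPretrainingLoss(nn.Module):
    """Hypothetical MLM + NSP + Parent Prediction loss for DIBERT-style
    pre-training; all heads are simple linear layers for illustration."""
    def __init__(self, hidden_size, vocab_size, max_positions):
        super().__init__()
        self.mlm_head = nn.Linear(hidden_size, vocab_size)    # masked-token logits
        self.nsp_head = nn.Linear(hidden_size, 2)             # is-next / not-next
        self.pp_head = nn.Linear(hidden_size, max_positions)  # dependency-parent index
        self.ce = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, hidden_states, pooled, mlm_labels, nsp_labels, parent_ids):
        # hidden_states: (batch, seq, hidden); pooled: (batch, hidden)
        mlm = self.ce(self.mlm_head(hidden_states).transpose(1, 2), mlm_labels)
        nsp = self.ce(self.nsp_head(pooled), nsp_labels)
        # parent_ids[b, t] = position of token t's head in the dependency tree
        pp = self.ce(self.pp_head(hidden_states).transpose(1, 2), parent_ids)
        return mlm + nsp + pp  # equal weighting assumed
```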


Author(s):  
Mengting Han ◽  
Xuan Zhang ◽  
Xin Yuan ◽  
Jiahao Jiang ◽  
Wei Yun ◽  
...  

2014 ◽  
Vol 875-877 ◽  
pp. 968-972
Author(s):  
Wei Yan ◽  
Cecilia Zanni-Merk ◽  
François Rousselot ◽  
Denis Cavallucci ◽  
Pierre Collet

A growing number of industries feel the need to formalize their innovation approaches. Modern innovation theories and methods draw on different knowledge sources for solving inventive design problems. These sources generally describe similar notions, but at very different levels of detail. We are interested in finding semantic links among these sources and in developing an intelligent way of managing this knowledge, with the goal of assisting inventive design experts in their activities. This paper explores a short text semantic similarity approach to searching for potential links among these sources. Making these links available could facilitate the retrieval of heuristic solutions to inventive problems for TRIZ users.
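
As a rough sketch of such a linking step, the snippet below scores every pair of knowledge-source descriptions and keeps the closest pairs as candidate links. TF-IDF cosine similarity stands in for the paper's actual short text semantic similarity measure, and the threshold is a hypothetical tuning knob.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def link_sources(descriptions, threshold=0.35):
    """Return candidate links (i, j, score) between knowledge-source
    entries whose short descriptions are semantically close."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
    sim = cosine_similarity(tfidf)
    links = [(i, j, sim[i, j])
             for i in range(len(descriptions))
             for j in range(i + 1, len(descriptions))
             if sim[i, j] >= threshold]
    return sorted(links, key=lambda t: -t[2])  # strongest links first
```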


2017 ◽  
Author(s):  
Jorge Martinez-Gil

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is a key challenge in many computer-related fields. The problem is that traditional approaches to semantic similarity measurement are not suitable for all situations: many of them fail to deal with terms not covered by synonym dictionaries, or cannot cope with acronyms, abbreviations, buzzwords, brand names, proper nouns, and so on. In this paper, we present and evaluate a collection of emerging techniques developed to avoid this problem. These techniques use various kinds of web intelligence to determine the degree of similarity between text expressions. They implement a variety of paradigms, including the study of co-occurrence, text snippet comparison, frequent pattern finding, and search log analysis. The goal is to replace the traditional techniques where necessary.
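
One concrete instance of the co-occurrence paradigm is the Normalized Google Distance of Cilibrasi and Vitányi, sketched below and converted to a similarity score. The hit counts would come from a web search engine, but they are passed as plain arguments here to keep the sketch API-agnostic; the clamped conversion to a similarity is an assumption.

```python
import math

def ngd_similarity(hits_x, hits_y, hits_xy, total_pages):
    """Similarity derived from the Normalized Google Distance:
    NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                / (log N - min(log f(x), log f(y)))."""
    if min(hits_x, hits_y, hits_xy) <= 0:
        return 0.0  # no co-occurrence evidence at all
    fx, fy, fxy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    n = math.log(total_pages)
    ngd = (max(fx, fy) - fxy) / (n - min(fx, fy))
    return max(0.0, 1.0 - ngd)  # clamp: 1 means strongly related
```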

