Text Similarity Measures
Recently Published Documents


TOTAL DOCUMENTS

14
(FIVE YEARS 0)

H-INDEX

3
(FIVE YEARS 0)

2020 ◽  
Vol 12 (04-SPECIAL ISSUE) ◽  
pp. 1922-1935
Author(s):  
Dr. J. Ujwala Rekha ◽  
Dr. K. Shahu Chatrapati

2019 ◽  
Vol 13 (10) ◽  
pp. 26
Author(s):  
Issa Atoum

Despite ever-increasing interest in the field, the development of adequate text similarity methods is lagging. Some methods perform well on entailment, while others better capture the degree to which two texts are similar. Very often, these methods are compared using Pearson's correlation; however, Pearson's correlation is sensitive to outliers, which can distort the final coefficient. As a result, Pearson's correlation is inadequate for determining which text similarity method is better in situations where data items are very similar or entirely unrelated. This paper borrows the scaled Pearson correlation from the finance domain and builds a metric that can evaluate the performance of similarity methods over cross-sectional datasets. Results showed that the new metric is fine-grained with respect to the benchmark dataset's score range, making it a promising alternative to Pearson's correlation. Moreover, extrinsic results from applying the System Usability Scale (SUS) questionnaire to the scaled Pearson correlation revealed that the proposed metric is gaining attention from scholars, which supports its use in academia.
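The abstract does not quote the formula, but scaled correlation, as known from the finance and signal-processing literature, splits a pair of series into fixed-size segments, computes Pearson's correlation within each segment, and averages the results, so any single outlier can only distort the one segment that contains it. A minimal sketch under that assumption (function names are illustrative, not the paper's):

```python
import math

def pearson(x, y):
    # Plain Pearson correlation coefficient (assumes non-constant inputs).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def scaled_pearson(x, y, scale):
    # Average of per-segment Pearson correlations over non-overlapping
    # windows of `scale` points; an outlier affects only its own segment.
    rs = [pearson(x[i:i + scale], y[i:i + scale])
          for i in range(0, len(x) - scale + 1, scale)]
    return sum(rs) / len(rs)
```

On perfectly linear data the two coefficients agree; on data containing one extreme outlier, the scaled version stays closer to the trend of the remaining points.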


2019 ◽  
Vol 9 (9) ◽  
pp. 1870 ◽  
Author(s):  
Pavel Stefanovič ◽  
Olga Kurasova ◽  
Rokas Štrimaitis

In this paper, a word-level n-gram-based approach is proposed to find the similarity between texts. The approach combines two separate and independent techniques: the self-organizing map (SOM) and text similarity measures. SOM's uniqueness is that the results of data clustering, as well as dimensionality reduction, are presented in a visual form. Four measures were evaluated: cosine, Dice, extended Jaccard, and overlap. First, the texts must be converted to a numerical representation: each text is split into word-level n-grams, from which a bag of n-grams is created. The n-gram frequencies are then calculated and the frequency matrix of the dataset is formed. Various filters are applied when creating the bag of n-grams: stemming algorithms, number and punctuation removal, stop-word lists, etc. All experimental investigation was carried out on a corpus of plagiarized short answers.
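The pipeline described above (split into word-level n-grams, build a bag of n-grams, compare frequency vectors) can be sketched as follows. The vector forms of Dice, extended Jaccard, and overlap below are the standard dot-product generalizations; this is an assumption, since the paper's exact formulas are not quoted here:

```python
from collections import Counter
import math

def ngram_counts(text, n=2):
    # Bag of word-level n-grams with their frequencies.
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def _dot(a, b):
    return sum(a[k] * b[k] for k in a.keys() & b.keys())

def _sq(a):
    return sum(v * v for v in a.values())

def cosine(a, b):
    return _dot(a, b) / (math.sqrt(_sq(a)) * math.sqrt(_sq(b)))

def dice(a, b):
    return 2 * _dot(a, b) / (_sq(a) + _sq(b))

def ext_jaccard(a, b):
    d = _dot(a, b)
    return d / (_sq(a) + _sq(b) - d)

def overlap(a, b):
    return _dot(a, b) / min(_sq(a), _sq(b))
```

Identical texts score 1.0 under all four measures; texts sharing no n-gram score 0.0.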


Author(s):  
Goran Šimić

This chapter is about document and data clustering as a process of preparing the information resources stored in e-government systems for advanced search. These resources are mainly textual data, stored either as field values in databases or as documents in file repositories. As their number grows, searching for specific information takes more time. Different techniques are used for this purpose, most of which involve information retrieval based on a variety of text similarity measures. The cost of such processing depends on how the resources are prepared for searching. Clustering is the most commonly used preparation technique, and that fact is the basic motive for this chapter.
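The chapter names clustering as the preparation step without fixing an algorithm. As one minimal illustration (the single-pass "leader" scheme, the threshold, and all names are assumptions, not the chapter's method), documents can be grouped by token-set Jaccard similarity so that a query is first matched against a few cluster leaders and then only the winning cluster is scanned:

```python
def jaccard(a, b):
    # Jaccard similarity of two token sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def tokens(doc):
    return set(doc.lower().split())

def leader_cluster(docs, threshold=0.25):
    # Single-pass clustering: join the first cluster whose leader is
    # similar enough, otherwise start a new cluster.
    clusters = []  # list of (leader_token_set, member_indices)
    for i, doc in enumerate(docs):
        t = tokens(doc)
        for leader, members in clusters:
            if jaccard(t, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((t, [i]))
    return clusters

def search(query, docs, clusters):
    # Compare the query to cluster leaders first, then scan only the
    # best cluster's members; returns the index of the best document.
    q = tokens(query)
    leader, members = max(clusters, key=lambda c: jaccard(q, c[0]))
    return max(members, key=lambda i: jaccard(q, tokens(docs[i])))
```

With clusters precomputed, each query touches one leader per cluster plus a single cluster's members instead of the whole repository, which is where the preparation cost pays off.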

